
deduplication

Posted: Sun May 23, 2010 6:24 pm
by tuipveus
Deduplication is the word of today, or perhaps tomorrow.

Since everything is checksummed in the fsarchiver image file, why store duplicate blocks many times?
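
Just to illustrate the idea (this is not how fsarchiver stores data; the names and block size below are invented for the example), a block-level deduplicator keyed on checksums could write each unique block once and keep only a reference for repeats:

import hashlib

def dedup_blocks(data, block_size=4096):
    seen = {}             # sha256 digest -> index of the block actually stored
    stored_blocks = []    # unique blocks, written only once
    refs = []             # for every input block, the index of its stored copy
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen[digest] = len(stored_blocks)
            stored_blocks.append(block)
        refs.append(seen[digest])
    return stored_blocks, refs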

Deduplication is implemented in different areas, for example in the ZFS filesystem,
http://zfs-fuse.net/releases/0.6.9

and in the Linux kernel. Red Hat engineers were able to run over 50 virtual machines on a single ordinary computer with the help of memory deduplication.

Well, well... I know that fsarchiver works at the file level, not the block level, so it might be harder to implement.

Re: deduplication

Posted: Tue May 25, 2010 11:38 am
by admin
Unfortunately I don't think it will be possible to do that. If the archive file were a filesystem, like squashfs, then we could imagine such a feature. But I think a sequential archive file is not appropriate for that sort of feature.

Re: deduplication

Posted: Tue Oct 19, 2010 1:59 pm
by freddy77
Perhaps it would be possible to detect duplicate files and say: this file is a duplicate of this other file. It would require a new object type in the image file, and perhaps reading each file twice (once for hashing and once to store it into the image file). Something like hardlinks, with an additional flag saying to copy the data instead of linking it. You can avoid the double read by checking the size first (if a size has never been encountered, skip the hashing) or by limiting it (if the file size is above some threshold, do not hash it).
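
Just to make the idea concrete, here is a rough sketch in Python (not fsarchiver code; the function name and parameters are made up) of that two-pass approach: group files by size first, hash only the sizes seen more than once, and record later copies as "duplicate of" entries instead of storing their data again.

import hashlib, os
from collections import defaultdict

def find_duplicate_files(paths, max_size=None):
    # first pass: group by size, so files with a unique size are never hashed
    by_size = defaultdict(list)
    for p in paths:
        size = os.path.getsize(p)
        if max_size is not None and size > max_size:
            continue    # optional threshold: do not hash very large files
        by_size[size].append(p)

    duplicates = {}    # path -> path of the first file with identical content
    for group in by_size.values():
        if len(group) < 2:
            continue    # size seen only once, nothing to compare
        first_with_digest = {}
        for p in group:
            h = hashlib.sha256()
            with open(p, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            d = h.hexdigest()
            if d in first_with_digest:
                duplicates[p] = first_with_digest[d]    # would become a "copy of" object
            else:
                first_with_digest[d] = p
    return duplicates

A real implementation would of course still read the first occurrence a second time to store its data into the image, as described above.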