deduplication

Post here if you want to request a feature that has not been implemented yet
tuipveus
Posts: 44
Joined: Thu May 14, 2009 7:02 pm

deduplication

Post by tuipveus » Sun May 23, 2010 6:24 pm

Deduplication is the word of today, or perhaps tomorrow.

Since everything is checksummed in the fsarchiver image file, why store duplicate blocks many times?

Deduplication is already implemented in different areas, for example in the ZFS filesystem,
http://zfs-fuse.net/releases/0.6.9

and in the Linux kernel. Red Hat engineers were able to run over 50 virtual machines on one ordinary computer with the help of memory deduplication.

Well, well... I know that fsarchiver works at the file level, not the block level, so it might be harder to implement.

admin
Site Admin
Posts: 550
Joined: Sat Feb 21, 2004 12:12 pm

Re: deduplication

Post by admin » Tue May 25, 2010 11:38 am

Unfortunately I don't think it will be possible to do that. If the archive file were a filesystem, like squashfs, then we could imagine such a feature. But I think a sequential archive file is not appropriate for that sort of feature.

freddy77
Posts: 10
Joined: Tue Oct 19, 2010 1:20 pm

Re: deduplication

Post by freddy77 » Tue Oct 19, 2010 1:59 pm

Perhaps it would be possible to detect duplicate files and say: this file is a duplicate of that other file. It would require a new object type in the image file, and perhaps reading each file twice (once to hash it and once to store it into the image file). Something like hard links, with an additional flag saying to copy the data instead of linking it. You could avoid most of the extra reads by checking sizes first (if a size has never been encountered, skip the hashing) or by setting a limit (if the file size is above some threshold, do not hash it). A sketch of this detection pass is below.
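Here is a minimal sketch of that size-then-hash detection pass, in Python rather than fsarchiver's C, just to illustrate the idea. The names (find_duplicates, max_size) are hypothetical and not part of fsarchiver; it only shows how unique-sized files escape hashing entirely, not how the archive would store the hardlink-like objects.

```python
import hashlib
import os

def find_duplicates(paths, max_size=1 << 30):
    # Hypothetical sketch, not fsarchiver code.
    # Pass 1: group files by size; a file with a unique size
    # cannot have a duplicate, so it is never read or hashed.
    by_size = {}
    for path in paths:
        size = os.path.getsize(path)
        if size > max_size:   # skip very large files, as suggested above
            continue
        by_size.setdefault(size, []).append(path)

    # Pass 2: hash only files whose size occurs more than once.
    by_hash = {}
    for size, group in by_size.items():
        if len(group) < 2:
            continue
        for path in group:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_hash.setdefault(h.hexdigest(), []).append(path)

    # Keep only hashes shared by several files: the first path in each
    # list would be stored normally, the rest as hardlink-style objects
    # carrying the proposed "copy instead of link" flag.
    return {h: files for h, files in by_hash.items() if len(files) > 1}
```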
