
Error correction codes to recover (not ignore) damaged files

Posted: Sun Nov 14, 2010 9:16 am
by mvaldez
Support for error-correction codes to recover damaged files. Currently, fsarchiver can detect and skip damaged files in a damaged archive, which is great because a damaged archive doesn't mean a completely damaged backup. However, with Reed-Solomon or similar codes, the corrupted files could actually be recovered instead of skipped. The cost is some CPU overhead to compute the redundant codes and some extra space to store them.

Around here we always use parchive to generate recovery files for our backup files (we use tar, cpio, dar, partimage and fsarchiver), so if they become damaged we can still reconstruct them. Currently this takes two steps (backup first, then create par2 files). I think it would be nice if fsarchiver could do the job in a single step. I don't mean calling parchive to do it, but actually calculating the correcting codes and storing them in the archive, just like checksums are stored now. (Dar actually calls parchive to offer data recovery, but then you end up with the same result: dar files plus par2 files, with the error-correcting data not stored in the backup files themselves.)

Maybe this sounds like overkill, but for long-term storage I would feel safer knowing the tool could not only detect damaged files but actually recover them. And of course we can still do what we have been doing: call parchive after fsarchiver.


Re: Error correction codes to recover (not ignore) damaged f

Posted: Sun Nov 21, 2010 10:26 pm
by admin
It's a very interesting suggestion. How big are the par2 files compared to the fsa files? Do you know what block size parchive uses?

In the current file format, blocks are several hundred kilobytes, depending on the compression algorithm. It would be difficult to add error-correcting code every 4K, because that means having two levels of blocks: big blocks for compression, and small blocks inside the big blocks for error correction. But if we put all the ECC data for the entire compressed block at the end of the big block, it would be easier to do.
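To make the "ECC data at the end of the big block" idea concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: it uses a single XOR parity shard as a stand-in for a real error-correcting code (so it can rebuild only one sub-block, and only if you already know which one is bad), and the names `append_parity`, `rebuild_shard` and `shard_size` are hypothetical, not part of fsarchiver or its file format.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_shards(block: bytes, shard_size: int) -> list:
    """Zero-pad the block to a multiple of shard_size and cut it into shards."""
    padded_len = -(-len(block) // shard_size) * shard_size  # ceiling division
    padded = block.ljust(padded_len, b"\0")
    return [padded[i:i + shard_size] for i in range(0, padded_len, shard_size)]


def append_parity(block: bytes, shard_size: int) -> bytes:
    """Return the block followed by one parity shard (the 'ECC trailer')."""
    parity = reduce(xor_bytes, split_shards(block, shard_size), bytes(shard_size))
    return block + parity


def rebuild_shard(stored: bytes, block_len: int, shard_size: int,
                  lost_index: int) -> bytes:
    """Rebuild the shard at lost_index from the other shards and the trailer."""
    block, parity = stored[:block_len], stored[block_len:]
    shards = split_shards(block, shard_size)
    others = [s for i, s in enumerate(shards) if i != lost_index]
    return reduce(xor_bytes, others, parity)
```

The compressed block itself is left untouched and the redundancy is simply appended, so the compression layer never needs to know about the smaller shards. A real implementation would use a stronger code than XOR parity and something like a per-shard checksum to locate the damage.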

Do you know of a very common library for C programs (not C++) that implements this? I mean a C library similar to libgcrypt for encryption and checksumming, or zlib/xz for compression, that comes with all popular Linux distributions (including RHEL if possible). The license also has to be compatible with the GPL (ideally LGPL): I have replaced openssl with libgcrypt in the past because of licensing issues.

Re: Error correction codes to recover (not ignore) damaged f

Posted: Wed Dec 08, 2010 10:45 pm
by admin
More generic page about all error correction algorithms:
We have to find one that comes with a good C library on Linux.

Re: Error correction codes to recover (not ignore) damaged f

Posted: Wed Dec 08, 2010 11:00 pm
by admin
Maybe that sort of library, but I can't find a package for Fedora.

vdmfec reads an input stream and adds error correction blocks so that
large consecutive sections of the output stream may be corrupted, and
the data recovered. For example, diskettes typically lose whole
sectors at once, or related groups of sectors, or even entire tracks.
Data written to a diskette with this program may be recovered even with
many read errors.

The algorithm used is a Forward Error Correction (FEC) code based on
Vandermonde (VDM) matrices in GF(2^8) due to Luigi Rizzo. Given the
FEC parameters K and N, with N greater than K, N blocks are written for
every K input blocks in such a way that any K blocks are sufficient to
reconstruct the data. That is, up to N - K blocks out of every group
of N blocks may be lost without loss of data.

The amount of overhead in the output stream is easily adjustable by
varying K. N and blocksize control the total amount of data written.
Depending on the types of errors you expect, different settings may be
more or less useful. For example, you may not expect to have two or
three bad sectors on every track (if you do it’s time to replace the
diskette!), but you might expect three bad sectors on two or three
contiguous tracks (diskette errors tend to cluster).
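The K/N trade-off described in the man page excerpt above comes down to simple arithmetic, sketched here in Python. The function name `fec_budget` and the example values are made up for illustration, and the byte counts ignore any per-block headers the real tool may write.

```python
def fec_budget(k: int, n: int, blocksize: int) -> dict:
    """Space cost and loss tolerance of an (N, K) erasure code.

    N blocks are written for every K input blocks, and any K of the N
    written blocks are enough to reconstruct the data.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < K < N")
    return {
        # Whole blocks that may be lost in each group of N.
        "max_lost_blocks": n - k,
        # Output size relative to input size.
        "expansion": n / k,
        # Extra space as a fraction of the input.
        "overhead_fraction": (n - k) / k,
        # Payload bytes consumed and bytes written per group.
        "group_bytes_in": k * blocksize,
        "group_bytes_out": n * blocksize,
    }


if __name__ == "__main__":
    # Illustrative values only: 6 blocks written for every 4 input blocks.
    for name, value in fec_budget(k=4, n=6, blocksize=1024).items():
        print(f"{name}: {value}")
```

With the illustrative K=4 and N=6, the output is 1.5 times the input and any 2 of the 6 blocks in a group can be lost. Raising K while keeping N - K fixed lowers the overhead but spreads the same number of tolerable losses over a larger group.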

Re: Error correction codes to recover (not ignore) damaged f

Posted: Wed Dec 08, 2010 11:08 pm
by admin

Re: Error correction codes to recover (not ignore) damaged f

Posted: Wed Dec 08, 2010 11:27 pm
by admin
So it sounds like zfec and vdmfec are the two best options. Both are under the GPL license. vdmfec is quite old but still available in the latest Debian. zfec is very recent, but I cannot find any package for it.

As these libraries are not available in all Linux distributions and the actual code seems to be quite small, the best option may be to embed a copy of the library in fsarchiver to avoid any dependency problems across distributions.

Re: Error correction codes to recover (not ignore) damaged f

Posted: Sat Dec 11, 2010 7:26 pm
by admin
I have tested vdmfec and it seems to be working. In fact vdmfec is just a wrapper around the following fec implementation:

Code:

1) create a random file in /var/tmp/file01
2) vdmfec -b 256k /var/tmp/file01 >| /var/tmp/file02
3) use a hex editor to corrupt /var/tmp/file02
4) vdmfec -b 256k -d /var/tmp/file02 >| /var/tmp/file03
5) now /var/tmp/file01 and /var/tmp/file03 should have the same contents
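Steps 3 and 5 of the test above can also be done without a hex editor or manual checking. The following Python helpers are a rough sketch for that: `corrupt_range` and `same_contents` are hypothetical names introduced here, and the offsets you pass are up to you. The vdmfec commands in steps 2 and 4 stay exactly as shown.

```python
def corrupt_range(path: str, offset: int, length: int) -> int:
    """Flip every bit in `length` bytes starting at `offset` (step 3).

    Returns the number of bytes actually changed, which is smaller than
    `length` if the range runs past the end of the file.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        original = f.read(length)
        f.seek(offset)
        f.write(bytes(b ^ 0xFF for b in original))
    return len(original)


def same_contents(path_a: str, path_b: str, chunk: int = 1 << 16) -> bool:
    """Compare two files byte for byte (step 5)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk), fb.read(chunk)
            if a != b:
                return False
            if not a:  # both files ended at the same point
                return True
```

For example, calling `corrupt_range("/var/tmp/file02", 4096, 512)` between steps 2 and 4 damages a 512-byte run, and `same_contents("/var/tmp/file01", "/var/tmp/file03")` should then return True if the decoder repaired it. Flipping bits rather than writing zeros guarantees that every byte in the range really changes.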

Re: Error correction codes to recover (not ignore) damaged f

Posted: Tue Jan 11, 2011 11:17 am
by admin
The error correction code has been implemented in the master branch of the git repository, so it will be available in the final fsarchiver-0.7.0. More details about the new features