Bad fragmentation on NTFS restores

Posted: Fri May 15, 2009 8:22 pm
by dowdle
I'm sure a lot has to do with the size of the filesystem and the number of files and directories... but I just tried fsarchiver today in a computer lab I manage... and have discovered a significant problem with fragmentation.

A little history and background: I've been using partimage for the past couple of years, and since all of my lab machines have the same hardware and identical partitioning, I've never needed to restore an image onto a partition smaller than the original... but having the ability to do so in fsarchiver sounds great. The lab machines are dual boot (Windows XP Pro SP3 and CentOS 5.3), and the image is quite large since there is a ton of software installed. After getting one Windows side just the way I want it, I defrag a couple of times and then run sysprep.

So today I tried fsarchiver. I booted the latest SystemRescueCD and used an external USB hard disk to store the .fsa file on. I backed up only the NTFS partition.
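For reference, the backup itself was a single savefs run, something along these lines (the device name and destination path here are examples rather than my exact command):

    # archive the Windows NTFS partition to the external USB disk
    fsarchiver savefs /mnt/usb/winxp-lab.fsa /dev/sda1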

Restoring it on other machines, I noticed how much longer the sysprep process takes after rebooting... and that logging in takes noticeably longer too. I thought the problem might be fragmentation, so I checked... and sure enough the NTFS partition is very fragmented, which degrades performance. So far the only workaround I've tried is to run a defrag after the imaging process.

Is there any way to avoid the bad fragmentation? I don't have this problem with partimage.

Posted: Fri May 15, 2009 9:36 pm
by admin
You are absolutely right, the fragmentation is different with fsarchiver because it's file based and not block based. fsarchiver has no control over the way the files are created, and there is unfortunately nothing I can do to avoid that fragmentation. I guess optimizations have to be done in ntfs-3g; they will probably work on that later, but I cannot tell you more about it.

Partimage is block based, so you just get an exact copy of your original partition. But I think fsarchiver is better because you can restore to a smaller partition (or a bigger one, with no need for ntfsresize), you can get better compression if you use lzma, and it's faster because it's multi-threaded.
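For instance, a backup compressed with lzma and a restore onto a different (possibly smaller) partition would look something like this; the device names and paths are examples, and if I remember the level mapping correctly, -z7 and above select lzma:

    # save the filesystem with lzma compression and 3 compression threads
    fsarchiver savefs -z8 -j3 /mnt/usb/backup.fsa /dev/sda1
    # restore the first filesystem of the archive (id=0) to another partition
    fsarchiver restfs /mnt/usb/backup.fsa id=0,dest=/dev/sda2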

Posted: Sun May 17, 2009 4:41 pm
by tuipveus
Will fragmentation also occur on ext3 filesystems?

Posted: Sun May 17, 2009 7:18 pm
by admin
There is fragmentation with ntfs because it's not a native Linux filesystem and the driver has probably not been optimized for that. ext3 and other Linux filesystems are probably optimized to avoid that problem in recent kernels, especially when we don't write multiple files at the same time.

Posted: Sun May 17, 2009 7:31 pm
by tuipveus
fdupoux wrote:There is fragmentation with ntfs because it's not a native Linux filesystem and the driver has probably not been optimized for that. ext3 and other Linux filesystems are probably optimized to avoid that problem in recent kernels, especially when we don't write multiple files at the same time.
Ok. If I use the -j option with multiple cores when archiving/restoring an image, will that cause more fragmentation compared to a single core (-j 1)?

Maybe it would be useful to use partimage for ntfs and fsarchiver for ext3 as standard practice.

When I save a disk to an fsarchiver file (on an ntfs filesystem), I noticed that most of the time the cores are idle. Any idea if this is because I am writing the file to ntfs (mounted with ntfs-3g), or because I have specified only -j 4 on a four-core system?

Some sort of buffering might speed things up, just as some users have reported with partimage.

Posted: Sun May 17, 2009 7:42 pm
by admin
The number of compression jobs will have no effect on the fragmentation, since only the main thread reads/writes files on the disk:
http://www.fsarchiver.org/Multithreading

It's already buffered (up to 32 blocks are in the queue in memory) so that we have enough data to feed the cores.
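So the usual rule of thumb is simply one compression job per core, for example on a four-core machine (just a sketch, device and path are examples):

    # 4 compression threads; the single main thread still does all the disk IO
    fsarchiver savefs -j4 -z7 /mnt/usb/backup.fsa /dev/sda1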

In your case, I guess the cores are idle because you must be using a fast compression algorithm, so the bottleneck is the hard drive, not the cpu. You can reduce the load on the disk by using a higher compression level:

Let's take an example:
1) With -z3 you read 1GB of data, compress it using less than one cpu, and write 600MB of compressed data. Since the bottleneck is the disk, that's 1600MB of IO on your disk.
2) With -z7 you read 1GB of data, compress it with all 4 cores busy, and write only 400MB of compressed data. If the disk and cpu are balanced (nothing is really waiting), that's just 1400MB of IO, so it's faster.
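If you want to check which side wins on your own hardware, you can simply time the same backup at both levels (a rough sketch; the paths and device are examples):

    time fsarchiver savefs -z3 -j4 /mnt/usb/test-z3.fsa /dev/sda1
    time fsarchiver savefs -z7 -j4 /mnt/usb/test-z7.fsa /dev/sda1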

Posted: Sun May 17, 2009 8:41 pm
by tuipveus
I was using -z7 with an HP DL380 (Xeon 2.83GHz, I think). The saved file was on ntfs, so that might be the bottleneck, as well as the usb device. After I have tested a bit more, I will be able to make a better choice. One option would be to change the usb device to ext3 instead of ntfs.

I wasn't watching closely, but I noticed that sometimes only one core at a time was at 99% with the rest of the cores idle. I'll have to watch it again.

Re: Bad fragmentation on NTFS restores

Posted: Mon Apr 11, 2011 1:13 pm
by UlfZibis
admin wrote:You are absolutely right, the fragmentation is different with fsarchiver because it's file based and not block based. fsarchiver has no control over the way the files are created, and there is unfortunately nothing I can do to avoid that fragmentation. I guess optimizations have to be done in ntfs-3g; they will probably work on that later, but I cannot tell you more about it.
Maybe it would help if you first formatted the ntfs filesystem?

Re: Bad fragmentation on NTFS restores

Posted: Thu Apr 14, 2011 3:12 pm
by freddy77
I don't know how fsarchiver works internally, but Windows usually reserves space using some sort of fallocate call (well... SetEndOfFile on Windows). This allows the driver to allocate space more contiguously. Perhaps the ntfs-3g driver could do the same, and calling fallocate/posix_fallocate on Linux would get the same effect.
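As a quick way to see whether preallocation actually helps on a given setup, one could try something like this from a shell; the mount point and file name are examples, and whether the reservation is honoured depends on the ntfs-3g version (fallocate support in the driver came late, so it may simply fail or be ignored):

    # ask for a 100MB contiguous reservation (fallocate from util-linux)
    fallocate -l 100M /mnt/windows/testfile
    # count how many extents the file really got (needs FIBMAP/FIEMAP support)
    filefrag /mnt/windows/testfile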