Question: I’ve heard that NTFS compression can reduce performance due to extra CPU usage, but I’ve read reports that it may actually increase performance because of reduced disk reads. How exactly does NTFS compression affect system performance?

Notes:

  • I’m running a laptop with a 5400 RPM hard drive, and many of the things I do on it are I/O bound.
  • The processor is an AMD Phenom II with four cores running at 2.0 GHz.
  • The system is defragmented regularly using UltraDefrag.
  • The workload is mixed read-write, with reads occurring somewhat more often than writes.
  • The files to be compressed include a selected subset of personal documents (not the full home folder) and programs, including several (less demanding) games and Visual Studio (which tends to be I/O bound more often than not).

Answer: I’ve heard that NTFS compression can reduce performance due to extra CPU usage, but I’ve read reports that it may actually increase performance because of reduced disk reads.

Correct. Suppose your CPU, using some compression algorithm, can compress at C MB/s and decompress at D MB/s, and your hard drive has write speed W and read speed R. So long as C > W, you get a performance gain when writing, and so long as D > R, you get a performance gain when reading. This is a drastic assumption in the write case, since Lempel-Ziv’s algorithm (as implemented in software) has a non-deterministic compression rate: its speed varies with the input data (although it can be constrained with a limited dictionary size).
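
To make the inequalities concrete, here is a minimal sketch of a simple pipeline model. It assumes that disk transfer and (de)compression overlap perfectly, and that C and D are measured in uncompressed MB/s. The function name, the compression ratio parameter and all the numbers are made up for illustration; they are not measurements of NTFS.

```python
def effective_rate(disk_mb_s, cpu_mb_s, ratio):
    """Logical (uncompressed) MB/s seen by the application.

    disk_mb_s: raw disk speed (R for reads, W for writes)
    cpu_mb_s:  decompression speed D or compression speed C
    ratio:     original size / compressed size (assumed, > 1 if data shrinks)

    Assumption: CPU work and disk transfer overlap, so the slower
    of the two stages sets the overall rate.
    """
    return min(cpu_mb_s, disk_mb_s * ratio)


# Hypothetical read case: R = 60 MB/s, D = 300 MB/s, data shrinks 1.5x.
print(effective_rate(60.0, 300.0, 1.5))  # 90.0, better than the raw 60 MB/s

# Hypothetical write case: W = 50 MB/s, but the CPU only compresses at C = 40 MB/s.
print(effective_rate(50.0, 40.0, 1.5))   # 40.0, worse than the raw 50 MB/s
```

In this model the compressed path beats the raw disk only when the CPU rate is above the disk rate, which is the C > W and D > R condition above.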


How exactly does NTFS compression affect system performance?

Well, it’s exactly by relying on the above inequalities. So long as your CPU can sustain a compression/decompression rate above your HDD’s write/read speed, you should experience a speed gain. However, compression does affect large files, which may become heavily fragmented (due to the algorithm) or not be compressed at all.

This may be due to the fact that the Lempel-Ziv algorithm slows down as the compression moves on (since the dictionary continues to grow, requiring more comparisons as bits come in). Decompression runs at almost the same rate regardless of the file size in the Lempel-Ziv algorithm (since the dictionary can just be addressed using a base + offset scheme).

Compression also impacts how files are laid out on the disk. By default, a single “compression unit” is 16 times the size of a cluster (so most NTFS filesystems with 4 kB clusters will store compressed files in 64 kB chunks), but it does not increase past 64 kB. This can affect fragmentation and space requirements on-disk.
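
The arithmetic behind the compression unit can be sketched as follows. The 16-clusters-per-unit figure is from the paragraph above. The rounding rule (a compressed unit occupies whole clusters, and a unit that saves no clusters is stored uncompressed) is my assumption about the on-disk behaviour, and the function names are made up.

```python
import math


def compression_unit_bytes(cluster_bytes, clusters_per_unit=16):
    """Size of one compression unit for a given cluster size."""
    return cluster_bytes * clusters_per_unit


def clusters_stored(compressed_bytes, cluster_bytes, clusters_per_unit=16):
    """Clusters allocated for one unit after compression.

    Assumption: space is allocated in whole clusters, and a unit that
    would not save at least one cluster is kept uncompressed.
    """
    needed = math.ceil(compressed_bytes / cluster_bytes)
    return min(needed, clusters_per_unit)


print(compression_unit_bytes(4096))    # 65536, i.e. 64 kB with 4 kB clusters
print(clusters_stored(21_000, 4096))   # 6 clusters (24 kB) instead of 16
print(clusters_stored(70_000, 4096))   # 16: no saving, unit stays uncompressed
```

Each unit ends up with its own allocation size, which is one way to picture why compressed files can fragment more than uncompressed ones.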

As a final note, latency is another interesting value to discuss. While the actual time it takes to compress the data does introduce latency, when the CPU clock speed is in gigahertz (i.e. each clock cycle takes 1 ns or less), the latency introduced is negligible compared to hard drive seek times (which are on the order of milliseconds, or millions of clock cycles).
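
As a quick sanity check on the "millions of clock cycles" figure, here is the arithmetic for the 2.0 GHz CPU from the question. The 10 ms seek time is an assumed round number for a laptop hard drive, not a measured value.

```python
def cycles_per_seek(clock_hz, seek_us):
    """CPU clock cycles that elapse during one seek of `seek_us` microseconds."""
    return clock_hz * seek_us // 1_000_000


# 2.0 GHz clock, assumed 10 ms (10,000 microsecond) seek
print(cycles_per_seek(2_000_000_000, 10_000))  # 20000000
```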


To actually see if you’ll experience a speed gain, there are a few things you can try. The first is to benchmark your system with a Lempel-Ziv based compression/decompression algorithm. If you get good results (i.e. C > W and D > R), then you should try enabling compression on your disk.
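
A rough way to get ballpark values for C and D is sketched below. It uses Python’s zlib (DEFLATE, which is LZ77-based) purely as a stand-in: NTFS uses its own LZNT1 scheme, so the numbers only hint at what your CPU can do with a Lempel-Ziv style algorithm and won’t match NTFS exactly. The function name and the sample data are made up; ideally you would feed it the bytes of a file you actually plan to compress.

```python
import os
import time
import zlib


def measure_rates(data, level=6, repeats=3):
    """Return (compress_mb_s, decompress_mb_s, ratio) for `data`.

    Rates are in uncompressed MB/s; the best of `repeats` runs is used
    to reduce noise from other processes.
    """
    mb = len(data) / 1e6
    best_c = best_d = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        packed = zlib.compress(data, level)
        t1 = time.perf_counter()
        unpacked = zlib.decompress(packed)
        t2 = time.perf_counter()
        assert unpacked == data  # round trip must be lossless
        best_c = min(best_c, t1 - t0)
        best_d = min(best_d, t2 - t1)
    # Guard against a zero interval on very small inputs.
    best_c = max(best_c, 1e-9)
    best_d = max(best_d, 1e-9)
    return mb / best_c, mb / best_d, len(data) / len(packed)


# Made-up sample: repetitive text plus some incompressible random bytes.
sample = b"int main(void) { return 0; }\n" * 50_000 + os.urandom(256 * 1024)
c, d, ratio = measure_rates(sample)
print(f"C = {c:.0f} MB/s, D = {d:.0f} MB/s, ratio = {ratio:.2f}")
```

Compare the printed C and D against your drive’s sequential write and read speeds (W and R) from any disk benchmark you trust.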

From there, you might want to do more benchmarks on actual hard drive performance. A truly important benchmark (in your case) would be to see how fast your games load, and see how fast your Visual Studio projects compile.

TL;DR: Compression might be viable for a filesystem utilizing many small files requiring high throughput and low latency. Large files are (and should be) unaffected due to performance and latency concerns.