Question: There is considerable interest in shingled drives. These put data tracks so close together that you can’t write to one track without clobbering the next. This may increase capacity by 20% or so, but results in write amplification problems. There is work underway on filesystems optimised for shingled drives; for example, see: https://lwn.net/Articles/591782/
Some shingled disks, such as the Seagate 8TB Archive, have a cache area for random writes, allowing decent performance on generic filesystems. The disk can even be quite fast on some common workloads, up to around 200MB/s for writes. However, it is to be expected that if the random-write cache overflows, performance may suffer. Presumably, some filesystems are better at avoiding random writes in general, or at avoiding the patterns of random writes likely to overflow the write cache found in such drives.
Is any mainstream filesystem in the Linux kernel better than ext4 at avoiding the performance penalty of shingled disks?
Answer: Intuitively, copy-on-write and log-structured filesystems might give better performance on shingled disks by reducing random writes. The benchmarks somewhat support this; however, these differences in performance are not specific to shingled disks. They also occur on an unshingled disk used as a control. Thus switching to a shingled disk might not have much relevance to your choice of filesystem.
The nilfs2 filesystem gave quite good performance on the SMR disk. However, this was because I allocated the whole 8TB partition, and the benchmark only wrote ~0.5TB, so the nilfs cleaner did not have to run. When I limited the partition to 200GB, the nilfs benchmarks did not even complete successfully. Nilfs2 may be a good choice performance-wise if you really use the “archive” disk as an archive disk, where you keep all the data and snapshots written to the disk forever, as then the nilfs cleaner does not have to run.
I understand that the 8TB Seagate ST8000AS0002-1NA17Z drive I used for the test has a ~20GB cache area. I changed the default filebench fileserver settings so that the benchmark data set would be ~125GB, larger than the unshingled cache area:
    set $meanfilesize=1310720
    set $nfiles=100000
    run 36000
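As a rough check on the ~125GB figure, the size implied by these settings can be worked out directly. The Python sketch below assumes the total is simply nfiles times meanfilesize (filebench varies individual file sizes around the mean, so the real total will differ a little) and that `run 36000` is a run length in seconds; the constant names are just illustrative.

```python
# Rough size of the filebench fileserver data set configured above.
# Assumption: total ~= nfiles * meanfilesize (individual file sizes vary
# around the mean, so the real total will differ somewhat).
MEAN_FILE_SIZE = 1310720   # bytes, from "set $meanfilesize=1310720"
N_FILES = 100000           # from "set $nfiles=100000"
RUN_SECONDS = 36000        # from "run 36000", assumed to be seconds
CACHE_BYTES = 20 * 10**9   # the ~20GB cache area quoted above (approximate)


def dataset_bytes(mean_file_size: int, n_files: int) -> int:
    """Approximate total bytes in the file set."""
    return mean_file_size * n_files


total = dataset_bytes(MEAN_FILE_SIZE, N_FILES)
print(f"data set: {total / 10**9:.1f} GB ({total / 2**30:.1f} GiB)")
print(f"about {total / CACHE_BYTES:.1f}x the ~20GB cache area")
print(f"run length: {RUN_SECONDS / 3600:.0f} hours")
```

It prints about 131.1 GB (122.1 GiB), roughly 6.6 times the ~20GB cache area, and a 10-hour run.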
Now for the actual data. The number of ops measures the “overall” fileserver performance, while the ms/op measures the latency of the random append and could be used as a rough guide to the performance of random writes.
    $ grep rand *0.out | sed 's/.0.out:/ /' | sed 's/ - /-/g' | column -t
    SMR8TB.nilfs   appendfilerand1  292176ops  8ops/s   0.1mb/s  1575.7ms/op  95884us/op-cpu  [0ms-7169ms]
    SMR.btrfs      appendfilerand1  214418ops  6ops/s   0.0mb/s  1780.7ms/op  47361us/op-cpu  [0ms-20242ms]
    SMR.ext4       appendfilerand1  172668ops  5ops/s   0.0mb/s  1328.6ms/op  25836us/op-cpu  [0ms-31373ms]
    SMR.xfs        appendfilerand1  149254ops  4ops/s   0.0mb/s  669.9ms/op   19367us/op-cpu  [0ms-19994ms]
    Toshiba.btrfs  appendfilerand1  634755ops  18ops/s  0.1mb/s  652.5ms/op   62758us/op-cpu  [0ms-5219ms]
    Toshiba.ext4   appendfilerand1  466044ops  13ops/s  0.1mb/s  270.6ms/op   23689us/op-cpu  [0ms-4239ms]
    Toshiba.xfs    appendfilerand1  368670ops  10ops/s  0.1mb/s  195.6ms/op   19084us/op-cpu  [0ms-2994ms]
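The ops and ops/s columns are linked by the run length. As a sanity check, the following Python sketch parses the summary lines above (trimmed after the ms/op field) and compares each reported ops/s with the total ops divided by 36000, assuming every run lasted the full 36000 seconds from `run 36000`. The regular expression is written for exactly this output format and is not a general filebench parser.

```python
import re

# Assumption: every run lasted 36000 s, per "run 36000" in the filebench config.
RUN_SECONDS = 36000

# Summary lines from the grep/sed/column output above, trimmed after ms/op.
SUMMARY = """\
SMR8TB.nilfs   appendfilerand1  292176ops  8ops/s   0.1mb/s  1575.7ms/op
SMR.btrfs      appendfilerand1  214418ops  6ops/s   0.0mb/s  1780.7ms/op
SMR.ext4       appendfilerand1  172668ops  5ops/s   0.0mb/s  1328.6ms/op
SMR.xfs        appendfilerand1  149254ops  4ops/s   0.0mb/s  669.9ms/op
Toshiba.btrfs  appendfilerand1  634755ops  18ops/s  0.1mb/s  652.5ms/op
Toshiba.ext4   appendfilerand1  466044ops  13ops/s  0.1mb/s  270.6ms/op
Toshiba.xfs    appendfilerand1  368670ops  10ops/s  0.1mb/s  195.6ms/op
"""

# Captures: name, total ops, ops/s, mean latency (ms/op) of the append flowop.
ROW = re.compile(r"(\S+)\s+appendfilerand1\s+(\d+)ops\s+(\d+)ops/s\s+\S+\s+([\d.]+)ms/op")


def parse(text: str) -> dict:
    """Return {name: (ops, ops_per_s, ms_per_op)} for each summary line."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            name, ops, rate, ms = m.groups()
            rows[name] = (int(ops), int(rate), float(ms))
    return rows


results = parse(SUMMARY)
for name, (ops, rate, ms) in results.items():
    derived = round(ops / RUN_SECONDS)
    print(f"{name:14} reported {rate:2d} ops/s, ops/{RUN_SECONDS}s = {derived:2d}")
```

For all seven runs the rounded quotient matches the reported ops/s, which is consistent with the runs lasting the full 10 hours.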
Since the Seagate is a 5980RPM disk, one might naively expect the Toshiba to be about 20% faster. These benchmarks show it as being roughly 3 times (200%) faster, so these benchmarks are hitting the shingled performance penalty. We see that the shingled (SMR) disk still can’t match the performance of ext4 on an unshingled (PMR) disk, whichever filesystem it uses. The best overall (ops) performance on the SMR disk was with nilfs2 on an 8TB partition (so the cleaner didn’t need to run), but even then it was significantly slower than the Toshiba with ext4.
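The naive spindle-speed expectation can be compared with the measured op counts as follows. This is a sketch assuming the Toshiba is a 7200 RPM drive, which the ~20% estimate implies but the text does not state.

```python
# Compare the naive spindle-speed expectation with the observed op counts.
# Assumption: the Toshiba control disk is a 7200 RPM drive (implied by the
# "20% faster" estimate but not stated explicitly).
SEAGATE_RPM = 5980
TOSHIBA_RPM = 7200  # assumed

# Total ops over the same 36000 s run, from the ops column of the table above.
OPS = {
    "btrfs": {"SMR": 214418, "Toshiba": 634755},
    "ext4": {"SMR": 172668, "Toshiba": 466044},
    "xfs": {"SMR": 149254, "Toshiba": 368670},
}

expected = TOSHIBA_RPM / SEAGATE_RPM
print(f"expected Toshiba/SMR ratio from RPM alone: {expected:.2f}")
for fs, disks in OPS.items():
    observed = disks["Toshiba"] / disks["SMR"]
    print(f"{fs:5} observed Toshiba/SMR ratio: {observed:.2f}")
```

Under that assumption the RPM ratio is about 1.20, whereas the observed Toshiba/SMR ratios are 2.96 (btrfs), 2.70 (ext4) and 2.47 (xfs).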
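To make the benchmarks above clearer, it might help to normalise them relative to the performance of ext4 on each disk. Here ops is each filesystem’s total ops divided by ext4’s, and randappend is ext4’s ms/op divided by the filesystem’s ms/op, so higher is better in both columns: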
                       ops   randappend
    SMR.btrfs:     1.24  0.74
    SMR.ext4:      1     1
    SMR.xfs:       0.86  1.98
    Toshiba.btrfs: 1.36  0.41
    Toshiba.ext4:  1     1
    Toshiba.xfs:   0.79  1.38
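The normalised table can be reproduced from the raw numbers with a few lines of Python. This sketch assumes ops is each filesystem’s total ops divided by ext4’s, and randappend is ext4’s ms/op divided by the filesystem’s ms/op (so higher is better), which is what the figures above work out to.

```python
# Normalise each filesystem's results against ext4 on the same disk.
#   ops ratio        = fs ops / ext4 ops          (higher = more work done)
#   randappend ratio = ext4 ms/op / fs ms/op      (higher = lower latency)
RAW = {  # disk -> fs -> (total ops, appendfilerand1 ms/op), from the table above
    "SMR": {"btrfs": (214418, 1780.7), "ext4": (172668, 1328.6), "xfs": (149254, 669.9)},
    "Toshiba": {"btrfs": (634755, 652.5), "ext4": (466044, 270.6), "xfs": (368670, 195.6)},
}


def normalise(disk_results: dict, baseline: str = "ext4") -> dict:
    """Return {fs: (ops_ratio, randappend_ratio)} relative to the baseline fs."""
    base_ops, base_ms = disk_results[baseline]
    return {
        fs: (ops / base_ops, base_ms / ms)
        for fs, (ops, ms) in disk_results.items()
    }


for disk, fs_results in RAW.items():
    for fs, (ops_ratio, append_ratio) in normalise(fs_results).items():
        print(f"{disk}.{fs}: {ops_ratio:.2f} {append_ratio:.2f}")
```

Printed to two decimal places this gives 0.75 rather than 0.74 for SMR.btrfs randappend (1328.6/1780.7 is about 0.746); the other values agree with the table.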
We see that on the SMR disk btrfs keeps most of the advantage in overall ops over ext4 that it has on the Toshiba (1.24 vs 1.36), but its penalty on random appends is not as dramatic as a ratio (0.74 vs 0.41). This might lead one to move to btrfs on the SMR disk. On the other hand, if you need low-latency random appends, this benchmark suggests you want xfs, especially on SMR. We see that while SMR/PMR might influence your choice of filesystem, considering the workload you are optimising for seems more important.
I also ran an attic-based benchmark. The durations of the attic runs (on full-disk partitions of the 8TB SMR drive) were:
    ext4:  1 days 1 hours 19 minutes 54.69 seconds
    btrfs: 1 days 40 minutes 8.93 seconds
    nilfs: 22 hours 12 minutes 26.89 seconds
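To compare the runs it helps to put the attic durations in a single unit. A minimal Python sketch, assuming the “<number> <unit>” format shown above:

```python
import re

# Durations printed by attic for the same backup on each filesystem (from above).
DURATIONS = {
    "ext4": "1 days 1 hours 19 minutes 54.69 seconds",
    "btrfs": "1 days 40 minutes 8.93 seconds",
    "nilfs": "22 hours 12 minutes 26.89 seconds",
}

UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}
# Matches "<number> <unit>" pairs; the trailing "s" is optional so a
# singular unit such as "1 day" would also match.
PART = re.compile(r"([\d.]+)\s+(day|hour|minute|second)s?")


def to_seconds(text: str) -> float:
    """Convert an attic-style duration string into seconds."""
    return sum(float(n) * UNIT_SECONDS[unit] for n, unit in PART.findall(text))


baseline = to_seconds(DURATIONS["ext4"])
for fs, text in DURATIONS.items():
    secs = to_seconds(text)
    print(f"{fs:5}: {secs / 3600:5.2f} h ({secs / baseline:.2f}x the ext4 time)")
```

This gives about 25.33 h for ext4, 24.67 h for btrfs and 22.21 h for nilfs, i.e. btrfs and nilfs took roughly 97% and 88% of the ext4 time.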
In each case the attic repositories had the following stats:
                   Original size  Compressed size  Deduplicated size
    This archive:  1.00 TB        639.69 GB        515.84 GB
    All archives:  901.92 GB      639.69 GB        515.84 GB
Adding a second copy of the same 1 TB disk to attic took 4.5 hours on each of these three filesystems. A raw dump of the benchmarks and smartctl information is at:
http://pastebin.com/tYK2Uj76
https://github.com/gmatht/joshell/tree/master/benchmarks/SMR