Long live EXT4
After further evaluation, I'm pulling back on XFS in favor of EXT4. XFS is the default filesystem for RHEL7 going forward, and I see its robustness as the main attraction. It is a well-rounded filesystem that scales well to large (> 100 TB) volumes. Nonetheless, XFS has long been seen as fragile, despite the strides it has made over the last couple of years. I had a scare reminiscent of that reputation during power-cycle stress testing: the filesystem was left corrupt, and repairing it required a newer xfsprogs than the one shipped with RHEL7.2, even though the relevant change had already slipped into the mainline 4.4 Linux kernel.
The filesystem was resuscitated, but those tests cover only a small sliver of "what-ifs". I've seen just about everything in 14 years, from dirty SCSI cables causing signaling errors to failed "indestructible" solid polymer capacitors on RAID controllers (and, even before then, I was victimized by the "Deathstar"). We live in the physical world, and in the physical world we're governed by the simple fact that shit happens. Murphy's Law is alive and well. I must always put customer data integrity first. XFS still proved fragile in my controlled testing. I'd rather not find myself in a catastrophic situation, one my limited testing environment overlooked, that results in serious data loss. It's not worth it. It's not worth offlining a server for several hours while a backup despools.
I gathered some interesting data on XFS and EXT4 for those curious. XFS shows improved latency over EXT4; on mechanical drives this still has some impact. The Atlas IO subsystem is based on 15k SAS drives with a generous 1 GB RAID cache, which should provide a balance between endurance and performance for clients (Enterprise SLC is still limited). Both XFS (-m crc=1,finobt=1 -d su=64k,sw=4) and EXT4 (-I 256 -b 4096 -E stride=16,stripe-width=64) were benchmarked on top of OverlayFS, which provides a synthetic filesystem for each client. OverlayFS itself has minimal overhead irrespective of its underlying filesystem technology.
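For anyone wondering how the two sets of mkfs options line up, here is a rough sketch of the arithmetic. It assumes a 64 KiB chunk per disk and 4 data-bearing disks, which is only what the su/sw values above imply, not a statement of the actual array layout. The variable names are mine, and no device is named, so nothing gets formatted.

```shell
#!/bin/sh
# Assumed geometry, inferred from the mkfs options quoted above:
# 64 KiB chunk (stripe unit) per disk, 4 data-bearing disks, 4 KiB blocks.
chunk_kib=64
data_disks=4
block_bytes=4096

# EXT4 expresses RAID geometry in filesystem blocks:
#   stride       = chunk size / block size
#   stripe-width = stride * number of data disks
stride=$(( chunk_kib * 1024 / block_bytes ))
stripe_width=$(( stride * data_disks ))

# XFS takes the stripe unit as a size (su) and the width as a disk count (sw).
xfs_opts="-m crc=1,finobt=1 -d su=${chunk_kib}k,sw=${data_disks}"
ext4_opts="-I 256 -b ${block_bytes} -E stride=${stride},stripe-width=${stripe_width}"

# Print the option strings only; append a device yourself if you mean it.
echo "mkfs.xfs  $xfs_opts"
echo "mkfs.ext4 $ext4_opts"
```

With those assumed inputs, both filesystems end up describing the same stripe: 64 KiB per disk across 4 disks.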
At first blush XFS looks great:
bonnie++ -n128 -d.
What's not to love? An approximate 6% gain across the board for IO. Meta's not to love: metadata operations are much slower than on EXT4. Metadata covers all the great things like a file's extended attributes and where that file is located in the filesystem (inode).
Consequently, this translates poorly to CPU overhead in just about every category (NOTE: I'm aware of the missing read fields):
bonnie is purely synthetic. Let's translate that over to something more tangible via WordPress. An approximately 12-hour siege (get it? ha! ha!) was conducted against each filesystem. XFS was mostly tested during the day, whereas some EXT4 testing carried into the night, competing with cronjob tasks. Even with competition, EXT4 (orange, 744 req/sec) still outperformed XFS (blue, 716 req/sec) on average.
siege -b -q -c20 -r1000 -l/tmp/siege
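To put the two averages side by side, a quick back-of-the-envelope calculation using only the 744 and 716 req/sec figures quoted above (the variable names are just for illustration):

```shell
#!/bin/sh
# Average throughput quoted above, in requests per second.
ext4_rps=744
xfs_rps=716

# Relative gain of EXT4 over XFS, as a percentage with one decimal place.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
gain_pct=$(LC_ALL=C awk -v a="$ext4_rps" -v b="$xfs_rps" \
  'BEGIN { printf "%.1f", (a - b) / b * 100 }')

echo "EXT4 over XFS: ${gain_pct}%"
```

That works out to a bit under 4% in EXT4's favor, and that is with EXT4 sharing part of its run with the nightly cronjobs.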
And on that note, I am continuing with EXT4 until Btrfs can butter me up.