I built a 5x 16TB RAIDz2, filled it with data, and then discovered the following.

Sequentially reading a single file from the file system gave me around 40MB/s. Reading multiple files in parallel brought the total throughput into the hundreds of MB/s, which is where I’d expect it. This is really weird. All 5 disks show 100% utilization during single-file reads. Writes are supremely fast, whether single-threaded or parallel. Reading directly from each disk gives >200MB/s.
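For anyone wanting to reproduce the single-file number, here is a minimal sketch of a single-threaded sequential read timer. The function name and the 1 MiB chunk size are my own choices, not something from the original measurements, and the result is only meaningful if the file is not already cached (ZFS keeps recently read data in the ARC, so a second run over the same file can look much faster).

```python
import time


def sequential_read_mb_per_s(path, chunk_size=1024 * 1024):
    """Read `path` front to back in one thread; return (bytes_read, MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer layer; OS/ZFS caching still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, mb_per_s


# Example (hypothetical path):
# size, rate = sequential_read_mb_per_s("/tank/data/bigfile.bin")
# print(f"{size} bytes at {rate:.0f} MB/s")
```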

Splitting the RAIDz2 into two RAIDz1s, or into one RAIDz1 and a mirror, improved reads to a little over 100MB/s. Better, but still not where it should be.

I have an existing RAIDz1 made of 4x 8TB disks on the same machine. That one reads at 250-350MB/s. I made an equivalent 4x 16TB RAIDz1 from the new drives and it read at about 100MB/s. Much slower.

All of this was done with ashift=12 and default recordsize. The disks’ datasheets say their block size is 4096.

I decided to try RAIDz2 with ashift=13 even though the disks really say they’ve got 4K physical block size. Lo and behold, the single file reads went to over 150MB/s. 🤔

Following from there, I got full throughput when I increased the recordsize to 1M. This produces full throughput even with ashift=12. My existing 4x 8TB RAIDz1 pools with ashift=12 and recordsize=128K read single files fast.
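To put rough numbers on what recordsize and ashift change, here is a back-of-the-envelope sketch of how much data each disk has to deliver per record. It assumes a simplified model: a full, uncompressed record is cut into sectors of 2^ashift bytes and spread across the data disks (total disks minus parity). The function name is hypothetical, and this does not by itself explain the throughput difference; it only shows how small the per-disk pieces are at 128K.

```python
def per_disk_chunk_kib(recordsize, ashift, ndisks, nparity):
    """Largest data chunk (KiB) a single disk holds for one full record."""
    sector = 1 << ashift                       # bytes per allocation unit
    data_sectors = -(-recordsize // sector)    # ceiling division
    data_disks = ndisks - nparity
    widest = -(-data_sectors // data_disks)    # sectors on the fullest disk
    return widest * sector // 1024


KIB = 1024
# The three 5-disk RAIDz2 combinations tried in the post.
for label, recordsize, ashift in [
    ("recordsize=128K ashift=12", 128 * KIB, 12),
    ("recordsize=128K ashift=13", 128 * KIB, 13),
    ("recordsize=1M   ashift=12", 1024 * KIB, 12),
]:
    print(label, per_disk_chunk_kib(recordsize, ashift, 5, 2), "KiB per disk")
```

Under these assumptions that works out to 44 KiB, 48 KiB and 344 KiB per disk respectively, so recordsize=1M makes each per-disk read roughly eight times larger than the default.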

Here’s a diff of the queue dump of the old and new drives. The left side (<) is a WD 8TB from the existing RAIDz1, the right side (>) is one of the new HC550 16TB drives.

< max_hw_sectors_kb: 1024
---
> max_hw_sectors_kb: 512
20c20
< max_sectors_kb: 1024
---
> max_sectors_kb: 512
25c25
< nr_requests: 2
---
> nr_requests: 60
36c36
< write_cache: write through
---
> write_cache: write back
38c38
< write_zeroes_max_bytes: 0
---
> write_zeroes_max_bytes: 33550336

Could the max_*_sectors_kb being half on the new drives have something to do with it?
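A comparison like the one above can be sketched in a few lines. This assumes the "queue dump" comes from the Linux block layer's per-device settings under /sys/block/<dev>/queue; the function names and the device names sda and sdb are placeholders, not taken from the post.

```python
from pathlib import Path


def read_queue_settings(queue_dir):
    """Return {attribute: value} for every readable file in a queue directory."""
    settings = {}
    for entry in sorted(Path(queue_dir).iterdir()):
        if not entry.is_file():
            continue  # skip subdirectories
        try:
            settings[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # some attributes cannot be read
    return settings


def diff_settings(left, right):
    """Return {attribute: (left_value, right_value)} where the two differ."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys
            if left.get(k) != right.get(k)}


# Example (placeholder device names):
# old = read_queue_settings("/sys/block/sda/queue")
# new = read_queue_settings("/sys/block/sdb/queue")
# for name, (a, b) in diff_settings(old, new).items():
#     print(f"{name}: {a} -> {b}")
```

Note that max_sectors_kb is the tunable one of the pair; it can be lowered or raised, but not above what max_hw_sectors_kb reports for the device.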


Can anyone make any sense of any of this?

lightrush (creator):

OK, I think it may have to do with the odd number of data drives. If I create a raidz2 with 4 of the 5 disks, even with ashift=12 and recordsize=128K, sequential single-threaded read performance is stellar. What’s not clear is why this doesn’t affect the 4x 8TB-drive raidz1, or at least not as much.
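The uneven split is easy to see with a little arithmetic. This is a simplified sketch, assuming a full uncompressed record is divided into 2^ashift-byte sectors and dealt out across the data disks, with the remainder going to the first disks; it ignores parity placement and padding, and the function name is made up for illustration.

```python
def split_record(recordsize, ashift, ndisks, nparity):
    """Return the number of data sectors each data disk gets for one record."""
    sectors = -(-recordsize // (1 << ashift))   # ceiling division
    data_disks = ndisks - nparity
    base, extra = divmod(sectors, data_disks)
    # The first `extra` disks carry one sector more than the rest.
    return [base + 1 if i < extra else base for i in range(data_disks)]


RECORD = 128 * 1024
print("5-disk raidz2:", split_record(RECORD, 12, 5, 2))  # 3 data disks
print("4-disk raidz2:", split_record(RECORD, 12, 4, 2))  # 2 data disks
print("4-disk raidz1:", split_record(RECORD, 12, 4, 1))  # 3 data disks
```

In this model the 5-disk raidz2 splits 32 sectors as 11/11/10, while the 4-disk raidz2 gets a clean 16/16. The 4-disk raidz1 also has three data disks and gets the same 11/11/10 split, which matches the open question above: the uneven split alone does not explain why the old 8TB pool reads fast.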

Would you use zfs and raid-z when there is only 1 file on your disk?

Would you build 4 ticket counters when your concert hall has only 1 seat? Would you build a 4 lane highway when there is only 1 car in your country?

:-)

lightrush (creator):

Yes, yes I would use ZFS if I had only one file on my disk.

Ok :-)

Then you probably shouldn’t optimize it for the use of many files (which is the default, of course).
