A year ago I set up Ubuntu Server with 3 ZFS pools on my server. I don’t normally copy very large files, but today I was copying a ~30 GB directory and rsync showed the transfer never exceeding 3 MB/s (cp is also very slow).

What is the best file system that “just works”? I’m thinking of migrating everything to ext4.

EDIT: I really like the automatic pool recovery feature in ZFS; it has saved me from 1 hard drive failure so far.

Kata1yst

Yeah, you should be scrubbing weekly or monthly, depending on how often you are using the data. A scrub reads every allocated block in the pool, verifies it against its checksum, and repairs any errors it finds wherever a good redundant copy exists. Basically preventative maintenance.
https://manpages.ubuntu.com/manpages/jammy/man8/zpool-scrub.8.html

Set that up in a cron job and check zpool status periodically.
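A minimal Python sketch of that routine, assuming the `zpool` CLI is on the PATH and the script runs with enough privileges. The function names and the pool name `tank` used below are placeholders, not anything from this thread. The health check relies on `zpool status -x` printing `all pools are healthy` when nothing needs attention.

```python
import subprocess


def pools_healthy(status_x_output: str) -> bool:
    """Interpret the text printed by `zpool status -x`."""
    return status_x_output.strip() == "all pools are healthy"


def start_scrub(pool: str) -> None:
    """Start a scrub on the given pool (raises if zpool reports an error)."""
    subprocess.run(["zpool", "scrub", pool], check=True)


def check_pools() -> bool:
    """Run `zpool status -x` and report whether every pool is healthy."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=True
    )
    return pools_healthy(result.stdout)
```

One way to use it would be to call `start_scrub("tank")` from a weekly or monthly cron entry and `check_pools()` from a more frequent one that alerts you when it returns `False`.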

No dedup is good. LZ4 compression is good. RAM to disk ratio is generous.

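Check your disk’s sector size and vdev ashift. Modern multi-TB HDDs generally have a 4K (4096-byte) physical sector size, so you want ashift=12 (2^12 = 4096). If ashift is set lower than the physical sector size, the drive has to do a read-modify-write for small writes, which can cause massive write amplification and hurt throughput.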
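The ashift arithmetic is just a base-2 logarithm of the sector size. Here is a small sketch (the function names are made up for illustration) that you could feed with the physical sector size your drive reports, for example from `lsblk -o NAME,PHY-SEC`:

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return log2 of a power-of-two sector size, i.e. the matching ashift."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def ashift_too_small(physical_sector_bytes: int, ashift: int) -> bool:
    """True when the vdev's block size (2**ashift) is below the physical sector size."""
    return (1 << ashift) < physical_sector_bytes


print(ashift_for_sector_size(4096))  # 12
print(ashift_for_sector_size(512))   # 9
print(ashift_too_small(4096, 9))     # True: a 4K drive in an ashift=9 vdev
```

Note that ashift is fixed when a vdev is created, so a wrong value generally means recreating the vdev rather than changing a setting.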
https://www.high-availability.com/docs/ZFS-Tuning-Guide/

How about snapshots? Do you have a bunch of old ones? I highly recommend setting up a snapshot manager to prune snapshots down to a working set (for example, keep 1-2 monthly, 4 weekly, and 6 daily): https://github.com/jimsalterjrs/sanoid
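To make the retention idea concrete, here is a rough Python sketch of a "keep N per period" rule. It is not sanoid's actual algorithm, only an illustration: it treats each snapshot as a bare timestamp and keeps the newest snapshot from each of the most recent days, ISO weeks, and months. The function name and defaults are assumptions.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, daily=6, weekly=4, monthly=2):
    """Return the set of snapshot timestamps a simple retention rule would keep."""
    rules = [
        (daily, lambda t: t.date()),                       # one per calendar day
        (weekly, lambda t: tuple(t.isocalendar())[:2]),    # one per ISO (year, week)
        (monthly, lambda t: (t.year, t.month)),            # one per calendar month
    ]
    newest_first = sorted(snapshots, reverse=True)
    keep = set()
    for limit, bucket_of in rules:
        seen = []
        for snap in newest_first:
            bucket = bucket_of(snap)
            if bucket in seen:
                continue  # a newer snapshot already represents this period
            if len(seen) == limit:
                break     # enough periods covered for this rule
            seen.append(bucket)
            keep.add(snap)
    return keep


# Example: one snapshot per day at noon, 2024-01-01 through 2024-02-29.
snaps = [datetime(2024, 1, 1, 12) + timedelta(days=i) for i in range(60)]
kept = snapshots_to_keep(snaps)
to_prune = sorted(set(snaps) - kept)
print(len(kept), len(to_prune))  # 9 51
```

In the example, the six newest dailies cover Feb 24-29, the weekly rule adds Feb 18 and Feb 11, and the monthly rule adds Jan 31. Everything else would be a candidate for `zfs destroy`.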

And to parrot another insightful comment, I also recommend checking the disk health with SMART tests. With ZFS, as a drive begins to fail, the pool gets much slower because it is constantly repairing errors.

@Trincapinones@lemmy.world
creator

Wow that’s a lot of info, thank you!

@BobsAccountant@lemmy.world

Adding on to this:

These are all great points, but I wanted to share something that I wish I’d known before I spun up my array: the configuration of your array matters a lot. I had originally chosen RAIDZ1, as it’s the most efficient with capacity while still offering a little fault tolerance. This was a mistake, but in my defense, the hard data on this really wasn’t widely available until long after I had moved my large (for me) dataset to the array. I really wish I had gone with a Striped Mirror configuration. The benefits are pretty overwhelming:

  • Performance is better than even RAIDZ2, especially as individual disk size increases.
  • Fault tolerance is better, as you could have up to 50% of the disks fail, so long as one disk in each mirrored set remains functional.
  • Fault recovery is better. With traditional arrays with distributed chunks, you have to resilver (rebuild) using the entire array, requiring more time, costing performance and shortening the life of the unaffected drives. With a mirror, the replacement disk is rebuilt from its surviving partner only.
  • You can stripe mismatched sets of mirrored drives, so long as the disks within each mirrored set are identical, without having the array default to the size of the smallest member. This allows you to grow your array more organically, rather than having to replace every drive, one at a time, resilvering after each change.

Yes, you pay for these gains with less usable space, but platter drives are getting cheaper and cheaper, so the trade seems more worth it than ever. Oh, and I realize that it wasn’t obvious, but I am still using ZFS to manage the array, just not in a RAIDZn configuration.
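For a rough sense of the trade, here is a back-of-the-envelope Python sketch. The disk counts and sizes are made-up examples, the helper names are hypothetical, and the math ignores ZFS metadata and padding overhead, so treat the numbers as approximate.

```python
def raidz_usable_tb(disk_sizes_tb, parity):
    """Approximate usable space of one RAIDZ vdev; every disk counts as the smallest."""
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


def striped_mirror_usable_tb(mirrors):
    """Approximate usable space of striped mirrors; each mirror adds its smallest disk."""
    return sum(min(mirror) for mirror in mirrors)


def mirror_failures_tolerated(mirrors):
    """(guaranteed, best case) disk failures the striped mirrors can survive."""
    guaranteed = min(len(mirror) - 1 for mirror in mirrors)
    best_case = sum(len(mirror) - 1 for mirror in mirrors)
    return guaranteed, best_case


six_disks = [4] * 6                       # six hypothetical 4 TB drives
pairs = [(4, 4), (4, 4), (4, 4)]          # the same drives as three 2-way mirrors

raidz1 = raidz_usable_tb(six_disks, parity=1)    # 20 TB, survives any 1 failure
raidz2 = raidz_usable_tb(six_disks, parity=2)    # 16 TB, survives any 2 failures
mirrored = striped_mirror_usable_tb(pairs)       # 12 TB
failures = mirror_failures_tolerated(pairs)      # (1, 3)

# Mismatched pairs: each mirror contributes its own size to the stripe.
mixed = striped_mirror_usable_tb([(4, 4), (8, 8)])  # 12 TB

print(raidz1, raidz2, mirrored, failures, mixed)
```

The `(1, 3)` result is worth noticing: half the disks can fail only if the failures land in different mirrors. Two failures in the same pair still lose the pool.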

@Trincapinones@lemmy.world
creator

Thanks for all the help!

I don’t have any redundancy. My system has an SSD (the one being slow) and two 500 GB HDDs. The HDDs only hold movies and shows, so I don’t care if that data goes bad.

I have a lot of important personal stuff on the SSD, but it’s new (6 months old) and from Crucial, so I’m trusting it because I don’t have the money to spare for another drive (plus electricity bills). I’m also counting on losing only 1-2 files if it goes bad, thanks to the ZFS protection.
