• 7 Posts
  • 96 Comments
Joined 1Y ago
Cake day: Jun 04, 2023


Iiinteresting. I’m on the larger AB350-Gaming 3 and it’s got REV: 1.0 printed on it. No problems with the 5950X so far. 🤐 Either sheer luck or there could have been updated units before they officially changed the rev marking.


On paper it should support it. I’m assuming it’s the ASRock AB350M. With a certain BIOS version of course. What’s wrong with it?


B350 isn’t a very fast chipset to begin with

For sure.

I’m willing to bet the CPU in such a motherboard isn’t exactly current-gen either.

Reasonable bet, but it’s a Ryzen 9 5950X with 64GB of RAM. I’m pretty proud of how far I’ve managed to stretch this board. 😆 At this point I’m waiting for blown caps, but the case temp is pretty low so it may end up trucking along for a surprisingly long time.

Are you sure you’re even running at PCIe 3.0 speeds too?

So given the CPU, it should be PCIe 3.0, but that doesn’t remove any of the queues/scheduling suspicions for the chipset.

I’m now replicating data out of this pool and the read load looks perfectly balanced. Bandwidth’s fine too. I think I have no choice but to benchmark the disks individually outside of ZFS once I’m done with this operation in order to figure out whether any show problems. If not, they’ll go in the spares bin.
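Rather than eyeballing the table, the imbalance can be quantified straight from the `zpool iostat -v` output. A minimal sketch, assuming device lines begin with `wwn-` and that the fifth column is write operations, as in the dumps in this thread:

```shell
# Print the max/min ratio of per-disk write ops from `zpool iostat -v` output.
# Assumption: device lines start with "wwn-" and column 5 is write operations.
imbalance_ratio() {
  awk '
    $1 ~ /^wwn-/ { n++; v[n] = $5 + 0 }         # collect per-device write ops
    END {
      if (n < 2) exit 1
      min = v[1]; max = v[1]
      for (i = 2; i <= n; i++) {
        if (v[i] < min) min = v[i]
        if (v[i] > max) max = v[i]
      }
      if (min == 0) exit 1                      # avoid dividing by zero
      printf "write-ops max/min ratio: %.1f\n", max / min
    }'
}

# With the numbers from the earlier dump:
printf '    wwn-0x5000c500e8736faf  -  -  0  212  0  164M\n    wwn-0x5000c500e8737337  -  -  0  654  0  165M\n' |
  imbalance_ratio   # → write-ops max/min ratio: 3.1
```

Anything much above 1.0 sustained over several intervals would point at one disk lagging.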


You might be right about the link problem.

Looking at the B350 diagram, the whole chipset hangs off a PCIe 3.0 x4 link to the CPU. The other pool (the source) is hooked to a USB controller on the chipset. The SATA controller is also on the chipset, so it shares the same chipset-CPU link. I’m pretty sure I’m also using all the PCIe links the chipset provides for SSDs. So that’s 4GB/s total for the whole chipset. I’m probably not saturating the whole link in this particular workload, but perhaps there’s another related bottleneck.
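As a back-of-envelope check (the ~3940MB/s usable figure for a PCIe 3.0 x4 link and the per-pool rates are rough numbers for illustration, not measurements of this exact system):

```shell
# Rough chipset uplink budget. All numbers are approximations.
uplink=3940      # MB/s, roughly usable on PCIe 3.0 x4 after encoding overhead
src_read=293     # MB/s read from the source pool via the chipset USB controller
dst_write=293    # MB/s written to the mirror via the chipset SATA controller
used=$((src_read + dst_write))
echo "chipset uplink in use: ${used} of ${uplink} MB/s"
```

So raw bandwidth alone shouldn’t be the ceiling here; if anything, it would be queuing/scheduling inside the chipset.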


Turns out the on-CPU SATA controller isn’t available when the NVMe slot is used. 🫢 Swapped SATA ports, no diff. Put the low IOPS disk in a good USB 3 enclosure, hooked to an on-CPU USB controller. Now things are flipped:

                                        capacity     operations     bandwidth 
pool                                  alloc   free   read  write   read  write
------------------------------------  -----  -----  -----  -----  -----  -----
storage-volume-backup                 12.6T  3.74T      0    563      0   293M
  mirror-0                            12.6T  3.74T      0    563      0   293M
    wwn-0x5000c500e8736faf                -      -      0    406      0   146M
    wwn-0x5000c500e8737337                -      -      0    156      0   146M

Interesting. SMART looks pristine on both drives. Brand new drives - Exos X22. That doesn’t mean there isn’t an impending problem of course. I might try shuffling the links, per the other comment’s suggestion, to see if that changes the behaviour. Both are currently hooked to the AMD B350 chipset SATA controller. There are two ports that should be hooked to the on-CPU SATA controller. I imagine the two SATA controllers don’t share bandwidth. I’ll try putting one disk on the on-CPU controller.
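One way to confirm which controller a disk actually hangs off is to walk the disk’s sysfs device path and take the last PCI function in it. A sketch; `sda` and the example path are placeholders:

```shell
# Extract the last PCI address (domain:bus:dev.fn) from a sysfs device path.
pci_of_path() {
  printf '%s\n' "$1" |
    grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' |
    tail -n 1
}

# The PCI function closest to the block device is its host controller.
disk_controller() {
  pci_of_path "$(readlink -f "/sys/block/$1/device")"
}

# Usage: lspci -s "$(disk_controller sda)"   # names the SATA/USB controller
```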



I'm syncoiding from my normal RAIDz2 to a backup mirror made of 2 disks. I looked at `zpool iostat` and I noticed that one of the disks consistently shows less than half the write IOPS of the other:

```
                                        capacity     operations     bandwidth
pool                                  alloc   free   read  write   read  write
------------------------------------  -----  -----  -----  -----  -----  -----
storage-volume-backup                 5.03T  11.3T      0    867      0   330M
  mirror-0                            5.03T  11.3T      0    867      0   330M
    wwn-0x5000c500e8736faf                -      -      0    212      0   164M
    wwn-0x5000c500e8737337                -      -      0    654      0   165M
```

This is also evident in `iostat`:

```
 f/s  f_await  aqu-sz  %util  Device
0.00     0.00    3.48  46.2%  sda
0.00     0.00    8.10  99.7%  sdb
```

The difference is also evident in the temperatures of the disks. The busier disk is 4 degrees warmer than the other. The disks are identical on paper and bought at the same time. Is this behaviour expected?

  • Lenovo ThinkCentre / Dell OptiPlex USFF machine like the M710q.
  • Secondary NVMe or SATA SSD for a RAID1 mirror
    • Use LVMRAID for this. It uses mdraid underneath but it’s easier to manage
  • External USB disks for storage
    • WD Elements generally work well when well ventilated
    • OWC Mercury Elite Pro Quad has a very well implemented USB path and has been problem-free in my testing
  • Debian / Ubuntu LTS
  • ZFS for the disk storage
  • Backups may require a second copy or similar of this setup so keep that in mind when thinking about the storage space and cost
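For the LVMRAID bullet, a minimal sketch of the commands involved, assuming the two SSDs show up as /dev/nvme0n1 and /dev/sda (placeholder names; double-check yours, this wipes both devices):

```shell
# Hypothetical LVMRAID mirror setup. Device names are placeholders; destructive.
pvcreate /dev/nvme0n1 /dev/sda
vgcreate vg0 /dev/nvme0n1 /dev/sda
lvcreate --type raid1 -m 1 -L 100G -n data vg0   # RAID1 LV, mdraid underneath
lvs -a -o name,sync_percent,devices vg0          # watch the initial sync
```

You get mdraid's resilience but manage it with the usual LVM tooling (`lvextend`, snapshots, etc.).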

Here’s a visual inspiration:


Yes, yes I would use ZFS if I had only one file on my disk.


OK, I think it may have to do with the odd number of data drives. If I create a raidz2 with 4 of the 5 disks, even with ashift=12 and recordsize=128K, the sequential single-threaded read performance is stellar. What’s not clear is why this doesn’t affect the 4x 8TB-drive raidz1, or at least not as much.
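The arithmetic behind that suspicion, as I understand it (a sketch of the reasoning, not something verified against the ZFS allocator):

```shell
# A 128K record in a 5-wide raidz2 is striped over 3 data disks. With 4K
# sectors the split is uneven, so each column gets padded up to sector size.
recordsize=$((128 * 1024))
data_disks=3
per_disk=$((recordsize / data_disks))
echo "per-disk chunk: ${per_disk} bytes (remainder $((recordsize % data_disks)))"
echo "4K sectors per column: $(( (per_disk + 4095) / 4096 ))"
# With 4 data disks the split is clean: 128K / 4 = 32768 bytes = 8 whole sectors.
```

With 3 data disks every column is a non-multiple of the sector size; with 4 it divides evenly, which would fit the observation that the 4-disk layouts behave better.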


Slow sequential single file reads on ZFS
I built a 5x 16TB RAIDz2, filled it with data, then I discovered the following. Sequentially reading a single file from the file system gave me around 40MB/s. Reading multiple in parallel brought the total throughput into the hundreds of megabytes - where I'd expect it. This is really weird. The 5 disks show 100% utilization during single file reads. Writes are supremely fast, whether single threaded or parallel. Reading directly from each disk gives >200MB/s.

Splitting the RAIDz2 into two RAIDz1s, or into one RAIDz1 and a mirror, improved reads to 100-something MB/s. Better, but still not where it should be.

I have an existing RAIDz1 made of 4x 8TB disks on the same machine. That one reads at 250-350MB/s. I made an equivalent 4x 16TB RAIDz1 from the new drives and that read at about 100MB/s. Much slower.

All of this was done with `ashift=12` and default `recordsize`. The disks' datasheets say their block size is 4096.

I decided to try RAIDz2 with `ashift=13` even though the disks really say they've got 4K physical block size. Lo and behold, the single file reads went to over 150MB/s. 🤔

Following from there, I got full throughput when I increased the `recordsize` to 1M. This produces full throughput even with `ashift=12`. My existing 4x 8TB RAIDz1 pools with `ashift=12` and `recordsize=128K` read single files *fast.*

Here's a diff of the queue dump of the old and new drives. The left side is a WD 8TB from the existing RAIDz1, the right side is one of the new HC550 16TB:

```
< max_hw_sectors_kb: 1024
---
> max_hw_sectors_kb: 512
20c20
< max_sectors_kb: 1024
---
> max_sectors_kb: 512
25c25
< nr_requests: 2
---
> nr_requests: 60
36c36
< write_cache: write through
---
> write_cache: write back
38c38
< write_zeroes_max_bytes: 0
---
> write_zeroes_max_bytes: 33550336
```

Could the `max_*_sectors_kb` being half on the new drives have something to do with it?

---

Can anyone make any sense of any of this?


Here’s the box test thread if you’re curious. 😊


I think I’ve seen this hypothesis too and it makes sense to me.

If I’m building a new AMD system today, I’d look for a board that exposes more of the chipset-provided USB ports. Otherwise I’d budget for a high quality 4-port PCIe USB controller, if I’m planning to rely a lot on USB on that system.


This article provides some context. Now I do have the latest firmware which should have these fixes but they don’t seem to be foolproof. I’ve seen reports around the web that the firmware improves things but doesn’t completely eliminate them.

If you’ve seen devices disconnecting and reconnecting on occasion, it could be it.


I’ve been on the USB train since 2019.

You’re exactly right, you gotta get devices with good USB-to-SATA chipsets, and you gotta keep them cool.

I’ve been using a mix of WD Elements, WD MyBook and StarTech/Vantec enclosures (ASM1351). I’ve had to cool all the chipsets on the WDs because they bolt the PCB straight to the drive, so the bridge IC heats up from it.

From all my testing I’ve discovered that:

  • ASM1351 and ASM235CM are generally problem-free, but the former needs passive cooling if close to a disk. A small heatsink adhered with standard double-sided heat conductive tape is good enough.
  • Host controllers matter too. Intel is generally problem-free. So is VIA. AMD has some issues on the CPU side on some models which are still not fully solved.

I like this box in particular because it uses a very straightforward design. It’s got 4x ASM235CM with cooling connected to a VIA hub. It’s got a built-in power supply, fan, it even comes with good cables. It fixes a lot of the system variables to known good values. You’re left with connecting it to a good USB host controller.

WD PCB on disk


I thought about it, but it typically requires extra PCIe cards that I can’t rely on as there’s no space in one of the machines and no PCIe slots in the other. That’s why I did a careful search till I stumbled upon this particular enclosure and then I tested one with ZFS for over a week before buying the rest.


You want ASMedia ASM1351 (heatsinked) or ASM235CM on the device side 🥹

This box has 4x ASM235CM and from the testing I’ve conducted over the last week it seems rock solid, so long as it’s not connected to the Ryzen’s built-in USB controller. It’s been flawless on the B350 chipset’s USB controller.


Thanks for the warning ⚠️🙏

This isn’t my first rodeo with ZFS on USB. I’ve been running USB for a few years now. Recently I ran this particular box through a battery of tests and I’m reasonably confident that with my particular set of hardware it’ll be fine. It passed everything I threw at it, once connected to a good port on my machine. But you’re generally right and as you can see I discussed that in the testing thread, and I encountered some issues that I managed to solve. If you think I’ve missed something specific - let me know! 😊


That was the cheapest option. 🤭


Two machines. A main server/workstation and a small off-site backup machine that runs the same services but has less compute and RAM.


  • 8x 8TB in a set of 2, some shucked WDs, some IronWolfs
  • 5x 16TB in a set of 2, “recertified” WDs from serverpartdeals.com



Quite possibly. That said the one I linked is CUI, not a noname. It’s even got an MTBF of 300K hours in its datasheet. There are cheaper ones. 😅 And more expensive ones.


I see a number of dual-output PSUs on Mouser that will probably fit well if this goes. For example.



Done. Says 150W on it. Not sure if it’s real. If it is, then it’s plenty overrated for the hardware which should bode well for its longevity. Especially given that the caps are Chengx across the board so definitely not the best. :D Can you tell anything interesting about it from the pics?


Finally happy with the testing. I’ll disassemble it sometime today.


440 pounds is insane, agreed. 😂

Yeah I get it then. So it depends on whether one has PCIe slots available, 3.5" bays in the case, whether they can change the case if it’s full, etc. It could totally make sense under certain conditions. In my case there’s no space in my PC case and I don’t have any PCIe slots left. In addition, I have an off-site machine that’s a USFF PC with no PCIe slots or SATA ports. Its only available connectivity is USB. So in my case USB is what I can work with. As long as it isn’t exorbitantly expensive, a USB solution has flexibility in this regard. I would have never paid 440 pounds for this if that was the price. I’d have stayed with single enclosures nailed to a wooden board and added a USB hub. 🥹 Which is how they used to be:


Please elaborate.

What I parse that you’re talking about is a PCIe SATA host controller and a box for the disks. Prior to landing on the OWC, I looked at QNAP’s 4-bay solution that does this - the TL-D400S. That can be found around the $300 mark. The OWC is $220 from the source. That’s roughly equivalent to 4 of StarTech’s enclosures that use the same chipset.


I wasn’t able to reach it for a top-down visual through the back and so I don’t have pics of it other than the side view already attached. I’ll try disassembling it further sometime today, once the ZFS scrub completes.

Luckily, the PSU is connected to the main board via standard Molex. If the built-in one blows up, you could replace it with any ATX PSU, large or small (FlexATX, etc.), or one of those power bricks that spit out a 5V/12V Molex. Whether you can stuff it inside the case is another question.


I was wondering what would be better for discoverability, to write this in a blog post, on GitHub, then link it here, or to just write it here. Turns out Google’s crawling Lemmy quite actively. This shows up within the first 10-15 results for “USB DAS ZFS”:

It appears that Lemmy is already a good place for writing stuff like this. ☺️


5950X, 64GB

It’s a multipurpose machine, desktop workstation, games, running various servers.


## Why

I'm running a ZFS pool of 4 external USB drives. It's a mix of WD Elements and enclosed IronWolfs. I'm looking to consolidate it into a single box since I'm likely to add another 4 drives to it in the near future and dealing with 8 external drives could become a bit problematic in a few ways.

## ZFS with USB drives

There have been recurrent questions about ZFS with USB. Does it work? How does it work? Is it recommended, and so on. The answer is complicated, but it revolves around: yes, it works and it can work well **so long as you ensure that anything on your USB path is good.** And that's difficult since it's not generally known what USB-SATA bridge chipset an external USB drive has, whether it's got firmware bugs, whether it requires quirks, whether it's stable under sustained load, etc. Then that difficulty is multiplied by the number of drives the system has. In my setup for example, I've swapped multiple enclosure models till I stumbled on a rock-solid one. I've also had to install heatsinks on the ASM1351 USB-SATA bridge ICs in the WD Elements drives to stop them from overheating and dropping dead under heavy load.

With this in mind, if a multi-bay unit like the OWC Mercury Elite Pro Quad proves to be as reliable as some anecdotes say, it could become a go-to recommendation for USB DAS that eliminates a lot of those variables, leaving just the host side since it comes with a cable too. And the host side tends to be reliable since it's typically either Intel or AMD. Read the Testing section below for some tidbits about AMD.

## Initial observations of the OWC Mercury Elite Pro Quad

- Built like a tank, heavy enclosure, feet screwed-in not glued
- Well designed for airflow. Air enters the front, goes through the disks, PSU, main PCB and exits from the back. Some IronWolfs that averaged 55°C in individual enclosures clock at 43°C in here
- It's got a good quality DC fan (check pics). So far it's pretty quiet
- Uses 4x ASM235CM USB-SATA bridge ICs which are found in other well-regarded USB enclosures. It's newer than the ASM1351 which is also reliable when not overheating
- The USB-SATA bridges are wired to a USB 3.1 Gen 2 hub - VLI-822. No SATA port multipliers
- The USB hub is heatsinked
- The ASM235CM ICs have a weird thick thermal pad attached to them but without any metal attached to it. It appears they're serving as heatsinks themselves which might be enough for the ICs to stay within working temps
- The main PCB is an all-solid-cap affair
- The PSU shows electrolytic caps which is unsurprising
- The main PCB is connected to the PSU via standard Molex connectors like the ones found in ATX PSUs. Therefore if the built-in PSU dies, it could be replaced with an ATX PSU
- It appears to rename the drives to its own "Elite Pro Quad A/B/C/D" naming, however `hdparm -I /dev/sda` seems to return the original drive information. The disks appear with their internal designations in GNOME Disks. The kernel maps them in `/dev/disk/by-id/*` according to those as before. I moved my drives in it, rebooted and ZFS started the pool as if nothing happened
- SMART info is visible in GNOME Disks as well as via `smartctl -x /dev/sda`
- It comes with both a USB-C to USB-C cable and a USB-C to USB-A one
- Made in Taiwan

## Testing

- No errors in the system logs so far
- I'm able to pull 350-370MB/s sequential from my 4-disk RAIDz1
- Loading the 4 disks together with `hdparm` results in about 400MB/s total bandwidth
- It's hooked up via USB 3.1 Gen 1 on a B350 motherboard. I don't see a significant difference in the observed speeds whether it's on the chipset-provided USB host or the CPU-provided one
- Completed a manual scrub of a 24TB RAIDz1 while also being loaded with an Immich backup, Plex usage, Syncthing rescans and some other services. No errors in the system log. Drives stayed under 44°C. Stability looks promising
- Will pull a drive and add a new one to resilver once the latest changes get to the off-site backup
- Pulled a drive from the pool and replaced it with a spare while the pool was live. SATA hot plugging seems to work. Resilvered 5.25TB in about 32 hours while the pool was in use. Found the following vomit in the logs repeating every few minutes:

```
Apr 01 00:31:08 host kernel: scsi host11: uas_eh_device_reset_handler start
Apr 01 00:31:08 host kernel: usb 6-3.4: reset SuperSpeed USB device number 12 using xhci_hcd
Apr 01 00:31:08 host kernel: scsi host11: uas_eh_device_reset_handler success
Apr 01 00:32:42 host kernel: scsi host11: uas_eh_device_reset_handler start
Apr 01 00:32:42 host kernel: usb 6-3.4: reset SuperSpeed USB device number 12 using xhci_hcd
Apr 01 00:32:42 host kernel: scsi host11: uas_eh_device_reset_handler success
Apr 01 00:33:54 host kernel: scsi host11: uas_eh_device_reset_handler start
Apr 01 00:33:54 host kernel: usb 6-3.4: reset SuperSpeed USB device number 12 using xhci_hcd
Apr 01 00:33:54 host kernel: scsi host11: uas_eh_device_reset_handler success
Apr 01 00:35:07 host kernel: scsi host11: uas_eh_device_reset_handler start
Apr 01 00:35:07 host kernel: usb 6-3.4: reset SuperSpeed USB device number 12 using xhci_hcd
Apr 01 00:35:07 host kernel: scsi host11: uas_eh_device_reset_handler success
Apr 01 00:36:38 host kernel: scsi host11: uas_eh_device_reset_handler start
Apr 01 00:36:38 host kernel: usb 6-3.4: reset SuperSpeed USB device number 12 using xhci_hcd
Apr 01 00:36:38 host kernel: scsi host11: uas_eh_device_reset_handler success
```

  It appears to be related only to the drive being resilvered. I did not observe resilver errors
- Resilvering `iostat` shows numbers in line with the ~500MB/s of the USB 3.1 Gen 1 port it's connected to:

```
    tps  kB_read/s  kB_wrtn/s  kB_dscd/s  kB_read  kB_wrtn  kB_dscd  Device
 314.60     119.9M      95.2k       0.0k   599.4M   476.0k     0.0k  sda
 264.00     119.2M      92.0k       0.0k   595.9M   460.0k     0.0k  sdb
 411.00     119.9M      96.0k       0.0k   599.7M   480.0k     0.0k  sdc
 459.40       0.0k     120.0M       0.0k     0.0k   600.0M     0.0k  sdd
```

- Running a second resilver on a chipset-provided USB 3.1 port while looking for USB resets like previously seen in the logs. The hypothesis is that there's instability with the CPU-provided USB 3.1 ports as there have been documented problems with those
  * I had the new drive disconnect upon KVM switch, where the KVM is connected to the same chipset-provided USB controller. Moved the KVM to the CPU-provided controller. This is getting fun
  * Got the same resets as the drive began the sequential write phase:

```
Apr 02 16:13:47 host kernel: scsi host11: uas_eh_device_reset_handler start
Apr 02 16:13:47 host kernel: usb 6-2.4: reset SuperSpeed USB device number 9 using xhci_hcd
Apr 02 16:13:47 host kernel: scsi host11: uas_eh_device_reset_handler success
```

  * 🤦 It appears that I read the manual wrong. All the 3.1 Gen 1 ports on the back IO are CPU-provided. Moving to a chipset-provided port for real and retesting... The resilver entered its sequential write phase and there have been no resets so far. The peak speeds are a tad higher too:

```
    tps  kB_read/s  kB_wrtn/s  kB_dscd/s  kB_read  kB_wrtn  kB_dscd  Device
 281.80     130.7M      63.2k       0.0k   653.6M   316.0k     0.0k  sda
 273.00     130.1M      56.8k       0.0k   650.7M   284.0k     0.0k  sdb
 353.60     130.8M      63.2k       0.0k   654.0M   316.0k     0.0k  sdc
 546.00       0.0k     133.2M       0.0k     0.0k   665.8M     0.0k  sdd
```

  * Resilver finished. No resets or errors in the system logs
  * Did a second resilver. Finished without errors again
  * A resilver while connected to the chipset-provided USB port takes around 18 hours for the same disk that took over 30 hours via the CPU-provided port

## Verdict so far

~~The OWC passed all of the testing so far with flying colors.~~ Even though resilver finished successfully, there were silent USB resets in the logs with the OWC connected to CPU-provided ports. Multiple ports exhibited the same behavior. When connected to a B350 chipset-provided port on the other hand, the OWC finished two resilvers with no resets, and faster: 18 hours vs 32 hours. My hypothesis is that these silent resets are likely related to the known USB problems with Ryzen CPUs.

The OWC itself passed testing with flying colors when connected to a chipset port. I'm buying another one for the new disks.

## Pics

### General

![](https://lemmy.ca/pictrs/image/db5dbebe-13e9-40a5-835f-9b39cd0911af.jpeg)
![](https://lemmy.ca/pictrs/image/f69ab275-1be5-4bc2-b0bd-9cd3ae092a6d.jpeg)
![](https://lemmy.ca/pictrs/image/cad3e2fc-09ce-4c3f-8570-9140152c0c60.jpeg)
![](https://lemmy.ca/pictrs/image/4279b793-83d3-456d-a430-d54475390239.jpeg)
![](https://lemmy.ca/pictrs/image/25a2d1bc-6dd8-4682-92ed-846f8e718906.jpeg)
![](https://lemmy.ca/pictrs/image/d044f0eb-33b5-4299-8232-b960e98a5438.jpeg)
![](https://lemmy.ca/pictrs/image/64d17833-5a95-4848-858f-963b410dc766.jpeg)

### PSU

![](https://lemmy.ca/pictrs/image/ea771b59-27b4-43bc-a572-6f54f8e3737c.jpeg)
![](https://lemmy.ca/pictrs/image/c651469b-1851-422d-b0be-013b23e7bd2d.jpeg)
![](https://lemmy.ca/pictrs/image/48c6dbdf-8571-4465-8e6c-f7a4d77cfb81.jpeg)
![](https://lemmy.ca/pictrs/image/8dc16b83-317e-459a-b59c-a33f17006aa1.jpeg)
![](https://lemmy.ca/pictrs/image/190d1391-b372-4231-bf64-6e404c2c2a08.jpeg)
![](https://lemmy.ca/pictrs/image/575564f0-b225-448d-ba30-eb324d5e1273.jpeg)
![](https://lemmy.ca/pictrs/image/f6cdb3cd-23c1-4297-88eb-28e759efd484.jpeg)
![](https://lemmy.ca/pictrs/image/00382605-e11c-43a2-8dd9-02334000387a.jpeg)
![](https://lemmy.ca/pictrs/image/9a167fa5-eda2-4b8a-8777-471f0a5daa05.jpeg)
![](https://lemmy.ca/pictrs/image/e7149bc2-684e-44d6-8384-80e8b0aaca41.jpeg)
![](https://lemmy.ca/pictrs/image/a1a8d38e-a451-4376-8407-ecc0800b2b8c.jpeg)
![](https://lemmy.ca/pictrs/image/797e745b-1683-47b2-bf37-7db944119db8.jpeg)
![](https://lemmy.ca/pictrs/image/08af9fc9-6482-4664-84f7-a20c9df6c9e4.jpeg)
![](https://lemmy.ca/pictrs/image/cb17adcd-b18b-4fc3-bba7-10476dd20d42.jpeg)
![](https://lemmy.ca/pictrs/image/5257a9b3-9c1a-4c10-b9f3-18b0b0184892.jpeg)
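To keep an eye on those silent resets going forward, the reset events can be tallied per SCSI host straight from the kernel log. A sketch; pipe `journalctl -k` or `dmesg` output into it:

```shell
# Count UAS reset events per SCSI host in kernel log lines read from stdin.
count_uas_resets() {
  grep 'uas_eh_device_reset_handler start' |
    grep -oE 'host[0-9]+' |
    sort | uniq -c
}

# Usage: journalctl -k | count_uas_resets
```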


Just got reminded of this classic!

- Hey ChatGPT, is it normal for my A4 to be burning this much oil? …

- Yes.


How many cans-of-beans.jpg can you store?


Feels like this will benefit from some sort of fuzzy deduplication in the pictrs storage. I bet there are a lot of similar pics in there. E.g. if one pic or a gif is very similar to another, say just different quality or size, or compression, it should keep only one copy. It might already do this for the same files uploaded by different people as those can be compared trivially via hashing, but I doubt it does similarity based deduplication.
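For what it’s worth, even a crude average-hash is enough to catch resized or recompressed duplicates. A conceptual sketch using ImageMagick’s `convert` (assumed available); the 8x8 size and the helper names are illustrative, not anything pictrs actually does:

```shell
# Turn an ImageMagick `txt:` pixel dump into a bit string: 1 where the pixel
# is at or above the mean brightness, 0 otherwise.
bits_from_txt() {
  awk 'NR > 1 && match($0, /\([0-9]+/) {
      n++; v[n] = substr($0, RSTART + 1, RLENGTH - 1) + 0; s += v[n]
    }
    END {
      if (n == 0) exit 1
      m = s / n
      for (i = 1; i <= n; i++) printf "%d", (v[i] >= m)
    }'
}

# Average-hash: shrink to 8x8 grayscale, then threshold against the mean.
# Near-identical images (different size/quality) give near-identical strings.
ahash() {
  convert "$1" -resize '8x8!' -colorspace Gray txt:- | bits_from_txt
}
```

Comparing two hashes by Hamming distance then gives a "probably the same picture" signal without byte-identical files.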




Exactly. There are obvious problems with this conundrum and the government’s move is not ideal, but then the situation we’re in is also not ideal. The implications of leaving it unmitigated are eating into our democracy, and without a functioning democracy, there’s no functioning world wide web. And so as a firm supporter of the WWW, I find myself having to stick up for our government and our media oligopoly (🤢) on this one even if it’s not ideal from the WWW lens. It feels a bit like chemotherapy. We have to do it even if we harm some systems because otherwise many more systems will go. 🤷


Canada’s Oil-Sands Miners Want to Flush Oceans of Wastewater Downstream
> Waste sitting in pits could fill almost 883,000 Olympic-size swimming pools, and oil companies say they need to find a way to reduce it

> The companies, including an affiliate of Exxon Mobil, are lobbying the Canadian government to set rules that would allow them to treat the waste and release it into the Athabasca River by 2025, so they have enough time to meet their commitments to eventually close the mines.

Of course they are.