edit: you are right, it’s the I/O WAIT that it destroying my performance:
%Cpu(s): 0,3 us, 0,5 sy, 0,0 ni, 50,1 id, 49,0 wa, 0,0 hi, 0,1 si, 0,0 st
I could clearly see it using nmon > d > l > - such as was suggested by @SayCyberOnceMore. Not quite sure what to do about it, as it’s simply my sdb1 drive which is a Samsung 1TB 2.5" HDD. I have now ordered a 2TB SSD and maybe I am going to reinstall from scratch on that new drive as sda1. I realize that’s just treating the symptom and not the root cause, so I should probably also look for that root cause. But that’s for another Lemmy thread!

I really don’t understand what is causing this. I run a few very small containers, and everything is fine - but when I start something bigger like Photoprism, Immich, or even MariaDB or PostgreSQL, then something causes the CPU load to rise indefinitely.

Notably, the top command doesn’t show anything special, nothing eats RAM, nothing uses 100% CPU. And yet, the load is rising fast. If I leave it be, my ssh session loses connection. Hopping onto the host itself shows a load of over 50,or even over 70. I don’t grok how a system can even get that high at all.

My server is an older Intel i7 with 16GB RAM running Ubuntu22. 04 LTS.

How can I troubleshoot this, when ‘top’ doesn’t show any culprit and it does not seem to be caused by any one specific container?

(this makes me wonder how people can run anything at all off of a Raspberry Pi. My machine isn’t “beefy” but a Pi would be so much less.)

@duncesplayed@lemmy.one
link
fedilink
English
61Y

It’s a bit more complicated than that. System load is a count of how many processes are in an R state (either "R"unning or "R"eady). If a process does disk I/O or accesses the network, that is not counted towards load, because as soon as it makes a system call, it’s now in an S (or D) state instead of an R state.

But disk I/O does affect it, which makes it a bit tricky. You mentioned swapping. Swapping’s partner in crime, memory-mapped files, also contribute. In both of those cases, a process tries to access memory (without making a system call) that the kernel needs to do work to resolve, so the process stays in an R state.

I can’t think of a common situation where network activity could contribute to load, though. If your swap device is mounted over NFS maybe?

Anyway, generally load is measuring CPU usage, but if you have high disk usage elsewhere (which is not counted directly) and are under high memory pressure, that can contribute to load. If you’re seeing a high load with low CPU utilization, that’s almost always due to high memory pressure, which can cause both swapping and filesystem cache drops.

@ilega_dh@feddit.nl
link
fedilink
English
31Y

Using network storage to store your swapfile is one of the… um, more interesting ideas I’ve heard today

Adding to list of things to try when one’s network only knows 100Gb

Create a post

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don’t control.

Rules:

  1. Be civil: we’re here to support and learn from one another. Insults won’t be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it’s not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don’t duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

  • 1 user online
  • 215 users / day
  • 438 users / week
  • 1.15K users / month
  • 3.85K users / 6 months
  • 1 subscriber
  • 3.71K Posts
  • 74.7K Comments
  • Modlog