It just happened again. I couldn’t ssh in despite the limit on docker resources, which leads me to believe it may not be related to docker or Lemmy.
This time it lasted only 20 minutes or so. Once it was over I could log back in and investigate a little. There isn’t much to see, except that lemmy-ui was killed sometime during the event:
IMAGE                       COMMAND                  CREATED      STATUS         PORTS
nginx:1-alpine              "/docker-entrypoint.…"   9 days ago   Up 25 hours    80/tcp, 0.0.0.0:14252->8536/tcp, :::14252->8536/tcp
dessalines/lemmy-ui:0.18.0  "docker-entrypoint.s…"   9 days ago   Up 3 minutes   1234/tcp
dessalines/lemmy:0.18.0     "/app/lemmy"             9 days ago   Up 25 hours
asonix/pictrs:0.4.0-rc.7    "/sbin/tini -- /usr/…"   9 days ago   Up 25 hours    6669/tcp, 8080/tcp
mwader/postfix-relay        "/root/run"              9 days ago   Up 25 hours    25/tcp
postgres:15-alpine          "docker-entrypoint.s…"   9 days ago   Up 25 hours
I still have no idea what’s going on.
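In case it helps anyone chasing the same thing, here is a rough Python sketch for spotting which containers came back up recently, based only on the STATUS column shown above. The function names and the one-hour window are my own assumptions, and the parser only handles the "Up ..." wording I have seen Docker print, so treat it as a starting point. The input could come from something like `docker ps --format '{{.Image}}\t{{.Status}}'`; here it is hard-coded from the listing in this post.

```python
import re

# Approximate seconds per unit, matching the coarse units docker ps prints.
UNIT_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600, "day": 86400,
    "week": 604800, "month": 2592000, "year": 31536000,
}

def uptime_seconds(status):
    """Convert a STATUS string such as 'Up 25 hours' to approximate seconds.

    Returns None for anything that is not a recognised 'Up ...' status.
    """
    m = re.match(r"Up (?:About )?(\d+|an|a|Less than a) ([a-z]+?)s?\b", status)
    if not m:
        return None
    count, unit = m.groups()
    if unit not in UNIT_SECONDS:
        return None
    # 'a', 'an' and 'Less than a' are all treated as a count of 1.
    n = int(count) if count.isdigit() else 1
    return n * UNIT_SECONDS[unit]

def recently_restarted(statuses, window_seconds=3600):
    """Return the names whose uptime is shorter than window_seconds."""
    recent = []
    for name, status in statuses.items():
        seconds = uptime_seconds(status)
        if seconds is not None and seconds < window_seconds:
            recent.append(name)
    return recent

# Statuses copied from the listing in this post.
SAMPLE_STATUSES = {
    "nginx:1-alpine": "Up 25 hours",
    "dessalines/lemmy-ui:0.18.0": "Up 3 minutes",
    "dessalines/lemmy:0.18.0": "Up 25 hours",
    "asonix/pictrs:0.4.0-rc.7": "Up 25 hours",
    "mwader/postfix-relay": "Up 25 hours",
    "postgres:15-alpine": "Up 25 hours",
}

if __name__ == "__main__":
    print(recently_restarted(SAMPLE_STATUSES))
```

For a container flagged this way, `docker inspect` exposes `.State.OOMKilled` and `.State.ExitCode`, which may help tell an out-of-memory kill apart from an ordinary crash.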
I had the same thing happen. Max CPU usage, couldn’t even ssh in to fix it, and had to reboot from the AWS console. The logs don’t show anything unusual apart from postgres restarting 30 minutes into the spike, possibly because it was killed by the system.
You say yours resolved itself in 10 minutes; mine still hadn’t stopped after 2 hours, so I rebooted. It could be that, because my VPS has just 1 CPU and 1 GB of RAM, it took longer doing whatever it was doing.
I have now set up RAM and CPU limits following this question, plus an alert, so that I can hopefully ssh in and figure out what’s happening while it’s happening.
Any suggestions on what I should be looking at if I manage to get into the system?
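One thing that might be worth capturing if you do get a shell is how much memory the kernel still considers available, since a box with 1 GB of RAM can run out quickly. Below is a minimal Python sketch that reads `/proc/meminfo`. The helper names are hypothetical, and which fields matter most here is my assumption, not a diagnosis.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def available_percent(info):
    """Return MemAvailable as a percentage of MemTotal, or None if unknown."""
    total = info.get("MemTotal", 0)
    if not total:
        return None
    return round(100 * info.get("MemAvailable", 0) / total, 1)

def read_meminfo(path="/proc/meminfo"):
    """Read and parse the given file; return {} if it cannot be read."""
    try:
        with open(path) as f:
            return parse_meminfo(f.read())
    except OSError:
        return {}

if __name__ == "__main__":
    info = read_meminfo()
    print("MemAvailable %:", available_percent(info))
    print("Cached kB:", info.get("Cached"))
```

Running this from cron every minute and appending the output to a file would leave a trail even if ssh becomes unusable. The kernel log (`dmesg` or `journalctl -k`) is also worth checking afterwards, because the OOM killer logs there when it kills a process.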
Here’s an update. I set up atop on my VPS and waited until the issue occurred again. Here’s the atop log from the event.
The culprit seems to be kswapd0 trying to move memory to swap space, although there is no swap space.
I set memory swappiness to 0 on the system for now; I’ll check whether that makes a difference.
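For anyone wanting to double-check the same two things, here is a small Python sketch that reports the current `vm.swappiness` value and lists any active swap areas from `/proc/swaps`. The function names are my own. One caveat, as far as I understand it: with no swap area configured, kswapd cannot swap anonymous memory out at all, so the swappiness setting may have little effect on its own.

```python
def parse_swaps(text):
    """Return the swap device or file names listed in /proc/swaps-style text.

    The first line is a column header; an empty list means no active swap.
    """
    lines = text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]

def read_text(path):
    """Return the file contents, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None

def swap_report():
    """Collect the swappiness value and active swap areas on this machine."""
    swappiness = read_text("/proc/sys/vm/swappiness")
    swaps = read_text("/proc/swaps")
    value = None
    if swappiness is not None and swappiness.strip().isdigit():
        value = int(swappiness.strip())
    return {
        "swappiness": value,
        "swap_areas": parse_swaps(swaps) if swaps else [],
    }

if __name__ == "__main__":
    print(swap_report())
```

A value set with `sysctl vm.swappiness=0` lasts only until the next reboot unless it is also written to `/etc/sysctl.conf` or a file under `/etc/sysctl.d/`.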