We do Grafana + Prometheus for most of our clients, but I think adding Loki into the mix might be necessary. The number of clients that are missing basic events like “you’ve run out of disk space…two days ago” is too damn high.
Sounds like you need an alerting/monitoring system and not a logging system. Something like Nagios, where you immediately get an alert if something is past its limits, and where you don’t have to rely on logging.
Preaching to the choir. They hire us to performance-tune their app, but then their IT staff manages to not notice the most basic things.
For my personal servers, I use Netdata for this. Works pretty well.
Still don’t know how to offset the time on the graph, but besides that I find it just complicated enough without being too much.
I would add Alertmanager to your stack if you haven’t already. It’s pretty tightly integrated with Prometheus. There are some canned alerting rules based on predicting that disk space will be full in X number of days. We wire Alertmanager to PagerDuty.
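For anyone wondering what "predicting disk space full in X days" boils down to: as far as I understand it, those rules fit a straight line through recent free-space samples and check whether the line crosses zero within the window (PromQL has a `predict_linear` function for this). Here's a rough Python sketch of the same idea, not the actual Prometheus code. The function names and the sample data are made up for illustration.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix_time_seconds, free_bytes); hypothetical sample shape


def extrapolate_free_bytes(samples: Sequence[Sample], horizon_s: float) -> float:
    """Least-squares line through the samples, evaluated horizon_s seconds
    after the last sample. Illustrative helper, not a Prometheus API."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    # Slope of the fitted line, in bytes per second.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    target_t = samples[-1][0] + horizon_s
    return mean_v + slope * (target_t - mean_t)


def disk_will_fill(samples: Sequence[Sample], horizon_s: float) -> bool:
    """True if the fitted trend puts free space below zero within the horizon."""
    return extrapolate_free_bytes(samples, horizon_s) < 0


if __name__ == "__main__":
    # Made-up data: free space dropping about 1 GB per hour from 10 GB.
    history = [(0.0, 10e9), (3600.0, 9e9), (7200.0, 8e9)]
    print(disk_will_fill(history, 4 * 3600))   # 4-hour window
    print(disk_will_fill(history, 24 * 3600))  # 24-hour window
```

The point of alerting on the trend rather than a fixed threshold is that you hear about it while there is still time to act, instead of two days after the disk filled up.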