For infrastructure I control, or for 3rd party services we just wait on to come back up? For the former, Nagios.
Datadog for our service; for 3rd parties we look at their status pages 😂. We also have a Slack bot set up that monitors their feeds
is that noisy? so many services have like 50 different statuses, and only 2 are relevant to us generally
lol nods it can be
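fwiw, one way to cut the noise is to have the bot only forward feed entries that mention the components you actually depend on. Rough sketch below, not a real integration: the entry shape (dicts with `title` and `summary`), `RELEVANT_COMPONENTS`, and both function names are made up for illustration and aren't any status page's or Slack's actual API.

```python
# Hypothetical component names we depend on; swap in your own.
RELEVANT_COMPONENTS = {"api", "webhooks"}


def is_relevant(entry, components=RELEVANT_COMPONENTS):
    """Return True if the entry's title or summary mentions a component we care about."""
    text = f"{entry.get('title', '')} {entry.get('summary', '')}".lower()
    # Plain substring match: simple, but short names can over-match other words.
    return any(name in text for name in components)


def filter_entries(entries, components=RELEVANT_COMPONENTS):
    """Keep only the feed entries worth posting to the channel."""
    return [e for e in entries if is_relevant(e, components)]


# Assumed entry shape: plain dicts, as if already parsed from a status feed.
sample = [
    {"title": "API latency elevated", "summary": "Investigating slow responses"},
    {"title": "Marketing site maintenance", "summary": "Docs pages may be slow"},
    {"title": "Delayed deliveries", "summary": "Webhooks are queued"},
]
relevant = filter_entries(sample)
```

Substring matching is crude, so in practice you'd probably match on whatever structured component field the feed exposes, if it has one.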
Pros of DD: it does all the things. Cons: it does all the things, is getting bloated, and is expensive 😂
I'm starting to become more of a fan of modular tools that can plug into healthy ecosystems, e.g. Grafana/Prometheus, instead of one tool trying to do all the things itself
Been using 3rd parties such as Datadog, Sentry and UptimeRobot. Pros: very straightforward. Cons: costly