# observability
d
My org is looking to consolidate observability tools, and it's looking like the Grafana stack is the leading candidate. We currently use Datadog for APM, Splunk for bulk logs, AWS CloudWatch Logs, AWS CloudWatch Metrics, Azure Monitor, and Kibana and Elastic for web analytics... and the list goes on. Has anyone had experience consolidating systems into a single tool? Any tips/advice on Cloud vs. self-hosted based on scale and support?
a
We offer a complimentary Prometheus + Grafana + nginx setup as standard when customers integrate external k8s clusters into our developer platform. We also offer training and assistance on how to use these observability tools and set up alerting. https://www.zeus.fyi/pricing
s
You have two options: consolidate on the presentation layer or on the processing layer. For the first, you need dashboarding that supports all your tools, e.g. Grafana. For the second, you need something like the OpenTelemetry Collector, which can receive and send data in the protocols your tools need. I prefer the second approach, as it allows you to pivot to different tools down the line.
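To make the processing-layer option concrete, here is a toy Python model of the fan-out idea. It is not the OpenTelemetry Collector itself (the real Collector is configured with receivers, processors, and exporters wired into pipelines); `Record`, `Pipeline`, and the in-memory lists standing in for vendor backends are all hypothetical. The point it illustrates: switching or dual-writing to a new backend means changing routes, not re-instrumenting services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical telemetry record; real OTel data models are much richer.
@dataclass
class Record:
    signal: str  # "traces", "metrics", or "logs"
    body: dict


# An exporter is any function that ships one record to one backend.
Exporter = Callable[[Record], None]


@dataclass
class Pipeline:
    """Toy processing-layer router: one intake, many sinks per signal."""
    routes: Dict[str, List[Exporter]] = field(default_factory=dict)

    def add_route(self, signal: str, exporter: Exporter) -> None:
        self.routes.setdefault(signal, []).append(exporter)

    def ingest(self, record: Record) -> int:
        """Send the record to every exporter for its signal; return how many."""
        sinks = self.routes.get(record.signal, [])
        for export in sinks:
            export(record)
        return len(sinks)


# Usage: dual-write traces to the current APM vendor and a new backend
# during a migration window. Backend names are placeholders.
apm_vendor, new_backend, log_vendor = [], [], []
pipeline = Pipeline()
pipeline.add_route("traces", apm_vendor.append)
pipeline.add_route("traces", new_backend.append)
pipeline.add_route("logs", log_vendor.append)
pipeline.ingest(Record("traces", {"span": "checkout"}))
```

Once the new backend is trusted, dropping the old vendor is a one-line route change; the services emitting telemetry never notice.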
d
OTel collection is an early line item. All of our instrumentation is done either by an in-house distributed tracing library (which is old but pretty darn good) or by Datadog APM instrumentation. Most of our logs are heavy-forwarded from the source to Splunk Cloud. We need to take a good look at which of those logs are actually useful. Probably 50% or less.
The big thing is standardizing our approach across 1800 engineers and hundreds of independent services. We need to develop a smooth path.
a
In my experience, the only way to do it well is to plan a lot of the details and do a lot of organizing pre-work upfront; then everything else is 100x easier:
1. prototype the new tech/move
2. classify by cloud, server, region, service
3. group by easiest/lowest disruption cost
4. review the major service groups with the most sophisticated needs to make sure they'll be met
5. get feedback on the migration plan from teams
6. schedule + notify teams
7. roll out changes by relevant groups
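The "classify, then group by lowest disruption cost" steps can be sketched in a few lines of Python. This assumes each service has already been tagged with a cloud, a region, and a hypothetical numeric `disruption_cost` score agreed with its owning team; ordering groups by summed cost is just one possible heuristic, not a prescribed method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Service:
    name: str
    cloud: str
    region: str
    disruption_cost: int  # hypothetical score; higher = riskier to migrate


def plan_waves(services: List[Service]) -> List[Tuple[Tuple[str, str], List[str]]]:
    """Group services by (cloud, region), then order groups cheapest-first."""
    groups: Dict[Tuple[str, str], List[Service]] = defaultdict(list)
    for svc in services:
        groups[(svc.cloud, svc.region)].append(svc)
    # Sort by total disruption cost; break ties by the group key for stability.
    ordered = sorted(
        groups.items(),
        key=lambda kv: (sum(s.disruption_cost for s in kv[1]), kv[0]),
    )
    return [(key, sorted(s.name for s in members)) for key, members in ordered]


# Usage with made-up services.
services = [
    Service("billing", "aws", "us-east-1", 8),
    Service("search", "aws", "us-east-1", 3),
    Service("reports", "azure", "westeurope", 2),
    Service("auth", "aws", "eu-west-1", 5),
]
waves = plan_waves(services)
```

Here the Azure group migrates first, then `aws/eu-west-1`, and the costlier `aws/us-east-1` group last, which is also the natural point for the "review the most sophisticated needs" step.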
d
Good advice, thanks. I appreciate that.
a
yw!
l
Heyo! If you want to consolidate, you can't go wrong with going full Elastic stack... It's not perfect, as there are alternative tools on the market that outperform Elastic at certain things, but it does a hella good job of keeping everything under the same hood.
Our devs go to Kibana and simply have everything there... APM, logs, host metrics, alerts, dashboards, SLOs, uptime... As someone who had to jump between three or four tools to get all the data, it's a dream.
d
The common theme seems to be: whatever the tool is, use one. Trying to evaluate if Grafana is right for us.
I will definitely do some homework on Elastic and Kibana though. Thank you
a
imo the prom + grafana stack is basically free, and all the other features you pay through the roof for with observability cloud SaaS are almost never used and come with surprise $M+ bills
l
Well, if you're going to make an investment in trying to consolidate tools, you should try to make the most of it... The cool thing about Elastic is that you can build an MVP with the open source versions, which have most of the basic functionality, without having to commit a huge budget. Plus, if you're really into it, you can self-host it.
a
Elasticsearch is notoriously complex to set up yourself imo. The grafana + prom k8s stack comes with default k8s dashboards for resource monitoring, and there's a massive ecosystem of free, high-quality Grafana dashboards for most applications.
l
I agree completely: the Elastic stack will be harder to set up (if you don't use the SaaS offering). However, there's only so much you can do with Grafana + Prometheus on k8s... I might be wrong, but I think you'll still be missing APM, endpoint monitoring, integration with components outside of k8s, etc...
d
@Dylan Justice, I'd recommend having a look at Dynatrace. Happy to put you in contact with someone.