Heyo!:wave: Any observability :telescope: platform...
# product-management
c
Heyo! 👋 Any observability 🔭 platform product owners (or eng leads) out there? I’d love to connect and learn what you’re learning! And I’d especially appreciate help fine-tuning a mental model - I’m calling it The Observability Iceberg - that I’m piloting with my team! (Look closely and you’ll probably see it’s just a one-dimensional view of a Wardley Map 😉, but one thing at a time)
j
Nice work.
a
Super interesting! Thanks for sharing, @Cruise Hall! Curious how you use this in practice to address the goal you set:
“…help SRE & platform teams prioritize the most valuable work…”
As in, all of those things need to be done (without installing the OTel collector, you can’t add logs to traces in the apps). My understanding of Wardley Map conversations is that they’re about how to move the bottom of your iceberg into things you can buy rather than build. So I would maybe expect to see examples like “use a hosted telemetry collector” or something like that there?
c
@Abby Bangser great question - I’d make the case that capabilities below the water line can be necessary to enable capabilities above the water line, but those necessary capabilities aren’t valuable in their own right - they’re only valuable after they enable capabilities above the water line.
^^ this is probably the primary observation that the model is meant to convey; any other insights are probably just consequences of this assertion.
Applied to the OTel collector example, maybe the team realizes that the operators need logs more immediately than they need traces (but can imagine a future where they could leverage traces as well):
• If the lift to install the OTel collector is equal to the lift to install a logs-only agent, the OTel collector looks good b/c it enables an experience in the short term and gives the option of building an experience in the long term
• If the logs-only agent is much easier and faster to install than the OTel collector, the logs-only agent looks good
In both cases the team assumes some risk until (e.g.) a viable log search experience is actually delivered to the user. But the “Iceberg” would caution against exerting any additional effort to enable traces if the team can’t point to the experience that the tracing plumbing will enable for the users.
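For anyone who prefers code to bullets, here is a rough sketch of that decision rule in Python. Everything in it is hypothetical and only for illustration: the `Option` type, the `pick` function, the relative effort numbers, and the 25% "roughly equal lift" tolerance are assumptions, not something the team actually runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A piece of below-the-water-line plumbing (hypothetical model)."""
    name: str
    effort: float              # relative lift to install, made-up units
    enables_now: frozenset     # experiences users need today
    enables_later: frozenset   # experiences the team can imagine later


def pick(options, needed_now, effort_tolerance=0.25):
    """Pick plumbing that unlocks the needed experience for the least lift.

    Options that do not enable the needed experience are ignored, since
    plumbing has no value on its own. Among options whose effort is within
    `effort_tolerance` of the cheapest, prefer the one that keeps the most
    future experiences open.
    """
    candidates = [o for o in options if needed_now <= o.enables_now]
    if not candidates:
        return None
    cheapest = min(candidates, key=lambda o: o.effort)
    limit = cheapest.effort * (1 + effort_tolerance)
    near = [o for o in candidates if o.effort <= limit]
    return max(near, key=lambda o: (len(o.enables_later), -o.effort))


need = frozenset({"log search"})
otel = Option("otel-collector", 5, need, frozenset({"trace view"}))

# Equal lift: the collector wins because it also keeps traces possible.
same_lift = Option("logs-only agent", 5, need, frozenset())
print(pick([otel, same_lift], need).name)  # otel-collector

# Logs-only agent is much cheaper: it wins despite less optionality.
cheap = Option("logs-only agent", 1, need, frozenset())
print(pick([otel, cheap], need).name)  # logs-only agent
```

Note that the sketch never rewards `enables_later` by itself: optionality only breaks a tie between options that already deliver the experience users need now.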
Oh and I forgot to address a real example - our team provided a log search dashboard to users of a data workflow platform, and last year we noticed that the time-to-search latency skyrocketed. Further analysis revealed that increased log volume for one framework (of many) was to blame. Our order of operations was:
1. Create a dashboard per framework to insulate as many users as possible (experience engineering)
2. Delete the offending log.info statement (signal mining)
3. Prioritize a structured logging architecture initiative to give us more levers to improve dashboard performance in the future (foundational plumbing)
#1 paid off immediately for users of non-affected frameworks. #2 paid off in a sprint or two for users of the impacted framework. And #3 is still in progress, and will only deliver value to the dashboard users after we use structured logs to improve its performance - however we’ve since found other ways to leverage structured logs to power other experiences, improving the likelihood that the initiative will be worth the effort.
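To make "structured logging gives us more levers" a bit more concrete, here is a minimal sketch using Python's standard `logging` module. It is only an assumption about what such plumbing could look like: the `JsonFormatter` class, the `make_logger` helper, the `framework` field, and the `framework-a` value are hypothetical names, not the team's actual architecture.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (illustrative only)."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # "framework" is a hypothetical field passed via `extra=`;
            # records without it fall back to "unknown".
            "framework": getattr(record, "framework", "unknown"),
        }
        return json.dumps(payload)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()
log = make_logger("workflow", buffer)
log.info("task finished", extra={"framework": "framework-a"})

# Downstream tooling can now filter on a field instead of parsing free text.
record = json.loads(buffer.getvalue())
print(record["framework"])  # framework-a
```

With a field like this on every record, a per-framework dashboard or a volume cap on one noisy framework becomes a filter on `framework` rather than a text search, which is the kind of lever step 3 is after.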
s
Oh, this is super interesting! I used to work as a product manager for an observability platform; now I’m working in the API space and in charge of the cloud and platform ops - and of course very interested in the power of observability. Also, I’m a fan of Wardley Maps - so your post really got my interest!
a
Ah that is really great to share Cruise! Thanks for making the example so concrete. It sounds like it really helps push for focused prioritisation 👍