Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era.

Platform Engineering

Super interesting! thanks for sharing <@U03H8FWRC0L> :thankyou:

Curious how you use this in practice to address the goal you set
> …help SRE & platform teams prioritize the most valuable work…
As in, all of those things need to be done (without installing OTel collector, you can’t add logs to traces in the apps).

My understanding is that Wardley map conversations are about how to move the bottom of your iceberg into things you can buy rather than build. So I would maybe expect to see examples like “use a hosted telemetry collector” or something like that there?

<@U03JY9VRBK4> great question - I’d make the case that capabilities below the waterline can be _necessary_ to enable capabilities above the waterline, but those _necessary_ capabilities aren’t _valuable_ in their own right - they’re only valuable _after_ they enable capabilities above the waterline.
^^ this is probably the primary observation that the model is meant to convey; any other insights are probably just consequences of this assertion.

Applied to the OTel collector example, maybe the team realizes that the operators need logs more immediately than they need traces (but can imagine a future where they could leverage traces as well)
• If the lift to install the OTel collector is equal to the lift to install a logs-only agent, the OTel collector looks good b/c it enables an experience in the short term and _gives the option_ for building an experience in the long term
• If the logs-only agent is much easier and faster to install than the OTel collector, the logs-only agent looks good
In both cases the team assumes some risk until (e.g.) a viable log search experience is actually delivered to the user. But the “Iceberg” would caution against exerting any additional effort to enable traces if the team can’t point to the experience that the tracing plumbing will enable for the users.

Oh and I forgot to address a real example - our team provided a log search dashboard to users of a data workflow platform, and last year we noticed that the time-to-search latency skyrocketed. Further analysis revealed that increased log volume for _one_ framework (of many) was to blame. Our order of operations was:
1. Create a dashboard per framework to insulate as many users as possible (experience engineering)
2. Delete the offending log.info statement (signal mining)
3. Prioritize a structured logging architecture initiative to give us more levers to improve dashboard performance in the future (foundational plumbing)
1 paid off immediately for users of non-affected frameworks. 2 paid off in a sprint or two for users of the impacted framework. And 3 is still in progress, and will only deliver value to the dashboard users after we use structured logs to improve its performance. However, we’ve since found other ways to leverage structured logs to power _other experiences_, improving the likelihood that the initiative will be worth the effort.
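
For illustration only, here is a minimal Python sketch of the kind of lever structured logs could provide: each record is emitted as one JSON object with a `framework` field, so log volume can be counted (or a dashboard filtered) per framework. The logger name, the `framework` field, and the framework names are all hypothetical and not taken from the setup described above.

```python
import io
import json
import logging
from collections import Counter


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            # "framework" is a hypothetical field passed via `extra=`.
            "framework": getattr(record, "framework", "unknown"),
            "message": record.getMessage(),
        })


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("workflow_platform")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def volume_by_framework(lines):
    """Count log lines per framework from JSON-formatted log output."""
    return Counter(json.loads(line)["framework"] for line in lines if line.strip())


stream = io.StringIO()
log = build_logger(stream)
log.info("job started", extra={"framework": "framework_a"})
log.info("row processed", extra={"framework": "framework_a"})
log.info("job started", extra={"framework": "framework_b"})

counts = volume_by_framework(stream.getvalue().splitlines())
print(counts.most_common())
```

With a field like this in every record, spotting a single noisy framework becomes a simple group-by rather than a text search.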

oh this is super interesting! I used to work as a product manager for an observability platform, now working in the API space and in charge of the cloud and platform ops - and of course very interested in the power of observability.

also I'm a fan of Wardley Maps - so your post really got my interest!

Ah that is really great to share Cruise! Thanks for making the example so concrete. It sounds like it really helps push for focused prioritisation :clap: