Slackbot
12/12/2023, 2:48 PM
Andrew Fong
12/12/2023, 2:55 PM
Andrew Fong
12/12/2023, 2:56 PM
Oskar Mamrzynski
12/12/2023, 3:00 PM
Andrew Fong
12/12/2023, 3:12 PM
Andrew Fong
12/12/2023, 3:12 PM
Andrew Fong
12/12/2023, 3:13 PM
Andrew Fong
12/12/2023, 3:13 PM
Bryan Ross
12/12/2023, 3:22 PM
Oskar Mamrzynski
12/12/2023, 3:24 PM
Bryan Ross
12/12/2023, 3:27 PM
Bryan Ross
12/12/2023, 3:29 PM
Oskar Mamrzynski
12/12/2023, 3:37 PM
Bradley Sickles
12/12/2023, 3:50 PM
Jordan Chernev
12/12/2023, 3:59 PM
Clemens Jütte
12/13/2023, 10:38 AM
Hey Oskar! Two observations upfront, and then I’ll try to step through your questions. Caveat: I don’t fully understand your situation, so please take this with a pinch of salt.
1. You use the acronym IDP, but it’s unclear to me what you’re actually after. You reference Backstage, which indicates you’re looking for a dev portal, but the ‘P’ is usually interpreted as ‘platform’ here. You’re in the “platform engineering” Slack ;-)
2. You mentioned that two developers who used Backstage at their previous company were the starting point of your investigation. I’m guessing that this experience is still fresh, as in they joined your company recently. For you, the whole tooling, experience and process may be crystal clear and easy to understand, but after all you use it daily and at least partly created it. For someone new to the company it might be completely different and hard to understand, hence their wish for a portal as an abstraction layer that makes it easier to understand.
> I like the idea of service catalogs, but who would keep it up to date? Would we need to maintain a list somewhere with all the relevant links, or autodiscover? If the latter, how do you filter out things which are not services but just utils? E.g. our team has a lot of util agents repos, but none of them are production services.
You can think of “the catalog” as a data warehouse. There is normally an ingestion process that collects all relevant information. How this works depends on the product you’re using. For Backstage, you would have a YAML file stored alongside the source code in the repo, and Backstage would ingest that file periodically and update the catalog entry from it. Different types of entities can be distinguished easily in the catalog, so you could ingest both and easily tell services apart from utils. Depending on the product you use, you can even understand and visualize the dependencies between catalog entries (and their corresponding source code repos).
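To make the "tell services apart from utils" point concrete, here is a minimal sketch. It assumes catalog entries shaped like Backstage entity descriptors (`kind`, `metadata.name`, `spec.type`) that have already been ingested into plain dictionaries. The entity names, owners and the `names_of_type` helper are invented for illustration and are not part of any product's API.

```python
# Hypothetical catalog entries, shaped like Backstage entity descriptors.
# All names and owners below are invented for illustration.
CATALOG = [
    {"kind": "Component", "metadata": {"name": "checkout-api"},
     "spec": {"type": "service", "lifecycle": "production", "owner": "team-payments"}},
    {"kind": "Component", "metadata": {"name": "build-agent-utils"},
     "spec": {"type": "library", "lifecycle": "production", "owner": "team-platform"}},
    {"kind": "Component", "metadata": {"name": "orders-api"},
     "spec": {"type": "service", "lifecycle": "experimental", "owner": "team-orders"}},
]


def names_of_type(entities, component_type):
    """Return the names of Component entities whose spec.type matches."""
    return [
        entity["metadata"]["name"]
        for entity in entities
        if entity.get("kind") == "Component"
        and entity.get("spec", {}).get("type") == component_type
    ]


services = names_of_type(CATALOG, "service")
utilities = names_of_type(CATALOG, "library")
print(services)
print(utilities)
```

The point is only that once each repo declares its own type, filtering is a one-line query and nobody has to maintain a central list by hand.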
> How is service catalog different from a good Confluence page with all the necessary info + links?
I am not going to dive into the religious aspect of that question. To state the obvious: if you’re using a catalog, the information can be maintained by the devs in context, in an abstract format, in their IDE. Keeping everything correctly formatted and up to date is then handled by automation, so it can be visualized nicely. Confluence only covers the “can be visualized” part.
> Would service creation workflow create the repo, with all the default files, terraform, pipelines etc? Would it deploy a basic dummy app to somewhere?
There is no simple answer to that, as you would normally configure the scaffolding process to carry out whatever you want it to carry out. If you want it to deploy a dummy app somewhere, make it do that! If you want the app to be deployed by an already included pipeline, include the creation of that pipeline in the process. You get the idea...
> Would IDP be able to support teams that use repo per service vs teams that use monorepos?
TL;DR: yes. Depending on the product you use, this is easy or hard. You very quickly enter the realm of “what would a platform support more easily than just a portal” here.
> What about resource creation of “protected” resources, e.g. DNS and Firewall rules during service creation? These assets are done via Terraform where my team has to review PR changes. Should devs be able to just create things without review, automatically? Or should the IDP just raise a PR for us to look at?
There is a recurring theme of “depends on the product you use” here. Good platforms will let you use your TF modules but abstract them away behind a layer that is more accessible to developers. Usually there should be nothing for a developer to configure for e.g. a DNS record: it really should be computed by entering the service name and your domain name into a simple template. Firewall rules are the same: the only thing a developer should be reasoning about is whether their service is publicly exposed or internal, and the right FW config should be generated from that information. This eliminates the need for a PR altogether, as the generated config cannot go wrong. “Developer self-service” is the keyword here.
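A rough sketch of what "computed from a simple template" could look like. Everything here is an assumption for illustration: the function names, the rule fields, the `10.0.0.0/8` internal range and port 443 are placeholders, not anything from the thread or from a specific platform. In practice the output would feed your own reviewed TF modules.

```python
import re

# Assumed network ranges; replace with whatever your organisation uses.
PUBLIC_SOURCE = "0.0.0.0/0"
INTERNAL_SOURCE = "10.0.0.0/8"


def dns_name(service_name: str, domain: str) -> str:
    """Derive a DNS record name from the service name and the company domain."""
    # Lowercase and collapse anything that is not a-z, 0-9 or '-' into a hyphen.
    label = re.sub(r"[^a-z0-9-]+", "-", service_name.lower()).strip("-")
    if not label or len(label) > 63:
        raise ValueError(f"cannot derive a DNS label from {service_name!r}")
    return f"{label}.{domain}"


def firewall_rule(service_name: str, public: bool) -> dict:
    """Generate an ingress rule from the single choice the developer makes."""
    return {
        "service": service_name,
        "source": PUBLIC_SOURCE if public else INTERNAL_SOURCE,
        "port": 443,
    }


print(dns_name("Orders API", "example.internal"))
print(firewall_rule("orders-api", public=False))
```

The developer only supplies the service name and a public/internal flag; the platform team still owns and reviews the template itself, just once instead of per PR.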
> How do you manage changes to services after service creation? If teams want to e.g. update their health probes or replica count - they can currently just go to Terraform we made for them and update it. How would an IDP help here?
“Go to the Terraform” is usually not what a dev wants to do or hear. Once again, I think it’s about the abstraction for that interface: the wish for a portal in front of it, which makes it more accessible and reduces the possibility of error. You could have a look at Score (http://score.dev) for inspiration.
> If we come up with an improvement to golden paths, e.g. better Terraform modules - how would IDP help us apply it to existing infra? Would we have to re-create the service, or would it try to apply on existing? What about conflicting changes?
Simply put, a good IDP (platform!) abstracts the TF away for the developers. You evolve it in the platform team, decoupled from them, and as soon as you’re ready for rollout, the IDP helps you centrally manage that rollout to the projects in a sensible way, e.g. a stage- and group-based rollout: dev for test teams first, then dev for all, followed by staging for all and finally prod for all. While doing that, your IDP should make sure that the next deployment a dev triggers is executed against the new TF module version. How the state is managed and what the update policy is should be encoded into the TF modules, as this may vary per resource you manage (which is exactly the cognitive load you want to take away from the devs by abstracting them from the TF).
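The stage- and group-based rollout described above can be sketched as an ordered list of waves. This is only an illustration under assumptions: the wave names, the `module_version_for` helper and the version strings are hypothetical, and a real platform would track this state itself.

```python
# Rollout waves in the order described above: (environment, group).
ROLLOUT_WAVES = [
    ("dev", "test-teams"),
    ("dev", "all"),
    ("staging", "all"),
    ("prod", "all"),
]


def module_version_for(environment, is_test_team, released_waves, old, new):
    """Pick the module version the next deployment should run against.

    released_waves is how many waves (from the start of ROLLOUT_WAVES)
    the platform team has released so far.
    """
    for wave_env, wave_group in ROLLOUT_WAVES[:released_waves]:
        if wave_env == environment and (wave_group == "all" or is_test_team):
            return new
    return old


# After the first wave, only test teams get the new module in dev.
print(module_version_for("dev", True, 1, "1.4.0", "1.5.0"))
print(module_version_for("dev", False, 1, "1.4.0", "1.5.0"))
```

Devs never pick a module version themselves in this model; their next deployment simply resolves to whatever the current wave allows.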
> How can you restrict that only specific set of people in a team can create services? Currently this is done with rights on Terraform repo.
Any credible IDP I know of (and here you can let the “P” be platform or portal, it doesn’t matter) supports RBAC.
Happy to have a virtual coffee together if you want more specific feedback on any of those points.
Abby Bangser
12/14/2023, 7:39 AM
Oskar Mamrzynski
12/14/2023, 8:35 AM
Clemens Jütte
12/14/2023, 1:09 PM
> Users being frustrated by the wrapper limitations but the wrapper creators not having time to extend or desire to meet “every edge case”
This looks a bit contrived, as it can be just as true for your modules, and it extends to not providing enough capacity or willingness during PR review. Platforms should provide a clear contract, which includes the possibility to opt out. If you want to transfer that concept to the golden path metaphor, users should be able to opt out of a golden path completely or partially, depending on their needs. The reasoning behind that is simple: you should never assume that a platform (including a portal, which is usually one of the components of a platform) can solve 100% of all cases. But if the platform can extend its benefits to 80% of your org (the devs who want their cognitive load reduced, for whom the abstractions work and who don’t want to solve repetitive tasks manually), you might free up enough capacity to take better care of the remaining 20% who are not on the platform. Everybody wins, because overall you’ll be delivering value quicker and with better quality.
Oskar Mamrzynski
12/14/2023, 2:10 PMAndrew Fong
12/14/2023, 2:15 PMAndrew Fong
12/14/2023, 2:16 PMAndrew Fong
12/14/2023, 2:16 PMOskar Mamrzynski
12/14/2023, 2:17 PMAndrew Fong
12/14/2023, 2:17 PMAndrew Fong
12/14/2023, 2:18 PMAndrew Fong
12/14/2023, 2:19 PMAndrew Fong
12/14/2023, 2:19 PMClemens Jütte
12/14/2023, 2:20 PMAndrew Fong
12/14/2023, 2:21 PMAndrew Fong
12/14/2023, 2:21 PMClemens Jütte
12/14/2023, 2:22 PMAndrew Fong
12/14/2023, 2:23 PMAndrew Fong
12/14/2023, 2:23 PMAndrew Fong
12/14/2023, 2:24 PMClemens Jütte
12/14/2023, 2:26 PMAndrew Fong
12/14/2023, 2:27 PMAndrew Fong
12/14/2023, 2:27 PMClemens Jütte
12/14/2023, 2:28 PMAndrew Fong
12/14/2023, 2:29 PMAndrew Fong
12/14/2023, 2:29 PMBradley Sickles
12/14/2023, 2:29 PMAndrew Fong
12/14/2023, 2:30 PMBradley Sickles
12/14/2023, 2:30 PMAndrew Fong
12/14/2023, 2:31 PMOskar Mamrzynski
12/14/2023, 2:31 PMAndrew Fong
12/14/2023, 2:31 PMAndrew Fong
12/14/2023, 2:33 PMAndrew Fong
12/14/2023, 2:34 PMOskar Mamrzynski
12/14/2023, 2:34 PMJordan Chernev
12/14/2023, 2:35 PMAndrew Fong
12/14/2023, 2:37 PMAndrew Fong
12/14/2023, 2:38 PMClemens Jütte
12/14/2023, 2:38 PMClemens Jütte
12/14/2023, 2:47 PMBradley Sickles
12/14/2023, 3:41 PM
> This is also why I don't agree with abstracting things away too much for devs.
@Oskar Mamrzynski I really like leaning into this idea. There is a good balance for devs. To some extent, a developer says "I just want a container app with postgres and redis. I don't want to babysit it, just go". However, when/if things go wrong, what will it take to get back on the happy path?
In the Internal Developer Platforms I've built, the one strategy we employed that had a massive impact was a heavy focus on curation. Most of this comes down to infrastructure modules (e.g. Terraform, Helm, <insert other tech>):
• Put yourself in the mind of a developer who isn't a cloud/k8s expert. How do I know which modules to use?
• In what ways can this go wrong for a developer? Are there good error messages that allow a non-expert dev to course-correct?
• Have we built automated testing so that when someone makes a change to IaC, it doesn't ripple through to the user?
• When we roll out changes to a developer, do they know how to upgrade without causing downtime?
I wrote a high-level piece on Terraform curation earlier this year on this topic: https://www.nullstone.io/blog-posts/terraform-self-service
alex george
02/23/2024, 6:42 PM