Platform Engineering #platform-culture


Dirk Jablonski

04/15/2024, 12:03 PM

Hey folks, I'd like to get your opinions about a "realistic" size of a platform team. My difficulty is that we actually have multiple platforms in parallel, maintained by a single team. This often leads to problems with cognitive load in the platform team. On the other hand, splitting up the team would result in teams that are too small to function. What team size would you expect for supporting platform(s) for ~250 devs?

Kathleen Simpson

04/15/2024, 6:36 PM

How many different platforms?

Chris Chandler

04/15/2024, 7:06 PM

I suspect you'll find a lot of commonalities across the platforms/ecosystem. My guidance would be to think less about platforms at first and start with LEGO. Think about how you can build something composable that can be re-worked to address multiple use cases, vs. one end-to-end platform-specific solution. Force yourself to focus on one platform and one key cohort. You'll learn a lot along the way and will be able to take those learnings into the next pass to pick up other use cases and platforms. Keep iterating until you reach that Pareto optimal 80/20 coverage (you'll always have folks blazing trails and/or resume building, so focus on the others first).

Dirk Jablonski

04/16/2024, 9:02 AM

Thanks @Chris Chandler, but I'm afraid there are not too many commonalities. For historical reasons, our platforms were not started as platforms in the sense they are defined nowadays, but more as technical platforms. For example, we have central K8s clusters as one platform, and serverless (AWS Lambda) as another. Others partially build on those, using them as foundations. There is no capacity or plan at the moment to rework the existing platforms, so we will need to improve incrementally here. To even be able to tackle these improvements, I feel like we would need more workforce, hence the original question.

Dirk Jablonski

04/16/2024, 9:04 AM

@Kathleen Simpson Roughly 6-8 (depending on definition). Some of them could maybe be considered just components of a larger one, but still the specific parts require deep knowledge, so burdening the whole team / a single team with all of them is cognitive overload imo.

Chris Chandler

04/16/2024, 2:42 PM

Dirk: Understood. Sounds like you're - rightfully - overwhelmed with the scope of the solution space, which is kind of my point. Don't solve for all of it at once. Pick one and iterate. Which of those would be the best to start with (# of workloads, willingness of the community to align on a common solution, etc)? Pick one, iterate, roll the learnings (and bruises) into the next iteration, etc.

Kathleen Simpson

04/16/2024, 4:06 PM

@Dirk Jablonski Are you saying 6-8 different platforms or different accounts? I’m kinda confused as to why you would need 6-8 different platforms.

Leo Epstein

04/16/2024, 8:35 PM

Hi Dirk, I brought this question up to my colleague Laura Tacho, who's doing a webinar on structuring platform teams with Manuel Pais of Team Topologies later this month. I can't guarantee it'll be answered because of the nature of the event and time, but I'm thinking they'll be able to cover it. I can drop a link to the event, or the recording after the fact, if you'd like that too.

Kathleen Simpson

04/16/2024, 9:08 PM

@Dirk Jablonski The way I would look at it is: what percentage of the team's time does each of the platforms/components take on average? The total is 100%. Let’s say you have a large enterprise system with Azure, AWS, GCP, RHEL, K8s, and Oracle. You can automate a lot of the stuff handled on Azure & AWS. GCP can do the automation too, but in my experience it tends to have issues more often than Azure & AWS. RHEL is fairly stable; I would put it on the same level as Azure & AWS for maintenance time, or a bit more, but not tons. Most of the time, the folks handling Azure can handle AWS once they have correlated the product/configuration names. Also, if you have folks that are certified on AWS/Azure, part of the training is also K8s and containers, so they should have a handle on that. From there, it’s going to matter how large each system is; figure the admin time for each as a percentage until you have accounted for 100% of the expected maintenance time (x). Once you have the percentages, you can convert them to hours and divide by the hours a person works in a week to get a headcount. Honestly, the hardest part will be figuring out the percentages. If you can get a Solutions Architect in there who has a couple of your major components under their belt, they should be able to get you a plan in a reasonable amount of time. It is likely they will want to change some things on the system to make it more manageable, but that will help you. What you are asking is literally their job.
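
A rough sketch of the arithmetic described above, in Python. Every number here is made up for illustration (the weekly hours per component and the 32 productive hours per person are assumptions, not figures from this thread), and the function names are hypothetical. It only shows the shape of the calculation: estimate the maintenance time per component, check what share of the total each one takes, then divide the total by the hours one person can realistically contribute per week.

```python
import math

# Hypothetical weekly maintenance hours per platform/component.
# These are placeholder values, not measurements.
weekly_hours = {
    "Azure": 60,
    "AWS": 60,
    "GCP": 90,
    "RHEL": 70,
    "K8s": 120,
    "Oracle": 80,
}

# Assumption: a 40h week minus meetings, leave, and other overhead.
HOURS_PER_PERSON_PER_WEEK = 32


def share_of_total(hours):
    """Return each component's fraction of the total maintenance time."""
    total = sum(hours.values())
    return {name: h / total for name, h in hours.items()}


def headcount(hours, hours_per_person=HOURS_PER_PERSON_PER_WEEK):
    """Round the total weekly hours up to whole people."""
    return math.ceil(sum(hours.values()) / hours_per_person)


if __name__ == "__main__":
    for name, share in share_of_total(weekly_hours).items():
        print(f"{name}: {share:.0%} of maintenance time")
    print("People needed for maintenance alone:", headcount(weekly_hours))
```

With these placeholder numbers the total is 480 hours a week, which comes out to 15 people for maintenance only. Improvement and restructuring work would have to be added on top as extra entries or as a buffer.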

Dirk Jablonski

04/16/2024, 10:56 PM

Thx @Kathleen Simpson 🙂 I'm afraid there is no Solutions Architect for this here, and the job is actually mine as a Product Owner (with software & architecture background), and the part of "changing a few things to make it more manageable" is what I strive for, but I feel that we don't have enough resources for maintenance, at least minimal improvements for our devs, and restructuring the platforms. That's why I asked what you think is a good team size, to get a feeling if it's just me 😉

Clemens Jütte

04/17/2024, 8:05 AM

Hmmm… I have now slept on this one and am still confused about what you really want to achieve or know. Caveat: it’s most probably just me, and that I am used to using a certain lingo to describe scenarios.
1. I am not sure the question of “how many people do you usually need for X” can be answered without knowing a ton more about X than you have told us.
2. It seems like you’re describing runtime environments as platforms. I would sort that differently. The reason is that you strive to build one platform that unites as much as you can between runtime environments. This reduces maintenance effort for the platform team and also reduces cognitive load for the platform users, as they don’t need to learn all the specifics of every targeted runtime when using the platform.
3. Running any runtime environment behind a platform front is going to incur maintenance work. The only remedy is automation and standardization. You should strive to automate the standard maintenance processes as much as possible so your team only needs to care for the odd cases. This means reducing the possible variants: every variant a platform user can create that is a special case will probably incur bespoke maintenance effort. (My best guess is that THIS is the actual problem tying your people down. The sheer amount of “maintenance and help requests” from your devs is most probably demanding that everybody is a specialist in everything as well.)
Happy to share a virtual platform coffee and see if I can spot some patterns in how you’re handling things.

Dirk Jablonski

04/19/2024, 8:45 AM

Seems that the question cannot be answered without more details than I can share at the moment 😕 Anyway, thank you to all of you; some of your responses still brought me insights that will hopefully help solve my challenge.

Samuel Lijin

04/21/2024, 5:33 AM

answering purely the "platform team size for ~250 eng" question, my wildly uninformed guess (i've only been in orgs far bigger and far smaller than 250) would be 20-25 people. https://engineering.atspotify.com/2020/08/how-we-use-golden-paths-to-solve-fragmentation-in-our-software-ecosystem/ might be of interest to you about how to prioritize

Gerald Benischke

05/03/2024, 8:56 AM

"Let a thousand flowers bloom" is a good story in this context IMHO
