# general
j
Hi @Dolev Algam, you don't mention how large your company/team is, but there is a fundamental break in code responsibility when you pass the ~8-person size (aka a one-pizza DevOps team). Beyond that size, people stop doing both dev AND ops and start to focus on one or the other. If you already have a platform group, I suspect you've passed the one-pizza threshold. If that is the case, platform operations (including finding and fixing bugs) should not be your application developers' role; it now belongs to the ops team. The developers should actually be "dumb" regarding your infrastructure. If they need to understand your infrastructure, you are shrinking the business value they can provide. Your app devs should ONLY be focused on improving your customer-facing application. If your app devs are trying to do both application AND infra development, they are not focused on what is important. The "platform group," on the other hand, should own the platform. It is their responsibility to provide a bug-free and functional platform to the application developers. If you have "super developers" who understand both your infrastructure and your applications, then maybe allow them to help with infra, but I'd say that should be the exception rather than the rule.
r
check number 3 here for a relevant diagram https://www.getport.io/blog/guide-to-internal-developer-portals
k
I'd argue that "dumb" is an oversimplification. They should understand the platform at a bird's-eye view; otherwise you get too much of the "works on my machine" syndrome. Software behaves very differently at scale. But you are right that devs shouldn't be doing ops' development work (or vice versa) and don't need to know the minutiae. However, principals/architects on both sides should have a deeper understanding of the boundaries to ensure issues in production can be handled appropriately.
r
I pretty much agree with John: for a larger org, your devs need to focus on building world-class apps. But as others have said, being dumb to the infrastructure isn't ideal. Even though your platform is self-service, your devs still need an understanding of what your IDP is doing. You don't want your devs picking cloud services "a la carte," as it costs too much; that's why your IDP has the golden paths built in. Your IDP will obviously deliver services in a way that makes sense, but your devs need to know what's going on. Making an IDP a "black box" is a bad idea.
s
I had someone on Twitter accuse me of not understanding DevOps when I said that devs working on infra should only work on infra, and devs working on applications should only work on applications. DevOps entails both of these types of devs: Dev = application devs, Ops = infra devs. The Ops team's role encompasses the dev team's role. Theoretically, we could argue about semantics, but that is how I see the roles, as in this image:
I also understand that with a small team, this big red line in the sand will, and probably must, be crossed, due to a lack of manpower or developer knowledge. But at some point in scale (I like the over-one-pizza-sized team), there needs to be more discipline in making the red line stick. To me too, at some point the infra should be basically unknown to an application dev. Yes, the dev should know what resources and services they can "connect" to and use. They should be able to find docs on those services and their APIs easily, and easily configure their usage when needed. Where, how, why, and with what the "connection" to these services happens shouldn't be a worry to them at all. It should just happen. To me, that is what a platform should offer. That being said, once devs are pushed out of the ops side of things, working with the platform may feel "stiff" or "inflexible". This is inevitable, because a lot of decisions are made up front to build the platform. Application devs need to accept this rigidity as normal and do their work as best they can. The development process flow is the one thing they should treat as open to improvement, by discussing it with the Ops team.
Is a service missing? Can a certain step in the development flow be simplified or made more efficient? Is the knowledge needed to get certain work done unavailable? Can moving to the next step be made more intuitive? Is moving to a dev workflow step not automatic, and can it be made automatic? If not, why not? Etc., etc.
That's the application dev's DevOps role. Then there is the application developer role, where they must discuss the business logic with the business. In most cases, a lot of the same questions above get asked, but from the perspective of the business's process workflow. 🙂
j
@Kyle Campbell "Dumb" may be a bit of a strong word. I'm not against a developer having some understanding of platform functionality, but I am against them needing to understand it to do their job. My goal when helping a company develop or tune a platform is to reduce developer overhead and improve business value. A company may be at a platform development stage where operational things like VM type are indeed in the developers' control, but ultimately this is just a symptom of a platform that has not been refined to the point where VM size selection is automated. Devs should focus on building the customer-facing functionality of the application and ideally not need to think about "will it run, scale, be resilient?" For operations, the customer is the developer (and, by extension, the software user too). There should be a constant dialogue between developers and operations to understand requirements, but automating processes to meet those requirements is what ultimately provides maximum business value.
@Richard Brown Totally agree that there need to be guardrails. The classic example of why: whenever we allow students to create VMs for the first time (and this is explicitly in the context of creating a VM that will be deleted), 99.9% of the VMs will be the largest possible option. The "what if" mentality is a legacy of old-school resource management and misses the point of cloud-based automation. I'd be interested in learning more about why you think "black boxing" your IDP is a bad idea. The IDP needs to provide the necessary functionality to your developers (with appropriate limits), and your ops team needs to be aware that your developers are their customers and must add functionality when requirements change. If the ops team is serving your developers appropriately, your devs should be content with a "black box" IDP solution.
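The guardrail idea from the VM-size example can be sketched as a small policy check the platform runs before provisioning. This is a minimal Python sketch under assumed names: the size catalogue, the vCPU counts, the `small` default, and `validate_vm_request` are all hypothetical and not any cloud provider's API. It falls back to a default when nothing is requested and rejects oversized "what if" requests instead of silently provisioning them.

```python
from typing import Optional

# Hypothetical size catalogue: size name -> vCPUs (illustrative values only)
ALLOWED_VM_SIZES = {"small": 2, "medium": 4, "large": 8}
DEFAULT_VM_SIZE = "small"


def validate_vm_request(requested_size: Optional[str], max_vcpus: int = 4) -> str:
    """Return the VM size to provision, enforcing simple platform guardrails."""
    if requested_size is None:
        # No explicit choice: the platform picks a sensible default
        return DEFAULT_VM_SIZE
    if requested_size not in ALLOWED_VM_SIZES:
        raise ValueError(f"unknown VM size: {requested_size!r}")
    vcpus = ALLOWED_VM_SIZES[requested_size]
    if vcpus > max_vcpus:
        # Reject "what if" oversizing instead of silently provisioning it
        raise ValueError(
            f"{requested_size!r} needs {vcpus} vCPUs; the limit is {max_vcpus}"
        )
    return requested_size


print(validate_vm_request(None))      # small
print(validate_vm_request("medium"))  # medium
```

Raising the `max_vcpus` budget per team or per environment is one way to keep the limit a platform decision rather than something each developer negotiates.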
@scott molinari Thank you for your insights. I think you are looking at "Dev and Ops" for a team that has already reached the point where there actually is a platform serving Devs and customers (production). I think the DevOps engineer is indeed someone who can manage the infrastructure as well as the customer-facing application code. When you get to the point where you have some developers doing infrastructure-as-code and others writing application code, this is no longer (to me) DevOps. It might be agile, and quite likely GitOps, but you now have, as you noted, developers focused on either infra operations or app development. This is perhaps semantics. I think your red line is correct, though I would say that a one-pizza DevOps team stops being DevOps when it crosses that line. Crossing that line requires a platform, because not all of your team is able to do both Dev and Ops. While I'd move away from the DevOps terminology for larger teams where a platform becomes necessary, I do agree with your points about how platform interaction works for Devs and Ops. And yes, if you are doing your platform right, your Ops team is really developers managing infrastructure as code.
s
I slightly disagree... I think? DevOps is the whole process of application development and delivery. It is always there. There are two clear roles within the process: the application dev and the infra/automation dev/admin. When the team is small, a single dev might work both roles, and he'd be called a DevOps engineer. When the team gets bigger, the split between the roles should become more and more pronounced, and you'd only have Ops engineers (though one could argue they are DevOps engineers here too, since their work envelops the other's) and application engineers. I think in the end, we'd agree on that overall? 😁
k
Having worked at both very large and very small org scales, I still disagree somewhat with this statement: “Devs should focus on building the customer-facing functionality of the application and ideally not need to think about ‘will it run, scale, be resilient?’” This falls back into the old IT model where devs build software and throw it over the wall to IT folks. I don’t think it’s possible to build good software without understanding the concepts of run/scale/resiliency. Devs don’t need to know the details, but they do need to know the concepts; otherwise they will unintentionally introduce very inefficient things that may require really expensive ops solutions, when a change in code could have avoided that. Likewise, software shouldn’t be a total black box to ops; they should have some idea of what it’s doing under the covers.
Nobody needs to know the “how” of implementing the other job, but everyone needs to understand the high-level concepts.
It’s similar to the interface between dev and UX: a developer doesn’t need to know the details of how to design a good experience, but they should be able to follow a design language without being told how to implement every minor detail, and have some understanding of why things are the way they are.
In my opinion, this boundary area between ops and dev is where it’s more art than science. It’s the part that’s really hard to do right and where people stumble, even with the best-laid plans and strong rules on component templating, etc.
s
Ironically enough, when the DevOps roles are definitely split between different teams, and "the devs" are given a consistent workflow to develop apps and/or services with, the Ops devs, the ones responsible for that workflow, could be called "Platform Engineers". 😛 😄
j
@Kyle Campbell I generally agree with you, at least when dealing with more traditional development models. Applications running on bare metal or VMs, where dependencies were not canonically defined, could cause problems when developers threw their apps over the proverbial dev-to-ops wall. Containerization has changed that. A container that works on a developer's laptop should work in production; if it does not, there is a fundamental flaw in the development platform. A properly implemented cloud platform should provide appropriate resources for developers to run their apps and have equivalent behavior in dev/test/prod. This is not to say a developer won't discover limitations that the platform team needs to address, but it should not be the developer's problem to fix them.
k
Oh no, I didn't mean the developer should fix it. It is the platform team's responsibility; they own that code. But devs do have accountability for understanding what happens when things go into production. And I'd generally agree that with true microservices, the same container runs the same in dev/test/prod. The challenges I hit daily, which colour my opinion, are:
A) Developers not adding proper telemetry or logging to their code, either too little or too noisy. This doesn't bother a developer on their machine but causes me huge headaches.
B) Bottlenecks and the root causes of failures can be hard to figure out in a complex mesh of microservices all talking to each other. This one in particular usually requires both ops and dev to solve, and new issues can be hard for ops to understand because the relations between services change frequently.
C) I've never been lucky enough to have a truly clean, perfectly architected microservice environment. There are always legacy monoliths running on VMs somewhere, intertwined with the newer stuff, or "megaservices": containerized mini-monoliths that do too much. Megaservices usually started out as microservices but grew and became ungainly over time.
Megaservices usually result from time pressure on devs: things get tacked onto the service because there's not enough time to architect and spin out a new microservice.
These are the places where the platform, in and of itself, doesn't solve things for me yet. It's all automated, but it's not necessarily correct. AI potentially has some application here, as it might be able to discover these kinds of issues that aren't currently detectable in the pipeline and report on them.
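Challenge A (telemetry that is too sparse or too noisy) is partly a code-level habit, and challenge B depends on being able to follow one request across services. A minimal sketch using Python's standard `logging` module: one JSON object per line, a level threshold that drops debug noise, and a `request_id` field for correlation. The `request_id` field name, the `checkout` service name, and `make_logger` are assumptions for illustration; real setups often lean on a tracing framework such as OpenTelemetry rather than hand-rolled IDs.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            # Hypothetical correlation field, passed via `extra=`; None if absent
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def make_logger(service: str, level: int = logging.INFO) -> logging.Logger:
    """Create a logger that emits JSON to stderr at `level` and above."""
    logger = logging.getLogger(service)
    logger.setLevel(level)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # avoid duplicate handlers on repeated calls
    logger.propagate = False     # don't also emit through the root logger
    return logger


log = make_logger("checkout")
log.debug("full cart payload")  # below INFO: suppressed, keeps prod logs quiet
log.info("order placed", extra={"request_id": "abc-123"})
# stderr: {"level": "INFO", "service": "checkout", "message": "order placed", "request_id": "abc-123"}
```

Machine-readable lines with a shared ID are what let ops stitch a failing call back together across a mesh of services without reading each service's code.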
e
At a certain organization size, R&D groups need to hold operational knowledge: SRE representatives who handle ops work and tie it to the group's daily work. Most of the developers can focus on code ("dumb"), while other group members are responsible for performance, resiliency, architecture, and general operational work. In large-scale organizations, Platform teams are responsible not for the actual infrastructure but for enabling infrastructure usage and provisioning. Looking at the Day 0/1/2 operational requirements, R&D is responsible for Day 0 and Day 1 operations, while Platform is responsible for Day 2 operations (hopefully with automation). The way I envision Platform in large-scale organizations, Platform writes the tools (e.g. a tool like Terraform with a DSL) while R&D does the dev-ops part by declaring its infrastructure requirements. It's an evolution of tooling and required knowledge. In the early days, EC2 was considered developer-friendly, while today the entire stack is abstracted up to Fargate. Who knows what will be next? 🤓
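The split described above (Platform writes the tool, R&D declares requirements) can be sketched in a few lines. This is a minimal Python sketch with hypothetical names throughout: `ServiceSpec`, `Plan`, `plan`, the 200-requests-per-second-per-replica sizing rule, the two-replica minimum, and the `managed-postgres-small` database tier are illustrative, not taken from Terraform or any real platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceSpec:
    """What an R&D team declares: requirements, not infrastructure details."""
    name: str
    expected_rps: int            # expected peak requests per second
    needs_database: bool = False


@dataclass(frozen=True)
class Plan:
    """What the platform tool decides on the team's behalf."""
    service: str
    replicas: int
    database: Optional[str]


def plan(spec: ServiceSpec, rps_per_replica: int = 200, min_replicas: int = 2) -> Plan:
    """Translate a declared spec into concrete resources (hypothetical sizing rule)."""
    # Ceiling division: enough replicas to cover the expected peak load
    needed = -(-spec.expected_rps // rps_per_replica)
    return Plan(
        service=spec.name,
        # Never fewer than min_replicas, so a single failure isn't an outage
        replicas=max(min_replicas, needed),
        database="managed-postgres-small" if spec.needs_database else None,
    )


print(plan(ServiceSpec("orders", expected_rps=900, needs_database=True)))
# Plan(service='orders', replicas=5, database='managed-postgres-small')
```

The team only changes `expected_rps` or `needs_database`, never replica counts or instance types; if the sizing rule turns out to be wrong, the platform team fixes it once for everyone.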