# terraform
d
Hello from South Africa 👋 Our team is thinking about how to make it easier for development teams to use Terraform to bring up cloud resources in a way that lets them control the things they care about and not worry about the things they don't. The set of things that "they don't need to care about but the platform team should" isn't particularly well defined at the moment, but I think we'll figure out where that boundary lies as we walk down the road. Simply put, we're trying to achieve for Terraform what Helm did for the Kubernetes ecosystem. How to go about doing that, though, is complicated. We've been toying with a couple of things: a templating solution; trying to build modules intelligently; leaving everything up to the developer but running an aggressive linter over their plans. Wondering if anyone else has been trying to do something similar and what the experience has been, or if any ideas pop to mind for anybody?
m
Hi Delano, nice meeting you! At Bose Corporation we did something similar: we wanted to make IaC a real thing amongst our developers. In the past, most of the infrastructure owned by development teams was provisioned manually, while most of the infrastructure managed by one or more dedicated cloud teams was a mix of manual work, scripts and IaC (Terraform, CDK, CloudFormation, ...). We have built a robust wrapper that references and/or incorporates a number of Terraform modules which can be toggled on or off depending on the developer's needs. The Terraform modules are distinct sets of capabilities with automatic documentation generation, a versioning system, a metadata registry and a whole range of default, recommended or required configurations. All inputs of the modules can be set through the wrapper, but any input left unset is provisioned with its default value, which is our way of creating golden paths. We have built a CLI and an API that developers can use to get their own copy of a wrapper stack in a dedicated code repository, based on the details they've provided. The API is kept up to date automatically based on the metadata we store for each module. Example: if a developer needs a VPC and an EC2 instance, he would toggle on the VPC and EC2 modules, and the EC2 instance will be provisioned nicely in that newly created VPC. If the developer already has a VPC and just wants another EC2 instance, he would only toggle on the EC2 module and provide the existing VPC ID.
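A minimal sketch of what such a toggle-based wrapper could look like, not the actual Bose implementation. The variable names, the local module paths `./modules/vpc` and `./modules/ec2`, and the `vpc_id` input/output are all hypothetical; the only Terraform features relied on are `count` on module blocks (Terraform 0.13+) and conditional expressions.

```hcl
# Hypothetical wrapper inputs: developers flip toggles and override defaults.
variable "enable_vpc" {
  type        = bool
  default     = false
  description = "Create a new VPC for this stack."
}

variable "enable_ec2" {
  type        = bool
  default     = true
  description = "Create an EC2 instance."
}

variable "existing_vpc_id" {
  type        = string
  default     = null
  description = "ID of an existing VPC, used when enable_vpc is false."
}

# Hypothetical module paths; each module carries its own golden-path defaults.
module "vpc" {
  source = "./modules/vpc"
  count  = var.enable_vpc ? 1 : 0
}

module "ec2" {
  source = "./modules/ec2"
  count  = var.enable_ec2 ? 1 : 0

  # Use the freshly created VPC when toggled on, otherwise the one supplied.
  # Assumes the vpc module exposes a `vpc_id` output and ec2 accepts `vpc_id`.
  vpc_id = var.enable_vpc ? module.vpc[0].vpc_id : var.existing_vpc_id
}
```

With both toggles on, the instance lands in the new VPC; with only `enable_ec2` on, the developer supplies `existing_vpc_id` and nothing else.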
d
That's pretty cool! Thanks Maarten. How do you guys deal with updates to infrastructure through the CLI?
m
The CI/CD pipeline shipped with the code repository you get when requesting a new stack is the only way to update the corresponding cloud resources. The wrapper applies pessimistic versioning to the Terraform modules, so every pipeline execution applies the latest patch version of every Terraform module. Currently, minor and major versions can only be updated directly in the code repository, but we have something on our roadmap to push or handle those updates through the CLI or API, which would in essence do the same thing (updating the code repository and using the pipeline to carry it out). Similar to the metadata we collect when building the Terraform modules, we also collect metadata about the stacks that were created, so we know at any time which stack uses which version, and theoretically we could prepare and initiate a merge request or change request in the developer's code repository automatically when updates are required.
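For readers unfamiliar with pessimistic versioning, here is a minimal sketch using Terraform's `~>` constraint operator. The registry address `app.terraform.io/example-org/ec2/aws` and the version numbers are placeholders, not the modules discussed above.

```hcl
module "ec2" {
  # Hypothetical private registry module; `version` is only valid for
  # modules installed from a registry.
  source = "app.terraform.io/example-org/ec2/aws"

  # "~> 1.4.0" allows only the rightmost component to increase:
  # it is equivalent to ">= 1.4.0, < 1.5.0", i.e. patch releases only.
  version = "~> 1.4.0"
}
```

A pipeline that runs `terraform init -upgrade` before planning will pick the newest version the constraint allows, while moving to 1.5.x or 2.x needs an explicit edit to the constraint in the repository.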
t
My low-tech solution for our tiny org is to put all the settings in locals.tf and let devs make pull requests. They don't really have to read or understand the rest of the HCL, but they are empowered to adjust a setting relevant to them.
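As an illustration of that layout, a `locals.tf` might look something like the sketch below. Every setting name and value here is invented for the example.

```hcl
# locals.tf: the only file developers are expected to touch.
# All names and values below are hypothetical.
locals {
  service_name   = "orders-api"
  instance_type  = "t3.small"
  instance_count = 2

  tags = {
    team        = "payments"
    environment = "staging"
  }
}
```

The rest of the configuration then reads these values as `local.instance_type`, `local.tags` and so on, so a pull request that changes a setting is a one-line diff in a single file.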
b
Hey @Delano Ramdas! I built an IDP at McKinsey in 2016 that took an approach similar to Maarten Fuchs's (provide a set of pre-vetted services to connect). It was wildly successful because it was a simple tool for developers that solved the bulk of the use cases. However, we faced two major problems:
1. There were several use cases we couldn't serve well -- many teams chose to do everything from scratch
2. We had a hard time keeping up with the requests for new services, cloud providers, etc. (we were serving many diverse clients with their own tech stacks)
I wrote about some common issues I've seen with TF self-service here: https://www.nullstone.io/blog-posts/terraform-self-service
Whatever approach you go with, curation is critical.
• How do developers find what they need?
• What happens when TF goes sideways?
• How do you push upgrades and bug fixes to each module?
a
I really appreciated that blog @Bradley Sickles. I think at the end of the day, what we are looking for is resources on demand, not Terraform. If we are offering code as a service, we will run into the issues you detailed (extremely well!). What stops us from creating APIs that can be implemented with Terraform but that don't require app devs to care whether it is Terraform, Crossplane, Pulumi or Tofu? 🤔
b
Hey @Abby Bangser, Delano explicitly asked about Terraform, did I miss something?
a
Absolutely. I am not suggesting moving away from Terraform as the implementation! I am saying that answering the question "how to make it easier for development teams to use terraform to bring up cloud resources in a way that they can control the things they care about, and not worry about the things that they don't" may benefit from considering whether them using HCL directly should be under review.
IME as an app dev, I care a heck of a lot more about my DB connection string than pretty much anything else. I need to be able to tweak settings sometimes, so I don’t want a black box I can’t configure. But I also don’t care about (and mostly don’t want to manage) how it goes from config to useful.
m
@Bradley Sickles I love your blog post, great insights!
b
Thanks @Maarten Fuchs!
k
Worth noting that Hashicorp are introducing a feature called stacks, which targets this concern: https://www.hashicorp.com/blog/terraform-stacks-explained
h
@Delano Ramdas We have a catalog of low-code (Terraform) patterns that are aimed at developer and platform engineering teams. They are specifically for the type of use case you described; you want to use IAC but you don't want to have to configure 27 different things just to spin up a simple DB, VM or container. Happy to give a demo over a virtual coffee if it's something you think might be useful.
g
If you are open to trying a different framework - something like Pulumi - then Pulumi with Backstage is a great option to explore. The developers don't need to know how to spin up resources or work with the Cloud (or even know which Cloud they are working with); they can drag and drop the resources they would like to spin up. It is easier said than done. It requires inheriting from the Pulumi resource definitions, defining the standards for how your organization would like to spin up resources (defaults, security standards, etc.) and integrating with Backstage and CI/CD to deploy the resources. It's a complex workflow but entirely scalable as you grow. It's worth it, but it takes a lot of time to get the initial version out and needs constant management to add more and more resources to its pool of approved resources.
If you are open to exploring paid tools - then Humanitec is a great option - https://humanitec.com/
a
@GP, in your solution with pulumi and backstage, how much of that experience is backstage and how much pulumi? Like could you create that same experience with terraform and backstage?
g
@Abby Bangser - yes, you can also create that experience with Terraform and Backstage. The learning curve of Backstage is minuscule compared to how much you need to invest in Pulumi (or terraform) and CI/CD to create this model. Backstage's crucial goal is to enable plugins to support other developer frameworks and platforms. Pulumi has official Backstage Plugin support. To learn more, refer here - https://www.pulumi.com/blog/pulumi-backstage-plugin/. Terraform doesn't have such official support, but you will find equally good Plugins built by the OSS community. That's where I think Terraform sucks these days, as they are steering away from the OSS model and focusing more on profits, and their development model aligns more with their Cloud offering.
After a bit of a search - I'm not able to find a good TF Backstage Plugin 🤷🏾‍♂️
So, yeah, you will also end up creating that Plugin for Terraform - if you choose to go with Terraform.
h
I would like to challenge that thinking a little, guys. There are several good ways of integrating Terraform with Backstage if that is what you want to do. If I take it back to the topic from yesterday's webinar, where Krzytof talked about resources vs capabilities, I believe that selecting Pulumi would remove opportunities to increase adoption of what you build (the capability) simply by choosing a tech that limits what it can manage. If I go back to the question that started this whole thread, it was to help find/select something that would enable developers without getting in their way, specifically for cloud. While there is much tech out there that can build and manage cloud resources, the only one that has day-1 support for new (cloud) resource types is Terraform. Why? Because the cloud providers help ensure it's ready to use by teams like us. There absolutely is an overhead when adding Terraform to the mix, as others have said. The real question on Terraform is: do you want to write Terraform (40% of your day) or use Terraform?
a
There absolutely is an overhead when adding Terraform to the mix as others have said. The real question on Terraform is; do you want to write Terraform (40% of your day) or use Terraform?
This seems to be back to the main question. Though I may suggest extending it from writing vs using to writing vs using vs benefiting from the outputs. That's where I absolutely would not recommend rewriting Terraform in another language, but I do wonder about adding abstractions between app devs and the infra implementation.
g
I'm not promoting Pulumi over Terraform or vice versa. Again, back to the original question: how do we enable the developers to write Terraform to bring up Cloud resources? I recommend taking a step back and seeing what we truly want to accomplish here. Is the goal to enable the developers to write Terraform to spin up new resources in the Cloud, or is it to enable the developers to spin up new resources in the Cloud? The challenge for a developer is not the learning curve associated with Terraform or Pulumi. If a developer can learn Rust, NodeJS, or Ruby, they can learn Terraform or Pulumi very well. IMO, Terraform or Pulumi is a lot easier to learn than other tech stacks, as those are more general-purpose than IaC. The challenge concerns what they need to understand about the Cloud to write the right IaC. Do they understand the correct Subnets, Security Groups, Ingress, Egress, Parameter Groups, Secrets, etc.? How do they link them all? Finally, could they define the proper IAM permissions to allow all these integrations? This is where we need a balance. Do we train the developers to understand these nuances clearly so they can write their own TF/Pulumi scripts, or do we abstract it all, so that they don't need to dig through the correct security practices and instead just configure that they need an RDS and ECS/EKS to spin up their application? If the team wants to build that abstraction in Terraform using modules, use Backstage, or build extensible classes using Pulumi, then GO FOR IT - choose what works best with your resources and the time you can allocate. I recommend doing everything possible to keep the Cloud simple for the Developers. Yes, application developers can learn Cloud, but is that worth the investment for a business?