# platform-toolbox
j
here is a question I've been struggling with for a while now. I am heading up the build-out of our own IDP. One of our first areas of focus is for our managed services and cloud teams to be able to consume validated and well-architected cloud modules (specifically AWS modules written in Terraform). Currently we are using GitLab as our platform (trying to minimize the amount of work needed for an MVP) and it is really powerful as it has its own built-in Terraform registry. But I was wondering if anyone had any other alternatives (not Terraform Cloud, as we have already been using it and it is not fit-for-purpose lol)
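For context, consuming a module from the GitLab-hosted registry looks roughly like this on our side (the group and module names here are just placeholders, and auth would come from a CI job token or CLI credentials):

```hcl
module "vpc" {
  # GitLab's module registry uses the <host>/<namespace>/<module-name>/<system> source format
  source  = "gitlab.com/acme-platform/vpc/aws"
  version = "~> 1.2"

  cidr_block = "10.20.0.0/16"
}
```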
m
For storing Terraform state, we have tried S3 with DynamoDB for state management. This works fine, although we have now migrated to Terraform Cloud for a few projects.
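For reference, that S3 + DynamoDB setup is just the standard backend block, something like this (bucket and table names are made up):

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"              # hypothetical state bucket
    key            = "platform/network/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-locks"             # DynamoDB table used for state locking
    encrypt        = true
  }
}
```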
j
Interesting... and did you create a library of reusable modules for teams to consume? Would be interested in how you managed that.
👀 1
m
We have reusable modules, but mostly they are handled by the infra team only. We have an Atlantis pipeline set up, so any code change goes through git; once approved, it is applied automatically.
For dev and prod, we use the same module but with different values. Creating a Kubernetes cluster has its own module, so every time we create a cluster we reuse the same module. Dev teams mostly send us requests if they need something. We are mainly on Kubernetes, so devs can do whatever they want on the dev cluster. There are fewer requests for changes on AWS as such. If a request comes in for EC2, I ask 2-3 questions about why they need an EC2 instance specifically, and the team usually realizes that things can be easily deployed on Kubernetes instead.
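Roughly what that reuse looks like, as a sketch with made-up names and values:

```hcl
# dev/main.tf
module "eks" {
  source = "../modules/eks"      # same module source for every environment

  cluster_name   = "dev-eks"
  instance_types = ["t3.medium"]
  desired_nodes  = 2
}

# prod/main.tf
module "eks" {
  source = "../modules/eks"      # identical module, only the values differ

  cluster_name   = "prod-eks"
  instance_types = ["m5.xlarge"]
  desired_nodes  = 6
}
```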
s
My team manages certain things such as EKS clusters, as we don’t want teams spinning up their own of these. Our scale (medium size) allows for that though. My previous teams at another employer had node groups per team, but not clusters per team, and we managed those node groups as the platform team. For other things that you can safely parameterize, Humanitec works very well. For yet other concepts, I’ve used the idea of a “starter kit” per Terraform module: a versioned, artifacted Terraform module where we used semver to maintain upgrade paths. We had a custom in-house Go-based CLI that would pull down the latest version of the TF module and set it up based on the team’s profile (including state in a unique place).
The only tool I’ve found that comes close to this for internal TF modules is something like nullstone
m
> in-house go-based CLI that would pull down latest from the tf module and set it up based on their team profile
Thanks for sharing @Schuyler Bishop. Curious to know more about your CLI tool. Does it execute some Terraform code based on the team-provided profile?
s
It just set up a local git repo with the current version of the Terraform module plus a main.tf with the team’s S3 bucket for state, but didn’t go any further than that.
Oh and it had a .github directory with the pipeline to run tf too.
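So a scaffolded repo ended up looking something like this (the module URL, team name and bucket here are illustrative, not the real ones):

```hcl
# main.tf generated by the CLI for a given team
terraform {
  backend "s3" {
    bucket = "team-payments-tf-state"   # unique state location per team
    key    = "starter-kit/terraform.tfstate"
    region = "us-east-1"
  }
}

module "starter_kit" {
  # pinned to the semver release the CLI pulled down
  source = "git::https://github.com/acme/terraform-starter-kit.git?ref=v2.3.0"

  team        = "payments"
  environment = "dev"
}
```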
m
Got it... interesting. We have given everyone a namespace in a single dev EKS cluster. We keep an eye on resources, and if the number of instances increases, we check who is deploying what and notify them. They have full access to their namespace. Managing one EKS cluster per team is painful.
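The per-team setup is basically a namespace per team in Terraform, plus a quota if you want to enforce limits rather than just watch them; a rough sketch with made-up names and numbers (provider config omitted):

```hcl
resource "kubernetes_namespace" "team" {
  metadata {
    name   = "team-payments"            # one namespace per team in the shared dev cluster
    labels = { owner = "payments" }
  }
}

resource "kubernetes_resource_quota" "team" {
  metadata {
    name      = "team-payments-quota"
    namespace = kubernetes_namespace.team.metadata[0].name
  }
  spec {
    hard = {
      "requests.cpu"    = "8"
      "requests.memory" = "16Gi"
      pods              = "50"
    }
  }
}
```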
s
Agreed - we used node groups per team and used taints and tolerations to make sure their apps got deployed to their node group. Made it easier to compartmentalize a given team. We used one namespace per pull request.
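On the Terraform side that was roughly a tainted EKS node group per team, with the team’s deployments carrying the matching toleration; a sketch with illustrative names, referencing cluster/role/subnet resources you’d already have:

```hcl
resource "aws_eks_node_group" "payments" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "payments"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  scaling_config {
    desired_size = 3
    min_size     = 2
    max_size     = 6
  }

  # the taint keeps other teams' pods off; the team's workloads add the matching toleration
  taint {
    key    = "team"
    value  = "payments"
    effect = "NO_SCHEDULE"
  }
}
```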
One namespace per pull request plays VERY much into the Humanitec wheelhouse, btw.
d
Vcluster (https://www.vcluster.com/) by Loft Labs could be a useful tool for having virtual Kubernetes clusters separated by namespaces, if you find yourself in a situation where you need to develop “cluster-wide” resources such that a single namespace doesn’t quite do the trick.
j
hey guys, these are all really cool options (really intrigued by the CLI-driven approach @Schuyler Bishop)... kinda concerned about Humanitec pricing though 🤣 (if you have to ask the price, it's too expensive, right?)
The one question that I do have is how these tools could scale out from the "simple" use-case of a k8s cluster/namespace/workload etc.
We have a large cloud-consulting arm (as well as a managed services branch), and we find that spinning up infrastructure is a really big pain-point (infrastructure such as ElastiCache clusters, RDS, Kinesis, SageMaker, etc.), and I am trying to build a system that streamlines that through vetted TF modules. Any suggestions?
c
I have in the past preferred the simple approach of a git repo per TF module, plus one additional repo containing docs for all of the TF modules and how to use them. We (the Platform / DevOps team) create and manage the Terraform modules based on contextual information like compliance, security, best practice etc. When a developer on day 1 wants to create an environment for themselves, the documentation has some guides. There are guides for most common things like k8s, EC2, private databases, public / private lambdas, API gateways as routers or as proxies, etc.

If a team needs something additional for their environment, we work hands-on with them to build out a working model of what they need. If another team requests the same thing, we abstract that into a common place. We have found this most valuable when thinking about VPCs, subnets, routing etc., as developers don’t necessarily want to think about that; they just want a database / function / whatever else. Teams can then run their own platform using vetted Terraform modules, which are standard, with a platform team who understands those modules available to help debug.

I’ve done the same for Kubernetes appliances like Ingress, Prometheus, and a bunch of Operators, where the installation is distributed as a Terraform module, and you can build your own platform the way you need it, as a developer. There are pieces which are not easily distributed through Terraform and need active management, say Redis clusters for example. We will then build / use an Operator, and install that through Terraform modules, which we can then version and iterate on.
👀 1
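To make the repo-per-module consumption concrete, teams pull each module straight from its git repo, pinned to a tag; the repo URL and inputs below are just placeholders:

```hcl
module "orders_db" {
  # one git repo per module, versioned with tags
  source = "git::https://git.example.com/platform/terraform-aws-rds.git?ref=v1.4.0"

  identifier     = "orders"
  engine         = "postgres"
  instance_class = "db.t3.medium"
  subnet_ids     = var.private_subnet_ids
}
```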
j
So that's pretty much the approach I am taking @Chris Vermeulen 🙂 it really is great to see that the model works in the wild haha!
Have you found having people consume the modules from git directly to be a problem? We have found that in the past (especially when we are on site and the client doesn't let us access our own git repos). Hence my thought about a registry (e.g. GitLab) that allows the teams to consume the modules that way.
c
The Kubestack catalog is a nice example: https://www.kubestack.com/catalog I’ve got a talk at PlatformCon called Composable Platforms which is exactly about this approach. I’d love to iterate on the talk and improve it with your insights 🙂
j
Looks awesome, gonna have a look now (i.e. when I have 2 seconds free in the work day haha)
c
mmm. Strange that they would not let you access your own git repos, but would allow accessing public registries.
*Just a thought
j
Welcome to fintech haha. The other issue is credentials though; sometimes we are forced to use other laptops, and then access to SSH keys becomes an issue. Also... surprisingly, we have found that the git source for modules can be frustrating for devs at times.
c
I think if you have a big enough set of modules, running a registry for them should be pretty easy; there are a few options. I’ve primarily used the git sources, as they can be updated to be internal as well if folks are running some sort of enterprise git. In that case I would clone the Terraform stuff I have into their environment and use that. Hypothetically, if they don’t allow you to call your own git repos, then your own registry could also be a problem. If you go that route, it might be worth publishing one of the non-critical modules to a registry and seeing whether you can get it past them.
Docs are crazy important when publishing by git source. It can be extremely confusing for developers who just want an ingress to have to work out where to find the git source for your module.
j
Yeah, hundred percent hey. The docs part of the discussion we are taking care of with pre-commit and terraform-docs (I fail any build that doesn't have an examples directory lol). But that developer experience of getting updates, knowing what has changed, and how best to use it is the critical component.
c
Updating developers and getting them to upgrade is an issue when releasing new versions. My experience so far has been a Slack / Teams spray-and-pray, and then in a couple of weeks starting to chase people manually, especially for security-related upgrades.
m
Regarding Terraform for Kubernetes, I didn't have a good experience using it. We prefer using Helm directly with ArgoCD, while using Terraform for AWS resources, with components divided into modules.
Do you guys prefer using Terraform for Kubernetes components like Prometheus?
c
If installing the Prometheus Operator, I prefer the Terraform approach. It allows me to create smaller environments for testing with a repeatable Terraform file, and the single-tool approach makes it really easy to adopt. Terraform also lets you use whatever templating engine behind it: Helm, Kustomize, standard YAML. It’s all abstracted behind Terraform, so there is one installation path.

With that being said, I haven’t used Helm much in the last couple of months, besides using it to template the origin manifests behind some of the Terraform modules. I’ll use Helm to pull the latest release of the Prometheus Operator, for example, template it, and then expose it behind the Terraform module. I’ve seen some folks automate this approach too, by running a script every night to fetch the latest release of the Prometheus Operator and automatically create a PR with the latest changes back to the Terraform module, which they can then validate and check in the morning, or even write some e2e tests for how they use it and validate it via those before releasing internally.
👍 2
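A stripped-down sketch of that single-installation-path idea, using the Helm provider inside a Terraform module (the chart version and values here are illustrative, and you could equally template the manifests out and apply them with the kubernetes provider instead):

```hcl
# modules/prometheus-operator/main.tf
resource "helm_release" "prometheus_operator" {
  name             = "prometheus-operator"
  namespace        = "monitoring"
  create_namespace = true

  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"
  version    = "58.1.0"   # pinned; bumped via the nightly-PR automation mentioned above

  # environment-specific overrides are hidden behind the module's variables
  values = [yamlencode({
    grafana = { enabled = var.enable_grafana }
  })]
}
```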