Hey all! Curious what people are using to run aut...
# general
c
Hey all! Curious what people are using to run automation. Specifically, how are you enabling developers to run self-serve actions in your platforms?
• Enabling restricted access to databases
• Provisioning new resources
• Getting diagnostics about their services: stack traces, heap dumps and the like
I’ve seen a few automation solutions like Rundeck and StackStorm; would love to hear what you’re all up to.
m
When you say "Enabling restricted access to databases" do you mean developers getting production db access? Or to a development environment with the same schema, but no customer data?
c
Mainly development environments I would say
But also curious if people offer ways to do breakglass operations in production
m
For dev environments (and local instances) I'm of the mindset that developers should have full access by design. If you are doing things the right way, there should be no customer data, and guardrails in place to protect them from themselves.
As for Production, I have seen a bunch of different approaches for this. JIT provisioning to a very specific scoped read-only node/user with a full audit trail, managerial approval, and a self-destruct timer to revoke access works. Another option we use in certain app stacks is a bot that developers can interact with via chatops with their queries when trying to get a production issue resolved. It has built-in guardrails, as well as a list of approvers who can say yes/no to a certain request (and a second validation for PII on the result set before it is sent).
Pretty sure there are tools out there that do both of those things if you don't want to build and maintain your own.
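To make the JIT idea above concrete, here is a minimal sketch of a time-boxed grant with an approver and an audit trail. Everything in it is hypothetical (the `AccessGrant` class, its field names, the 30-minute TTL in the usage note); it only models the bookkeeping, and a real setup would also have to create and revoke the actual database user.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AccessGrant:
    """Hypothetical record of one time-boxed, read-only production grant."""
    developer: str
    target: str                              # e.g. the name of a read-only replica
    ttl: timedelta                           # how long access lasts once approved
    approved_by: Optional[str] = None
    expires_at: Optional[datetime] = None
    audit_log: List[str] = field(default_factory=list)

    def approve(self, approver: str, now: datetime) -> None:
        """Record the approval and start the self-destruct timer."""
        if approver == self.developer:
            raise PermissionError("requester cannot approve their own request")
        self.approved_by = approver
        self.expires_at = now + self.ttl
        self.audit_log.append(
            f"{now.isoformat()} {approver} approved {self.developer} on {self.target}"
        )

    def is_active(self, now: datetime) -> bool:
        """Access is valid only after approval and before expiry."""
        return self.expires_at is not None and now < self.expires_at
```

For example, `AccessGrant("dev", "orders-replica", timedelta(minutes=30))` stays inactive until someone other than `dev` calls `approve`, and `is_active` turns false again once 30 minutes have passed.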
c
Gotcha, so I guess something like terraform to produce a tightly scoped bastion machine
Thanks @Mobs! appreciate it
a
For debugging in k8s I used https://github.com/gocardless/theatre to give safe (and approved) exec containers which could also allow some breakglass database actions. But that is a really narrow use case. I also had RunDeck, but that was more an ops tool than one for the app devs.
c
Ah interesting, what was your use-case for runbook if you don't mind me asking?
a
Separately, I worked with a customer to create a solution around enabling users to run ansible scripts in a super regulated environment using Kratix. Have to admit, not the use case we are building Kratix for, but really interesting to be able to help em out!
For us, RunDeck allowed us to have consistency and auditability of runs on servers.
Also, and maybe not the initial reason to get RunDeck, but it allowed us to manage most incidents from our phones rather than needing to log into laptops out of hours.
c
Ah gotcha, was this largely maintenance commands on bare metal servers / vms?
a
Yea, they were cloud VMs but yes, mostly maintenance oriented (rotate things, restart things, roll out changes to things, scale things, etc.)
c
Ah nice, thanks! And with the incident management was that performing actions to remediate like bounce this server?
a
Exactly. They were the scripts we had as a team within our runbooks. Often like “grab the load balancer name and run this script via RunDeck with that loadbalancer”.
We did do some scripting of RunDeck via API calls, but that wasn’t really the use case for us. It was more there for the one-off needs.
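For anyone wondering what "run this script via RunDeck with that loadbalancer" could look like when scripted, here is a rough sketch that only builds the HTTP request for Rundeck's run-job endpoint and does not send it. The base URL, job ID, token, the `load_balancer` option name, and the API version number are all placeholders and assumptions, not details from this thread; check them against your own Rundeck instance.

```python
import json
import urllib.request


def build_run_job_request(base_url, job_id, token, options, api_version=41):
    """Build (but do not send) a POST request that asks Rundeck to run a job.

    api_version=41 is an assumed default; use whatever your server supports.
    """
    url = f"{base_url.rstrip('/')}/api/{api_version}/job/{job_id}/run"
    # Job options are passed as a JSON object under the "options" key.
    body = json.dumps({"options": options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Rundeck-Auth-Token": token,       # API token auth header
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Hypothetical usage: every value below is a placeholder.
request = build_run_job_request(
    "https://rundeck.example.internal",
    "restart-lb-targets",
    "REPLACE_WITH_TOKEN",
    {"load_balancer": "lb-checkout-01"},
)
```

Passing `request` to `urllib.request.urlopen` would actually trigger the job, so keeping the build step separate makes it easy to log or review the call first.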
c
Ah super cool, did you ever get to the point of automatically running RunDeck runbooks in response to an alert or something similar?
a
We did not, but it was for sure possible. It was just not a super common occurrence, so we saw value in having the human in the loop.
c
Thanks! Also curious if there was anything you wish was different about RunDeck / gotchas
a
Sorry a few years out now so I think it would all be a bit stale tbh!
c
No worries! Thanks Abby!
c
Hey Chris! You can take a look at the tooling landscape https://platformengineering.org/platform-tooling . There are a few categories that come to mind but most probably you’re looking for a platform orchestrator. Abby already mentioned Kratix, but there are others as well 🙂
c
Thanks @Clemens Jütte I’ll take a look now!
t
for access to databases and hosts, check out https://www.boundaryproject.io/
a
Automation: Check out things like Argo Workflows, Tekton, Crossplane, etc. You can run, deploy, and reuse any automation within a k8s cluster. The cool part is that these tools help in making reusable "recipes" which can be controlled by k8s RBAC.
Authorization: Vault is great for secrets storage. Tools like Boundary and Ory Keto let you get really fancy with permissions. Keycloak is a good option if you need to manage user accounts and logins too. Even Kubernetes itself has some basic permission controls built in.
Diagnostic: eBPF is the shiny thing right now with the auto-instrumentation feature, and Cilium is a rising star. Grafana and Calico are great options but might require more work. There are options like Groundcover that can help, but the UI lives in their cloud and therefore comes with a subscription. Instrumentation could also be achieved using OpenTelemetry and adding the support within your application. This is best if you need to trace specific things that require more granularity. This can then be shipped to Grafana Tempo, Jaeger, or Zipkin in a standard format.
Good luck with your quest!