Hey all! Curious what people are using to run automation (Platform Engineering, #general)

Chris Battarbee

04/11/2024, 1:43 PM

Hey all! Curious what people are using to run automation. Specifically, how are you enabling developers to run self-serve actions in your platforms?
• Enabling restricted access to databases
• Provisioning new resources
• Getting diagnostics about their services, stack traces, heap dumps and the like
I’ve seen a few automation solutions like Rundeck and StackStorm, would love to hear what you’re all up to.

Mobs

04/11/2024, 1:53 PM

When you say "Enabling restricted access to databases" do you mean developers getting production db access? Or to a development environment with the same schema, but no customer data?

Chris Battarbee

04/11/2024, 1:53 PM

Mainly development environments I would say

Chris Battarbee

04/11/2024, 1:54 PM

But also curious if people offer ways to do breakglass operations in production

Mobs

04/11/2024, 2:04 PM

For dev environments (and local instances) I'm of the mindset that developers should have full access by design. If you are doing things the right way, there should be no customer data, and guardrails in place to protect them from themselves.

As for Production, I have seen a bunch of different approaches for this. JIT provisioning to a very specific scoped read-only node/user with a full audit trail, managerial approval, and a self-destruct timer to revoke access works. Another option we use in certain app stacks is a bot that developers can interact with via chatops with their queries when trying to get a production issue resolved. It has built-in guardrails, as well as a list of approvers who can say yes/no to a certain request (and a second validation for PII on the result set before it is sent). Pretty sure there are tools out there that do both of those things if you don't want to build and maintain your own.
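
A minimal sketch of the JIT pattern described in this message (scoped grant, separate approver, audit trail, expiry timer). It is only an illustration and not any particular tool's API: `AccessGrant`, its fields, and the scope string are all hypothetical names, and a real setup would create and drop the database user where this sketch only tracks state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class AccessGrant:
    """Hypothetical record of one just-in-time access request."""
    user: str
    scope: str                      # e.g. "read-only:orders-replica" (made-up scope name)
    ttl: timedelta                  # how long access lasts once approved
    approved_by: Optional[str] = None
    expires_at: Optional[datetime] = None
    audit_log: List[str] = field(default_factory=list)

    def approve(self, approver: str, now: datetime) -> None:
        # Approval must come from someone other than the requester.
        if approver == self.user:
            raise PermissionError("requester cannot approve their own access")
        self.approved_by = approver
        self.expires_at = now + self.ttl
        # Every state change is appended to the audit trail.
        self.audit_log.append(
            f"{now.isoformat()} {self.user} granted {self.scope} "
            f"by {approver} until {self.expires_at.isoformat()}"
        )

    def is_active(self, now: datetime) -> bool:
        # Access is valid only after approval and strictly before expiry.
        return self.expires_at is not None and now < self.expires_at


if __name__ == "__main__":
    start = datetime(2024, 4, 11, 14, 0, tzinfo=timezone.utc)
    grant = AccessGrant("dev", "read-only:orders-replica", timedelta(minutes=30))
    grant.approve("manager", start)
    print(grant.is_active(start + timedelta(minutes=10)))
    print(grant.is_active(start + timedelta(minutes=45)))
```

Checking `is_active` at connection time (rather than relying on a background job to revoke) is one way to make the "self-destruct timer" fail closed; whether that fits depends on how the database handles already-open sessions.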

Chris Battarbee

04/11/2024, 2:21 PM

Gotcha, so I guess something like terraform to produce a tightly scoped bastion machine

Chris Battarbee

04/11/2024, 2:21 PM

Thanks @Mobs! appreciate it

Abby Bangser

04/11/2024, 2:35 PM

For debugging in k8s I used https://github.com/gocardless/theatre to give safe (and approved) exec containers which could also allow some breakglass database actions. But that is a really narrow use case. I also had RunDeck as well, but that was more an ops tool than for the app devs.

Chris Battarbee

04/11/2024, 2:36 PM

Ah interesting, what was your use case for RunDeck, if you don't mind me asking?

Abby Bangser

04/11/2024, 2:36 PM

Separately, I worked with a customer to create a solution around enabling users to run Ansible scripts in a super regulated environment using Kratix. Have to admit, not the use case we are building Kratix for, but really interesting to be able to help 'em out!

Abby Bangser

04/11/2024, 2:37 PM

For us, RunDeck allowed us to have consistency and auditability of runs on servers.

Abby Bangser

04/11/2024, 2:37 PM

Also, and maybe not the initial reason to get RunDeck, but it allowed us to manage most incidents from our phones rather than needing to log into laptops out of hours.

Chris Battarbee

04/11/2024, 2:38 PM

Ah gotcha, was this largely maintenance commands on bare metal servers / vms?

Abby Bangser

04/11/2024, 2:39 PM

Yea, they were cloud VMs but yes, mostly maintenance oriented (rotate things, restart things, roll out changes to things, scale things, etc.)

Chris Battarbee

04/11/2024, 2:40 PM

Ah nice, thanks! And with the incident management, was that performing remediation actions like "bounce this server"?

Abby Bangser

04/11/2024, 2:41 PM

Exactly. They were the scripts we had as a team within our runbooks. Often like “grab the load balancer name and run this script via RunDeck with that load balancer”.

Abby Bangser

04/11/2024, 2:42 PM

We did do some scripting of RunDeck via API calls, but that wasn’t really the use case for us, it was more there for the one off needs
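
For anyone curious what scripting a run like the "load balancer" example might look like, here is a rough sketch that only builds the HTTP request for Rundeck's job-run endpoint (`POST /api/{version}/job/{id}/run` with an `X-Rundeck-Auth-Token` header). The base URL, job ID, token, option name `load_balancer`, and API version number are all placeholder assumptions, not details from this thread; check them against your own Rundeck instance.

```python
import json
import urllib.request


def build_run_request(base_url, job_id, token, options, api_version=41):
    """Build (but do not send) a request that would trigger a Rundeck job.

    api_version=41 is an assumed value; use whatever your server supports.
    """
    url = f"{base_url.rstrip('/')}/api/{api_version}/job/{job_id}/run"
    # Job options are passed as a JSON object under the "options" key.
    body = json.dumps({"options": options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Rundeck-Auth-Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    req = build_run_request(
        "https://rundeck.example.com",
        "restart-behind-lb",
        "REPLACE_ME",
        {"load_balancer": "my-lb-name"},
    )
    print(req.get_method(), req.full_url)
    # To actually trigger the job: urllib.request.urlopen(req)
```

Keeping the request construction separate from sending it makes the script easy to dry-run, which fits the "human in the loop" approach.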

Chris Battarbee

04/11/2024, 2:42 PM

Ah super cool, did you ever get to the point of automatically running rundeck runbooks in response to an alert or something similar?

Abby Bangser

04/11/2024, 2:56 PM

We did not, but it was for sure possible. It was just not a super common occurrence and so we saw value in having the human in the loop.

Chris Battarbee

04/11/2024, 2:57 PM

Thanks! Also curious if there was anything you wish was different about RunDeck / gotchas

Abby Bangser

04/11/2024, 2:58 PM

Sorry a few years out now so I think it would all be a bit stale tbh!

Chris Battarbee

04/11/2024, 2:59 PM

No worries! Thanks Abby!

Clemens Jütte

04/12/2024, 9:09 AM

Hey Chris! You can take a look at the tooling landscape: https://platformengineering.org/platform-tooling. There are a few categories that come to mind, but most probably you’re looking for a platform orchestrator. Abby already mentioned Kratix, but there are others as well 🙂

Chris Battarbee

04/12/2024, 9:49 AM

Thanks @Clemens Jütte I’ll take a look now!

Thomas Harris

04/12/2024, 10:07 AM

for access to databases and hosts, check out https://www.boundaryproject.io/

Alexandre Proulx

04/12/2024, 3:39 PM

Automation: Check out things like Argo Workflows, Tekton, Crossplane, etc. You can run, deploy, and reuse any automation within a k8s cluster. The cool part is that these tools help in making reusable "recipes" which can be controlled by k8s RBAC.

Authorization: Vault is great for secrets storage. Tools like Boundary and Ory Keto let you get really fancy with permissions. Keycloak is a good option if you need to manage user accounts and logins too. Even Kubernetes itself has some basic permission controls built in.

Diagnostic: eBPF is the shiny thing right now with the auto-instrumentation feature, and Cilium is a rising star. Grafana and Calico are great options but might require more work. There are options like Groundcover that can help, but the UI lives in their cloud and therefore comes with a subscription. Instrumentation could also be achieved using OpenTelemetry and adding the support within your application. This is best if you need to trace specific things that require more granularity. This can then be shipped to Grafana Tempo, Jaeger, or Zipkin in a standard format.

Good luck with your quest!
