# platform-toolbox
u
Hey guys, does anyone have a good solution for GitLab Runner autoscaling using Docker Machine on AWS? The solution GitLab provides runs one job per VM. I want each VM to run multiple jobs and to scale out only when we really need more machines. The K8s executor is currently slow to start (GitLab bug), and we don’t use K8s anyway, so it’s not an option.
h
Although I highly recommend setting up an EKS cluster and running GitLab there, I understand K8s is an investment to get going. I’ve set up what I think you’re after using these instructions before and it worked fine; it even supports spot instances for runners. https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/ Take a good look at the parameters you can configure under the MachineOptions block of your runner manager’s [runners.machine] config. Depending on the load pattern of your org/project(s), I also recommend “warming up” the autoscaling IdleCount before the jobs start. We found that the cluster went completely cold overnight, and the first jobs of the day took a while to start while the runner manager started new machines.
u
The problem with this solution is that each machine can run only one job. If machine startup were quick, that wouldn’t be a problem, but it takes ages (minutes) to start a machine. That’s not viable for jobs that take seconds to complete. Nor would I want to spawn 100 VMs to run 100 jobs that could easily run on a few machines.
What I’m looking for is machine autoscaling plus multiple concurrent jobs on each machine.
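To put rough numbers on that, here is a minimal Python sketch of how many machines a burst of jobs needs when each machine can take several jobs at once. The function name and the figure of 10 jobs per machine are made up for illustration; they are not GitLab settings.

```python
import math


def machines_needed(pending_jobs: int, jobs_per_machine: int) -> int:
    """Smallest number of machines that can run all pending jobs at once."""
    if jobs_per_machine < 1:
        raise ValueError("jobs_per_machine must be at least 1")
    return math.ceil(pending_jobs / jobs_per_machine)


# One job per VM (the behaviour described above) vs. an assumed 10 jobs per VM.
one_job_per_vm = machines_needed(100, 1)     # 100 machines
ten_jobs_per_vm = machines_needed(100, 10)   # 10 machines
```

With minutes of startup per machine, cutting the number of machines that have to boot matters most for jobs that finish in seconds.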
h
We run our GitLab runners in our Kubernetes clusters. We set them up to run only on specific nodes, and it has been pretty bulletproof so far; same with the autoscaling.
u
I don’t have a k8s cluster, and setting one up just for GitLab Runner is not an option.
h
Hm… I was sure it was possible when I did this, but looking at it now, it seems you are spot on. Looks like your best bet is to use an instance size appropriate for your runner requirements and just run lots of them. Cost-wise that is going to be about the same (it could actually be cheaper if you optimise it well). E.g. if you need, say, 512 MB for a job, a t3.nano or t3.micro will fit the bill and you pay for exactly what you use. If requirements vary between jobs, you should be able to use a couple of different instance types with different runner tags to orchestrate this quite efficiently.
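As a rough sanity check on the "about the same cost" claim, here is a sketch that assumes price scales linearly with instance memory (roughly true within the t3 family, but check current AWS pricing) and that every instance is fully used. PRICE_PER_GIB_HOUR is a hypothetical number, not a real AWS price.

```python
import math

# Hypothetical price per GiB of instance memory per hour (assumption, not AWS data).
PRICE_PER_GIB_HOUR = 0.0104


def fleet_cost_per_hour(instance_count: int, memory_gib: float) -> float:
    """Hourly cost of a fleet, assuming price is proportional to memory."""
    return (instance_count * memory_gib) * PRICE_PER_GIB_HOUR


# 64 jobs needing 0.5 GiB each: many small machines vs. one large machine.
many_small = fleet_cost_per_hour(instance_count=64, memory_gib=0.5)
one_large = fleet_cost_per_hour(instance_count=1, memory_gib=32)

# Under the linear-pricing assumption the two fleets cost the same per hour.
same_cost = math.isclose(many_small, one_large)
```

The comparison only holds while every machine is busy; idle time and the minutes each machine spends booting are not counted here.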
That doesn’t address your main point about spin-up time… yeah, that is a pain. Sorry I couldn’t be of more help. Best of luck to you.
u
Thanks for trying
e
I know you mentioned you’re not using k8s, and I agree you shouldn’t pick it up just for this, but in case anybody else comes across this: our teams built a cluster autoscaler that reduced some of the pain around node autoscaling during large spikes. It is similar to KEDA, but predates it a bit and is used a bit differently. It was a fun project, and the same autoscaler is now used for some of our other critical infrastructure. https://medium.com/@eric.irwin/custom-autoscaling-for-gitlab-kubernetes-executors-cfbb90ec6094
u
On the subject of the GitLab Runner k8s executor (this is for a different client of mine): they currently have a bug that causes the prepare environment step to take roughly 40 seconds, so every job takes 40 seconds to start. Does anybody else have this problem, or know how to solve it?
h
The only thing I changed for ours was to use the Ubuntu container image; the Alpine one was causing issues all the time.
u
We see the same issue with all images. The default is Ubuntu 20.04, but most jobs override it.