Slackbot
10/07/2023, 1:58 PM

Jamie Tanna
10/07/2023, 1:58 PM
push to the default branch of that repo)
- trigger an event
- event invokes a short-lived worker, in a container, which:
  - takes a few configuration items from the event
  - downloads the code for that repo
  - runs a process against that codebase
  - writes the result to a database
Has anyone done similar before?
Not tied to Cloud Providers or anything yet 😄

Javier García Manzano
10/07/2023, 2:00 PM

Javier García Manzano
10/07/2023, 2:01 PM

Jamie Tanna
10/07/2023, 2:02 PM

Javier García Manzano
10/07/2023, 2:03 PM

Javier García Manzano
10/07/2023, 2:04 PM

Javier García Manzano
10/07/2023, 2:05 PM

Javier García Manzano
10/07/2023, 2:07 PM

Jamie Tanna
10/07/2023, 3:08 PM
push events sent to the app, and then processed similarly to what's mentioned above.
> What's the expected traffic?
Unfortunately I don't have any numbers to hand, but roughly 30-60 requests a second sounds about right.
> What's the desired reliability / is it ok for it to be down for short amounts of time?
I'd prefer for the webhook layer to be more resilient, and at least allow accepting the events, but if the workers are down for a little while, that'd be OK.
> Who's going to maintain it?
One-person team: me.
> What tech are you familiar with, and do you have a reason why you would want to deviate from that?
Go, AWS, some TypeScript. TypeScript is required for the worker process; I'd prefer to use Go where possible, but I'm not necessarily tied to it.
> What's the desired build/ship timeline?
This is a side project, so I'm pretty flexible with timelines.
> Off the top of my head, you could do this very easily on top of AWS with Lambdas and SQS.
Yeah, I was thinking this too, and it tracks with a previous implementation. From a previous version I've built of this, I'm not sure I'll be able to use Lambda, as I was hitting the 15-minute invocation limit...
> Another consideration – do you want the worker to be downloading a repository every time? That's a somewhat expensive operation which can be cached by making the workers not short-lived.
... because we were downloading very large Git repos, with their full history. The idea is that we'd either do something like a
git clone --depth 1
or download a ZIP of the code, which would remove a lot of the lookup needed.
The workers are also short-lived because we were hitting disk space issues after multiple runs.
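A sketch of how that shallow download step could be wired up, assuming the worker shells out to git (the --single-branch/--branch flags are an extra narrowing beyond what was discussed; a depth-1 clone already implies single-branch by default):

```go
package main

import (
	"fmt"
	"os/exec"
)

// shallowCloneCmd builds the git invocation for a depth-1 clone: the full
// history stays on the server, so both transfer time and disk usage drop.
func shallowCloneCmd(repoURL, branch, dir string) *exec.Cmd {
	return exec.Command("git", "clone",
		"--depth", "1",
		"--single-branch", "--branch", branch,
		repoURL, dir)
}

func main() {
	// Hypothetical repo and working directory, just to show the args built.
	cmd := shallowCloneCmd("https://example.com/org/repo.git", "main", "/tmp/work/repo")
	fmt.Println(cmd.Args)
}
```

Running `cmd.Run()` (and removing the directory afterwards) would also address the disk-space issue per run.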
> And one last thing that comes to my mind: can there be concurrency issues? Is it ok if two workers do an op on the repository simultaneously, or should operations always be linear? You'll have to deal with conflicts if not.
The idea is that only the default branch of a repo is going to be processed, so if it's a repo with lots of regular pushes to the default branch - pushes that may arrive quicker than a scan can execute - we may end up with some overwriting. It'd be preferred if that didn't happen, but it's not so bad if the results aren't quite right. However, it'd probably be good to allow pruning unnecessary worker processing if possible, just to save work.
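One way to sketch that pruning, assuming the webhook layer records the newest head SHA per repo and workers check it before running a scan (in-memory here for illustration; a real version would keep this in the database):

```go
package main

import (
	"fmt"
	"sync"
)

// latestTracker records the most recent head SHA seen per repository.
// A worker consults it before (and ideally again after) an expensive scan:
// if a newer push has arrived, the in-flight event is stale and the run can
// be skipped, which also bounds the overwriting problem.
type latestTracker struct {
	mu     sync.Mutex
	latest map[string]string // repo -> head SHA of the most recent push
}

func newLatestTracker() *latestTracker {
	return &latestTracker{latest: make(map[string]string)}
}

// Record is called by the webhook layer for every push event.
func (t *latestTracker) Record(repo, sha string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.latest[repo] = sha
}

// ShouldProcess is called by a worker picking up an event; it returns false
// when the event's SHA is no longer the newest one for that repo.
func (t *latestTracker) ShouldProcess(repo, sha string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.latest[repo] == sha
}

func main() {
	t := newLatestTracker()
	t.Record("org/repo", "aaa111")
	t.Record("org/repo", "bbb222") // a newer push supersedes the first
	fmt.Println(t.ShouldProcess("org/repo", "aaa111")) // false - stale, skip
	fmt.Println(t.ShouldProcess("org/repo", "bbb222")) // true - current
}
```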
Dominik Zyla
10/07/2023, 11:46 PM

Dom Hutton
10/08/2023, 9:03 PM

Jamie Tanna
10/10/2023, 9:49 AM

Dom Hutton
10/10/2023, 11:35 AM

Jamie Tanna
10/10/2023, 11:38 AM