# general
y
How do you all manage environment conflicts? My org has dev, QAT, staging, and prod environments. During crucial times, a single team will block QAT for hours on end and won't let others make progress. I have heard and read about preview and ephemeral environments. Has anybody used them? Would love to hear from the community.
i
I led the team that implemented them three jobs ago, for the reasons you mention. Since then we've had them at every company I've worked for. They are great, but depending on the size of your full set of services, they do have some caveats you should consider before implementing them:
• They are highly likely to increase costs. Our pre-prod EC2 cost is double our production EC2 cost (and it's not double of $100).
• They need upfront investment, plus continuous maintenance and evolution. This is not trivial.
• They remove the pressure to create services that can be tested locally in isolation. This means the feedback loop for engineers will be longer: testing locally will almost always be faster than testing in a cloud environment that has been spawned for you. Our ephemeral environments took up to 1.5 hours to build until we made some radical changes, and more are still needed.
y
Oh, that's the tradeoff, right? How did you build these ephemeral envs? We have 180+ microservices. When it comes to testing a particular feature, I'm of the opinion that we will need 4 or 5 general services, plus the ones under test.
i
We build them via very complicated Jenkins pipelines that create K8s namespaces with everything under them (we're migrating to Argo Workflows as we speak), and users request them via GitHub PR comments or a Slack bot. We don't have that many microservices, and we still have problems with interdependencies. How independent are your microservices? Can they all be built and deployed independently? We have dependency chains between ours, which makes things very complicated and slow.
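The request flow described above (PR comment or Slack bot, then a pipeline, then one K8s namespace per environment) can be sketched roughly as follows. This is a minimal illustration, not the poster's pipeline: the `/preview up` / `/preview down` command syntax, the `preview-<repo>-pr-<n>` naming scheme, and the `pr-number` label are all hypothetical. The one real constraint it encodes is that Kubernetes namespace names must be valid DNS-1123 labels (lowercase alphanumerics and `-`, at most 63 characters).

```python
import re
from typing import Optional

# Hypothetical command syntax; the thread only says envs are requested
# via GitHub PR comments or a Slack bot.
COMMAND = re.compile(r"^/preview\s+(up|down)$")


def parse_command(comment: str) -> Optional[str]:
    """Return 'up' or 'down' if the comment is a preview command, else None."""
    match = COMMAND.match(comment.strip())
    return match.group(1) if match else None


def namespace_for_pr(repo: str, pr_number: int) -> str:
    """Derive a DNS-1123-compliant namespace name (max 63 chars) for a PR."""
    prefix, suffix = "preview-", f"-pr-{pr_number}"
    # Replace anything outside [a-z0-9-] with '-' (e.g. '/' and '_').
    slug = re.sub(r"[^a-z0-9-]+", "-", repo.lower()).strip("-")
    # Truncate the slug so the whole name fits in 63 characters.
    slug = slug[: 63 - len(prefix) - len(suffix)].rstrip("-")
    return f"{prefix}{slug}{suffix}"


def namespace_manifest(repo: str, pr_number: int) -> dict:
    """Namespace object that a pipeline could apply (e.g. via kubectl)."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": namespace_for_pr(repo, pr_number),
            # Hypothetical label so cleanup jobs can find preview namespaces.
            "labels": {"pr-number": str(pr_number)},
        },
    }


if __name__ == "__main__":
    if parse_command("/preview up") == "up":
        print(namespace_manifest("Org/Payments_Service", 42))
```

Keeping the name derivable from the repo and PR number means a later `/preview down` comment can find and delete the same namespace without storing any extra state.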
If you can make an ephemeral environment with only 4 or 5 services plus the one being tested, you may be OK, but you may also be OK with some local setup. I have a feeling ephemeral environments with 180+ services will be very difficult to manage.
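Deploying "only 4 or 5 services plus the one being tested" amounts to taking the services under test, adding their transitive dependencies, and deploying that subset in dependency order. A minimal sketch using Python's standard `graphlib`, assuming you can maintain a map of which services each service needs (the service names below are made up):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services it needs at runtime.
DEPENDS_ON = {
    "checkout": {"payments", "inventory"},
    "payments": {"auth"},
    "inventory": {"auth"},
    "auth": set(),
    "search": {"catalog"},
    "catalog": set(),
}


def services_to_deploy(under_test, depends_on):
    """Return the services under test plus all their transitive dependencies,
    ordered so every dependency comes before the services that need it."""
    needed = set()
    stack = list(under_test)
    while stack:  # depth-first walk of the dependency graph
        svc = stack.pop()
        if svc not in needed:
            needed.add(svc)
            stack.extend(depends_on.get(svc, ()))
    # TopologicalSorter takes node -> predecessors, i.e. "depends on".
    graph = {svc: depends_on.get(svc, set()) for svc in needed}
    # static_order() raises graphlib.CycleError if the subset has a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(services_to_deploy({"checkout"}, DEPENDS_ON))
```

For `checkout`, this deploys `auth` first and `checkout` last, and leaves `search` and `catalog` out of the environment. A `CycleError` here is a useful signal that some services cannot be deployed independently, which is the dependency-chain problem mentioned above.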
We have around 60 microservices + 2 monoliths. It's not easy.
y
I see. Yes, some of our microservices are dependent as well, in that they need at least 2 services to interact with before they can move forward. Maybe we can then be deliberate about when to spin up these envs, right? As in, not for every PR. So far, we have been planning to use Docker Compose to help devs test these locally, and we have even tried spinning up a Kubernetes cluster with this subset deployed.
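Not spinning up an environment for every PR can be expressed as a small gating rule in whatever triggers the pipeline. The rule below is purely an assumption for illustration: opt in via a hypothetical `preview` label, or automatically when a PR touches more than one service, on the assumption that single-service changes are covered by the local Docker Compose setup. It also assumes a `services/<name>/...` repo layout, which the thread does not describe.

```python
from dataclasses import dataclass, field
from typing import List, Set

PREVIEW_LABEL = "preview"  # hypothetical opt-in label


@dataclass
class PullRequest:
    labels: Set[str] = field(default_factory=set)
    changed_paths: List[str] = field(default_factory=list)


def changed_services(paths: List[str]) -> Set[str]:
    """Assumes a services/<name>/... layout; other paths are ignored."""
    return {
        p.split("/")[1]
        for p in paths
        if p.startswith("services/") and p.count("/") >= 2
    }


def wants_preview_env(pr: PullRequest, max_local_services: int = 1) -> bool:
    """Spin up an ephemeral env only on explicit opt-in or cross-service changes."""
    if PREVIEW_LABEL in pr.labels:
        return True
    # Assumption: changes confined to one service are tested locally instead.
    return len(changed_services(pr.changed_paths)) > max_local_services
```

A docs-only or single-service PR gets no environment, while a PR touching two services gets one automatically; the threshold is a knob to tune against cost.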
However, QA engineers have a tough time with just one dedicated env.
i
Maybe you need both local environments and full ephemeral environments. Start with the one that solves the biggest problem you have now.
Unless you can do both at the same time 😄
y
Local is kinda sorted, at some level.
i
Then I would definitely look at ephemeral environments, and I'm curious to know how it goes for you. I've been wondering for a while what the upper bound for ephemeral environments is in terms of number of services/size.
w
How come a single team can block an entire pipeline for an environment that isn’t production? That sounds like the bigger issue at hand. I don’t have any insight into ephemeral envs, other than the fact that we are doing this with ArgoCD going forward and it seems promising.
y
Pipelines are not blocked, but they have to manually send notes to everyone asking them not to deploy anything new to QAT for 4 hours or so. There is no automated deployment today. Services and workers run on Elastic Beanstalk, so people deploy manually. That's another problem, which we have addressed with newer teams.
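The manual "please don't deploy to QAT for 4 hours" note is effectively a lock with an expiry. As an illustration only (nobody in the thread describes doing this), that coordination could be made explicit with a lease that deploy tooling checks before deploying. The class and method names are hypothetical, and this in-memory version is a sketch; a real one would need shared storage that every deploy path, including manual Elastic Beanstalk deploys, consults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Lease:
    team: str
    expires_at: datetime


class EnvironmentLease:
    """Hypothetical in-memory lease on one shared environment (e.g. QAT)."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def _held_by_other(self, team: str, now: datetime) -> bool:
        lease = self._lease
        return lease is not None and lease.expires_at > now and lease.team != team

    def acquire(self, team: str, duration: timedelta, now: datetime) -> bool:
        """Take (or extend) the lease unless another team holds an unexpired one."""
        if self._held_by_other(team, now):
            return False
        self._lease = Lease(team, now + duration)
        return True

    def can_deploy(self, team: str, now: datetime) -> bool:
        # Expired leases no longer block anyone, so a forgotten hold cannot last forever.
        return not self._held_by_other(team, now)
```

The expiry is the important design choice: unlike a manual note, the hold lapses on its own after the requested window.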
h
Look at https://www.qovery.com/ for help building ephemeral envs 😄
They also have https://github.com/Qovery/Replibyte that helps with the db side for those environments
p
https://link.medium.com/D5ujspTmjxb has some of my own thoughts on implementing ephemeral envs. We haven't completely solved how to run only a subset of services, but we're working on it.
y
Thanks, all, for your input.