Nadav Cohen

04/21/2023, 7:54 AM
Migration permutations and deprecations: when migrating from a legacy product (e.g. VM deployments) to a newer one (e.g. k8s), there is an explicit fork of the platform's stack. We notice this forces us to support lots of edge cases that only exist in the migration limbo (e.g. discovering and access-controlling services across VMs and k8s), which in turn create barriers to migration. How do folks here choose which limbo edge cases to support? Or, how do you plan your migrations so that you minimize these limbo barriers or avoid them entirely (e.g. skip the MVP-and-iterate approach and just build a complete replacement of the existing solution)? Lastly, what do you do when you are doing several of these stack migrations at once? Or do you explicitly limit yourselves to one platform fork at a time?

Oakley Hall

04/21/2023, 10:39 AM
We have tried to make our migration-related changes such that they run on both platforms, behind a flag if needed. The idea is that the same commit is running in production on both the old stack and the new stack as we shift traffic incrementally. This allows feature teams to continue making changes and delivering features to both platforms during the migration. Building and supporting these kinds of abstractions certainly slows things down for us, but it keeps the feature teams delivering at about the same rate they were before we started the migration. We try to take on one migration at a time, because you never know what you're getting into until you really get into the weeds.
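A minimal sketch of the flag approach described above, assuming a hypothetical `DEPLOY_PLATFORM` environment variable and placeholder DNS names (none of these come from the thread). The same code runs on both stacks: one function hides a single infrastructure dependency (service discovery) behind the flag, and another shifts a fixed percentage of traffic to the new stack deterministically.

```python
import hashlib
import os
from typing import Optional

# Hypothetical flag name; "vm" and "k8s" are assumed values for illustration.
PLATFORM_FLAG = "DEPLOY_PLATFORM"


def service_address(service: str, platform: Optional[str] = None) -> str:
    """Hide one infrastructure dependency (service discovery) behind a flag."""
    platform = platform or os.environ.get(PLATFORM_FLAG, "vm")
    if platform == "k8s":
        # Kubernetes cluster DNS form; the "default" namespace and the
        # "cluster.local" domain are assumptions about the cluster setup.
        return f"{service}.default.svc.cluster.local"
    if platform == "vm":
        # Placeholder internal DNS zone for the legacy VM fleet.
        return f"{service}.vm.example.internal"
    raise ValueError(f"unknown platform: {platform!r}")


def routes_to_new_stack(request_key: str, percent_on_new: int) -> bool:
    """Deterministically send a fixed share of traffic to the new stack."""
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < percent_on_new
```

Hashing a stable key (a user or session ID, say) keeps each caller on the same stack while the percentage is raised, which makes regressions easier to attribute to one side.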

Nadav Cohen

04/21/2023, 1:19 PM
That's a good practice. It seems to imply that you also have an explicit abstraction layer that hides all the things you're switching? How would you go about moving an entire stack, e.g. from ELK to Prometheus/Grafana/Mimir? Specifically, what would you do about all the integrations and automations your feature team engineers already rely on with their current ELK stack as you try to move them to the Prometheus stack?

Oakley Hall

04/21/2023, 3:22 PM
I don't know what those are. We try to run parallel CI/CD pipelines triggered by the same actions (code merge, etc.). As the new stack becomes more stable, we switch over more things, like lower-environment CNAMEs and testing tools. For a while, two deployments have to be validated and two releases have to be done in parallel. This is the risky, stressful part, so we try to make it as short as possible. We don't really create an overarching abstraction layer; rather, we look at dependencies on the infrastructure one case at a time and try to build an abstraction or flag around each. Post-migration, we review those flags and abstractions for possible removal.
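One way to picture the parallel validation step is a parity check over the results of the two pipelines. This is only a sketch under assumptions: the check names and the result format (check name mapped to pass/fail) are hypothetical, and a real pipeline would collect them from its own test tooling.

```python
from typing import Dict, List


def parity_report(old: Dict[str, bool], new: Dict[str, bool]) -> List[str]:
    """List checks where the new stack lags behind the old stack."""
    problems = []
    for check in sorted(old):
        if check not in new:
            problems.append(f"{check}: missing on new stack")
        elif old[check] and not new[check]:
            problems.append(f"{check}: passes on old stack, fails on new stack")
    return problems


# Hypothetical results from the two parallel pipelines for the same commit.
old_results = {"smoke": True, "login": True, "export": True}
new_results = {"smoke": True, "login": False}

for line in parity_report(old_results, new_results):
    print(line)
```

An empty report for the same commit on both stacks is one possible gate for switching over the next environment.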

ranjit

04/21/2023, 3:44 PM
Are you talking about VM deployments in data centers or in the cloud? Either way, an incremental migration of each stack, testing all scenarios along the way, is an option. As Oakley said, for some time there will be two parallel deployments and validations. We switch off the legacy environment once the new one is set up with HA/DR, autoscaling, networking (egress, ingress), and multiple endpoints across geo locations, and has been tested end to end.