# gitops
d
Hi, we're looking for a proper solution for doing data fixes on databases using GitOps principles. So it's not about db schema migrations; those can be handled by tools like atlasgo, schemahero or even liquibase or flyway. It's about (small) data fixes in our databases, e.g. updating some values in a specific (set of) table(s). The idea is to have an audit trail and history of these changes and a PR approval flow. We're still figuring out how to handle the GDPR stuff (e.g. when correcting spelling errors of a user's name in our "user" db), but currently that's not the most important part. Any suggestions or ideas on this would be great. We're now thinking of creating a Spring Boot/liquibase app and utilising liquibase changesets to handle this. The issue with this is that we'd actually need to build and deploy a new version of the app for every data fix. As far as I know, you cannot run liquibase as a service and have it "reconcile" a folder with your changesets.
a
You probably want to ask who’s making the changes and what workflow you want. What you’re outlining as part of the problem statement really doesn’t make sense for a GitOps flow. Most of the time I’ve seen that type of change made through admin tooling w/ an audit trail. It’s probably cheaper over the long term to invest in ReTool for tooling that works inside the schema and corrects data. This is because you’re going to want frontline support (cheaper on a per-head basis) rather than engineering to make these changes.
If this is about data migrations (converting json to proto in table or gzipping a blob) - I would say have an ETL workflow / job runner framework that can just run one off scripts against the databases.
I would start with the user that you want to interface w/ the change and the TCO you’re looking to optimize for, then introduce the technical requirements, THEN look at the technical solution (aka GitOps)
g
Why can't you also use your existing schema migration framework to do the data migration? The frameworks themselves don't differentiate between schema and data migrations. You can have two init processes: one that runs the schema migration first and one that runs the data migration next. With this approach, you don't need to introduce a new framework into your application stack, and you still get GitOps, with the data migration scripts managed in Git as a second set of scripts. The only challenge I foresee with your data migration scripts is when they need to fix something that involves sensitive data. It's not great to store those production values in Git. As long as you don't do that, it should be OK. If the plan is to support any kind of data fix, then GitOps might not be the right solution for you, since there will be challenges in convincing your compliance auditors why you have production data codified in Git.
d
Thanks all for your input. I guess the real issue is the audit trail of changes and approval flow.
m
Hi @Danny Kruitbosch, the IDP we are building can handle this use case with Playbooks and a custom catalog (different approvals based on who owns the db/table) - we do something similar for enabling users to edit resources in Kubernetes, but have the changes pushed back into Git rather than applied directly
c
I am very much with Andrew on this one @Danny Kruitbosch - this is not a case for a GitOps approach at all but for admin tooling that is generating an audit-trail. If you absolutely must use developer tooling to handle this, then you need to separate this from your usual repo and can’t just re-use your existing migrations framework. The reason is that you cannot use the GitOps feature to re-deploy a commit anymore without the need to replay any additional data-change commits as well to reach a consistent state from a business perspective - not what you expect from a GitOps-enabled project or where you want to be with it.
m
I don't see why GitOps cannot be used? From a GitOps perspective there is no need to replay changes; only if a change has not been applied, or has somehow been unapplied, should it automatically be reapplied.
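To make the "only apply what has not been applied" idea concrete, here is a minimal sketch of such a reconcile step. It is not how liquibase or any other tool implements it: the `data_fix_ledger` table, the `reconcile` function, the change ids and the in-memory SQLite database are all assumptions made up for illustration.

```python
import hashlib
import sqlite3


def reconcile(conn, changes):
    """Apply every change whose id is not in the ledger yet; return the ids applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_fix_ledger ("
        "id TEXT PRIMARY KEY, checksum TEXT NOT NULL, "
        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = dict(conn.execute("SELECT id, checksum FROM data_fix_ledger"))
    newly_applied = []
    for change_id, sql in changes:
        checksum = hashlib.sha256(sql.encode()).hexdigest()
        if change_id in applied:
            # Already applied: skip it, but refuse silently edited history.
            if applied[change_id] != checksum:
                raise ValueError(f"{change_id} was edited after being applied")
            continue
        with conn:  # the fix and its ledger row commit (or roll back) together
            conn.execute(sql)
            conn.execute(
                "INSERT INTO data_fix_ledger (id, checksum) VALUES (?, ?)",
                (change_id, checksum),
            )
        newly_applied.append(change_id)
    return newly_applied


# Hypothetical demo data; in practice the changes would be files in the Git repo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Flrian'), (2, 'H%bert')")
conn.commit()

changes = [
    ("001-fix-florian", "UPDATE users SET name = 'Florian' WHERE id = 1"),
    ("002-fix-hubert", "UPDATE users SET name = 'Hubert' WHERE id = 2"),
]
first_run = reconcile(conn, changes)   # applies both changes
second_run = reconcile(conn, changes)  # nothing left to apply
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
```

Running `reconcile` a second time with the same list is a no-op, which is the property being argued for here. The checksum check is one possible way to keep applied changes immutable.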
a
that’s super hard to understand with dbs
c
Let's say you execute the following steps:
1. Deploy V12
2. Correct name of user “Flrian” to “Florian”
3. Correct name of user “H%bert” to “Hubert”
4. Correct another funky username
5. Deploy V13
Now something goes horribly wrong - you need to roll back to V12 to stabilize prod. No problemo! That’s one of the reasons you are doing GitOps after all, right?
🟡 -> You now need to figure out what the right commit to roll back to actually is - it’s not V12 anymore but step no. 4.
🔴 -> If you also had steps 6, 7 and 8 after V13, correcting some more names, then you would even need to start cherry-picking and re-mixing to achieve the correct data state from a business perspective. That is something you really don’t want to do under pressure because your prod system is down. It is also something you don’t want to explain to an auditor on the next iteration of that funny activity.
Conclusion -> DON’T mix your sources with your data and deploy in GitOps style from the same repo.
Agreed @Moshe Immerman?
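The rollback scenario above can be sketched as a toy model of a single repo history. The `Commit` type, the commit ids and both helper functions are hypothetical and only illustrate the reasoning; whether data fixes are really reverted in the database depends on the tooling, but they do drop out of the desired state in Git.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    kind: str  # "deploy" or "data-fix"
    note: str


# Steps 1 to 5 of the scenario as commits in one shared repo.
history = [
    Commit("c1", "deploy", "V12"),
    Commit("c2", "data-fix", "Flrian -> Florian"),
    Commit("c3", "data-fix", "H%bert -> Hubert"),
    Commit("c4", "data-fix", "another funky username"),
    Commit("c5", "deploy", "V13"),
]


def deploy_index(history, release):
    """Position of the deploy commit for the given release."""
    return next(
        i for i, c in enumerate(history)
        if c.kind == "deploy" and c.note == release
    )


def rollback_target(history, bad_release):
    """The commit directly before the bad deploy: the state you actually want."""
    index = deploy_index(history, bad_release)
    if index == 0:
        raise ValueError("nothing to roll back to")
    return history[index - 1]


def fixes_dropped_by_reset(history, good_release):
    """Data fixes no longer in the desired state after resetting to good_release."""
    index = deploy_index(history, good_release)
    return [c for c in history[index + 1:] if c.kind == "data-fix"]


target = rollback_target(history, "V13")
dropped = fixes_dropped_by_reset(history, "V12")
```

Here `target` is the commit from step 4, not the V12 deploy, and `dropped` lists the three data fixes that a plain reset to V12 would leave out.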
m
That is where db down migrations come in: if the migration shouldn't be there, then it should have the necessary details to roll it back.
GitOps also does not require the ability to roll back, just that the steps taken to get there are immutable
And a rollback that removes the history of the rollback itself is not considered a fully compliant GitOps system
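As a rough illustration of a down migration that keeps its own history, the sketch below records both the "up" and the "down" run in an append-only table instead of deleting the original entry. The `data_fix_history` table, the `FIX` dictionary and the `run` helper are assumptions for this example, not the behaviour of any specific migration tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE data_fix_history (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        change_id TEXT NOT NULL,
        direction TEXT NOT NULL CHECK (direction IN ('up', 'down'))
    );
    INSERT INTO users VALUES (1, 'Florian');
""")

# A hypothetical data fix that turns out to be wrong, with its rollback details.
FIX = {
    "id": "003-rename-user-1",
    "up": "UPDATE users SET name = 'Florain' WHERE id = 1",
    "down": "UPDATE users SET name = 'Florian' WHERE id = 1",
}


def run(conn, fix, direction):
    """Execute one direction of a fix and append a history row in the same transaction."""
    with conn:
        conn.execute(fix[direction])
        conn.execute(
            "INSERT INTO data_fix_history (change_id, direction) VALUES (?, ?)",
            (fix["id"], direction),
        )


run(conn, FIX, "up")    # the mistaken fix is applied
run(conn, FIX, "down")  # the rollback is applied as a new, recorded step

history = conn.execute(
    "SELECT change_id, direction FROM data_fix_history ORDER BY seq"
).fetchall()
name = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]
```

After both runs the user's name is back to its previous value, and the history still shows that the change was applied and then rolled back.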
a
my point is that this is probably the wrong tool bc the user doesn’t want a migration. Would you update the data from your webapp in the same way? User signup creates a migration to add a user? change password is a migration?
m
We actually do create a new git commit when a user signs up on our platform to provision infrastructure - for a normal business app, obviously, bulk update workflows should be built into the app itself. There will always be edge cases with a cost-benefit tradeoff - if your app already has an audit trail and approval workflows then it might make sense to do it in the app; however, if it doesn't, or your changes are diverse, or span multiple apps / tenants, then GitOps / migration still makes sense. Answering @Danny Kruitbosch's original question - https://github.com/kubernetes/git-sync will keep a folder in sync with a git repo, and can call liquibase etc. whenever it is updated
d
Thanks all for the insights. We have plenty of stuff to reason about. We'll review all your input and go from there.