# platform-toolbox
g
Has anybody tried Apache Airflow as a platform backend?
a
@Matt Menzenski is this you?
m
I don’t really have any Airflow experience, but depending on what the question’s about, I may or may not be doing this with Dagster
a
ah shoot. Sorry. Thought you had done Airflow as well!
m
my boss has done it a lot, I have so far tried to stay away from it 😄
g
hahaha, thank you! I've been looking through some Airflow and Flyte info, and so far what looks interesting about Airflow is that there's a Pulumi provider for IaC automation. But I'll take a look at Dagster.
Any input on why it would be better to use Dagster instead of Airflow?
m
I’m coming to Dagster from Argo Workflows (and before that, from AWS Glue). Dagster has a really nice local developer experience (the `dagster dev` command), has first-class support for testing, and has been just really easy to set up overall. I’m using the Dagster-maintained Helm charts with kustomize overlays and deploying via ArgoCD. Dagster’s OSS deployment is all vanilla Kubernetes resources (no CRDs), so it’s simple to get going for us.
I don’t have enough familiarity with Airflow to make a meaningful comparison.
g
Awesome! Thank you Abby and Matt for your input!
a
@Louis Dussarps
l
We seriously considered it for our orchestration backend (though since we didn't select it, this isn't feedback from production use). These were the limitations that blocked us:
• Error handling: In Airflow, it's difficult to catch errors and continue workflow execution on a different path. We needed this for cases like "if one of these tasks fails, roll back some previous steps and exit with an error."
• Acyclic graph structure: Related to the first point, Airflow enforces an acyclic workflow structure, whereas we wanted to introduce retro-control (feedback) loops, for example for drift detection, health checks, or license renewals.
• More of a convenience issue, but still relevant: since we have multiple ECS or Kubernetes clusters, we wanted an easy way to run tasks inside them (e.g., to create databases). While this is clearly possible with Airflow, we judged that we would need to extend most of the operators to reuse some common patterns efficiently.
In the end, there are probably workarounds for each limitation, but for our use case, we felt we would be bending the tool too far from its original purpose.
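Louis's first point is what is often called compensation (or a saga): when a step fails, undo the steps that already succeeded and exit with an error. As a rough sketch of that control flow in plain Python, assuming each step can be paired with an undo action, something like the following could work. It is not Airflow, Temporal, or Louis's framework API; `run_with_rollback`, `WorkflowFailed`, and the step names are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical step shape: (name, action, compensation). Not an Airflow or Temporal type.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class WorkflowFailed(Exception):
    """Raised after the steps that succeeded before a failure have been rolled back."""


def run_with_rollback(steps: List[Step], log: List[str]) -> None:
    """Run steps in order; on the first failure, compensate completed steps in reverse and raise."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed {name}")
            # Undo in reverse order so the most recent side effect is removed first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"rolled back {done_name}")
            raise WorkflowFailed(f"step {name!r} failed") from exc
        log.append(f"done {name}")
        completed.append(step)


def grant_access() -> None:
    # Simulated failure for the demo below (hypothetical step).
    raise RuntimeError("permission denied")


log: List[str] = []
steps: List[Step] = [
    ("create_db", lambda: None, lambda: None),
    ("create_user", lambda: None, lambda: None),
    ("grant_access", grant_access, lambda: None),
]
try:
    run_with_rollback(steps, log)
except WorkflowFailed as err:
    # The "exit with an error" branch: rollback has already happened at this point.
    log.append(f"exit with error: {err}")
```

In Airflow itself, the closest built-in mechanism is probably a cleanup task with a non-default trigger rule such as `one_failed`; whether that is enough depends on how many distinct rollback paths a workflow needs.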
a
Out of curiosity, what did you end up with @Louis Dussarps?
l
Hmm, we ended up building our own framework ^^' We also explored Temporal in depth, but we found that a lot of the work would involve creating cloud abstractions that don’t exist yet. So… we ended up with yet-another-framework, haha! Jokes aside, we also wanted to write graphs with feedback loops, whereas most tools only support acyclic workflows. That said, we think a community-driven effort in this space would be valuable, since orchestration requires a lot of integrations with different tools. So, if you're curious, we plan to open-source it soon! Right now, it’s still quite tailored to our company’s needs.
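The feedback loops Louis mentions (drift detection, health checks) are essentially reconciliation loops: observe the current state, compare it with the desired state, act, then observe again. A DAG executes each node once per run, so in a DAG-based tool this cycle is often emulated by re-running the whole DAG on a schedule. A minimal plain-Python sketch of such a loop might look like this; the `reconcile` function, the state keys, and the one-fix-per-round policy are all hypothetical.

```python
from typing import Dict


def reconcile(desired: Dict[str, int], actual: Dict[str, int], max_rounds: int = 10) -> int:
    """Observe/diff/act loop: correct one drifted key per round until actual == desired.

    Mutates `actual` in place and returns the number of corrective actions taken
    (capped at max_rounds if the state never converges).
    """
    actions = 0
    while actions < max_rounds:
        # Observe: compare the observed state with the desired state.
        drifted = sorted(k for k in desired if actual.get(k) != desired[k])
        extra = sorted(k for k in actual if k not in desired)
        if not drifted and not extra:
            break  # converged; a scheduler would re-run this loop later to catch new drift
        # Act: fix a single item, then loop back and observe again (the feedback edge).
        if drifted:
            key = drifted[0]
            actual[key] = desired[key]
        else:
            del actual[extra[0]]
        actions += 1
    return actions


desired = {"replicas": 3, "version": 2}
actual = {"replicas": 1, "version": 2, "debug": 1}
actions = reconcile(desired, actual)
```

Fixing one item per round is only there to make the cycle visible; a real controller would more likely batch corrections and rate-limit or back off between rounds.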
a
Makes a lot of sense! I am working on Kratix.io, which is definitely in the space, but it's not the only one doing that stuff 😄
l
Didn't know about Kratix; I will definitely keep an eye on it!
a
And please do share yours once it is safe to do so! I love seeing the different solutions as they almost always just prioritise different things and therefore work for different people!
c
@Louis Dussarps you can check out the platform tooling landscape; the orchestrators category should be the relevant one, and Abby is right in there with Kratix. If you know of something missing from that bucket, feel free to add it; it’s a community effort 🙂
g
Hey @Louis Dussarps and @Clemens Jütte, thank you for stepping in 🙂. In fact, we are planning to build a PoC with Kratix and Apache Airflow. Since we have been dealing with other priorities in the organization, we may be ready by the end of March. I appreciate your input here!! 👍 👍 👍
a
All the best on the PoC @GIOVANNY VELEZ! Feel free to reach out directly any time with questions, or join the community Slack for more support. We run PoCs with customers all the time and are happy to share our learnings on how to get the most out of your effort if that helps too 😄