
Data Lineage with Apache Airflow
With Airflow now ubiquitous for DAG orchestration, organizations increasingly depend on Airflow to manage complex inter-DAG dependencies and provide up-to-date runtime visibility into DAG execution. But what effects (if any) would upstream DAGs have on downstream DAGs if dataset consumption was delayed? In this talk we introduce Marquez, an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. We will demonstrate how metadata management with Marquez helps maintain inter-DAG dependencies, catalog historical runs of DAGs, and minimize data quality issues.
Topics Covered
Speakers

Willy Lulciuc
Willy Lulciuc is a Software Engineer at Datakin. He makes datasets discoverable and meaningful with metadata. Previously, he worked on the Project Marquez team at WeWork. When he’s not reviewing code and creating indirections, he can be found experimenting with analog synthesizers.