Dagster: An Orchestrator for the Full Data Lifecycle
Building datasets in a data lake boils down to developing, executing, and monitoring graphs of computation. Traditional orchestrators focus on sequencing computations in production, but the graph matters just as much when developing and testing changes, and when monitoring and debugging production datasets. Dagster is an open source orchestrator built for the full data lifecycle. In this talk, we’ll discuss how the orchestration graph helps answer some of the most important questions in data engineering. We’ll show how developing and testing with orchestration graphs catches errors early, errors that would usually surface only later in production. And we’ll talk about how the orchestration graph captures lineage, so you can track all the runs and datasets upstream of a particular dataset.
Sandy Ryza is a Software Engineer at Elementl. He previously led the freight ML engineering team at KeepTruckin. He authored O’Reilly’s Advanced Analytics with Spark and is a committer on Apache Spark and Apache Hadoop.