Distributed Transactions on the Data Lake with Project Nessie

Wednesday, July 21, 2021

While database concepts such as transactions, commits and rollbacks are necessary for traditional data warehousing workloads, they are not sufficient for modern data platforms and data-driven companies. Project Nessie is a new open-source metastore that builds on table formats such as Apache Iceberg and Delta Lake to deliver multi-table, multi-engine transactions. In this talk we will discuss Nessie's transactional model and how it can improve ETL workflows. We will introduce the recently released Nessie Airflow provider and use it in multi-stage, complex workflows to demonstrate what Nessie transactions make possible. We will finish with a demo and a discussion of Project Nessie's production readiness.