Tuning Row-Level Operations in Apache Iceberg
Companies use Apache Iceberg to build reliable, efficient data lakes with features normally found only in data warehouses. As users apply Apache Iceberg to a wider range of data processing scenarios, efficient and transactional delete/update/merge operations become essential, even in read-mostly data lake environments.
This talk is a deep dive into the copy-on-write and merge-on-read approaches to executing row-level operations in Apache Iceberg, so that users can pick the right implementation for a given use case. The presentation will also help data engineers avoid common mistakes and tune delete/update/merge operations at scale.
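To make the trade-off between the two approaches concrete, here is a minimal toy sketch in Python. It is not Iceberg's API: the "files" are plain in-memory lists, and the function names (`delete_copy_on_write`, `delete_merge_on_read`, `read_with_deletes`) are hypothetical names chosen for illustration. The assumption it models is that copy-on-write rewrites every data file containing an affected row, while merge-on-read leaves data files untouched and records which rows are deleted so that readers can filter them out.

```python
# Toy model only: data files are lists of rows held in a dict keyed by file name.
# None of these functions exist in Apache Iceberg; they are illustrative.

def delete_copy_on_write(files, predicate):
    """Rewrite each file that holds a matching row; reuse all other files."""
    new_files = {}
    rewritten = 0
    for name, rows in files.items():
        if any(predicate(row) for row in rows):
            # The whole file is rewritten, even if only one row matched.
            new_files[name + "-rewritten"] = [r for r in rows if not predicate(r)]
            rewritten += 1
        else:
            new_files[name] = rows
    return new_files, rewritten


def delete_merge_on_read(files, predicate):
    """Leave data files alone; record (file, position) pairs of deleted rows."""
    deletes = set()
    for name, rows in files.items():
        for pos, row in enumerate(rows):
            if predicate(row):
                deletes.add((name, pos))
    return deletes


def read_with_deletes(files, deletes):
    """Readers pay the cost: recorded deletes are applied during the scan."""
    return [
        row
        for name, rows in files.items()
        for pos, row in enumerate(rows)
        if (name, pos) not in deletes
    ]


files = {"a.parquet": [1, 2, 3], "b.parquet": [4, 5, 6]}
cow_files, rewritten = delete_copy_on_write(files, lambda r: r == 2)
mor_deletes = delete_merge_on_read(files, lambda r: r == 2)
mor_rows = read_with_deletes(files, mor_deletes)
```

In this sketch the copy-on-write delete does more work at write time (one full file rewrite for a single deleted row) but leaves nothing extra for readers to do, while the merge-on-read delete writes only a small delete record and shifts the filtering work to every subsequent read.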
Anton is a committer and PMC member of Apache Iceberg as well as an Apache Spark contributor at Apple.