Companies use Apache Iceberg to build reliable and efficient data lakes with features that are normally found only in data warehouses. As users adopt Apache Iceberg in a wider range of data processing scenarios, it becomes essential to support efficient, transactional delete, update, and merge operations, even in read-mostly data lake environments.
This talk will be a deep dive into the copy-on-write and merge-on-read approaches for executing row-level operations in Apache Iceberg, so that users can pick the right approach for a given use case. In addition, the presentation will help data engineers avoid common mistakes and tune delete, update, and merge operations at scale.
Anton Okolnychyi is a Spark contributor and a Software Engineer at Apple. He has been working on the internals of Spark for the past three years. At Apple, Anton works on an elastic, on-demand, secure, and fully managed Spark-as-a-service offering. Before joining Apple, he optimized and extended a proprietary Spark distribution at SAP. Anton holds a master's degree in computer science from RWTH Aachen University.