On Wednesday, February 9th, the Apache Iceberg community released version 0.13.0 with many new features and improvements, along with a redesigned website and docs.
Let's take a deeper look at many of the new features that come with Apache Iceberg 0.13.0. (Release Notes)
Catalog caching now supports cache expiration
Catalog caching is a technique that speeds up table reads by allowing engines to avoid re-reading a table's metadata on every read.
Previously, when multiple readers and writers used Apache Iceberg tables from different engines, the cache would not refresh, so a manual refresh was required to see new data. With the new cache expiration feature, a time interval can be set after which the cache expires, forcing an automatic refresh that resolves this issue. This is configured with the cache.expiration-interval-ms setting, which is ignored if cache-enabled is set to false. Read more on this feature here.
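Below is a minimal sketch of what this could look like when configuring an Iceberg catalog in Spark. The catalog name, warehouse path, and the 30-second interval are placeholder assumptions; only cache-enabled and cache.expiration-interval-ms come from the release itself.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical catalog "my_catalog"; adjust names and paths for your environment.
val spark = SparkSession.builder()
  .appName("iceberg-cache-expiration")
  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.my_catalog.type", "hadoop")
  .config("spark.sql.catalog.my_catalog.warehouse", "s3://my-bucket/warehouse")
  // Keep catalog caching on (the default)...
  .config("spark.sql.catalog.my_catalog.cache-enabled", "true")
  // ...but expire cached table metadata after 30 seconds, so commits from
  // other writers become visible without a manual refresh.
  .config("spark.sql.catalog.my_catalog.cache.expiration-interval-ms", "30000")
  .getOrCreate()
```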
Hadoop catalog can be used with S3 and other file systems safely by using a lock manager
When using Iceberg on object stores like S3 and committing to tables from multiple engines concurrently, you previously couldn't use the Hadoop catalog safely. This is because the check Iceberg relies on to guard against concurrent writes from separate jobs or engines isn't atomic on object stores, so two concurrent commits could result in data loss.
Iceberg 0.13.0 fixes this issue by supporting a lock table in services like DynamoDB, so any catalog can use locks for safe concurrent commits. Read more on this feature here.
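A hedged sketch of enabling a DynamoDB-backed lock manager for a Hadoop catalog in Spark is shown below. The lock-impl class name and lock.table property follow the Iceberg AWS module's conventions; the catalog name, warehouse path, and lock table name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Assumes the iceberg-aws module and AWS SDK are on the classpath.
val spark = SparkSession.builder()
  .appName("iceberg-hadoop-catalog-locking")
  .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.lake.type", "hadoop")
  .config("spark.sql.catalog.lake.warehouse", "s3://my-bucket/warehouse")
  // Use a DynamoDB table to coordinate commits from concurrent writers.
  .config("spark.sql.catalog.lake.lock-impl",
          "org.apache.iceberg.aws.dynamodb.DynamoDbLockManager")
  .config("spark.sql.catalog.lake.lock.table", "iceberg_lock_table")
  .getOrCreate()
```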
Catalog now supports registration of Iceberg table from a given metadata file location
Previously, you could drop a table from a HiveCatalog, but there was no way to add an existing table to the catalog. With this update, you can register an existing table with your catalog by passing the location of its newest metadata file. This is especially helpful for working with Hive external tables in Spark. Read more on this feature here.
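Here is a minimal sketch of what registering an existing table could look like with the Java API from Scala. The database name, table name, and metadata file path are hypothetical, and the exact catalog setup will depend on your Hive metastore configuration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.hive.HiveCatalog

// Initialize a HiveCatalog (metastore connection details come from the Hadoop conf).
val catalog = new HiveCatalog()
catalog.setConf(new Configuration())
catalog.initialize("hive", java.util.Collections.emptyMap[String, String]())

// Point the catalog at the newest metadata file of the external table.
val identifier = TableIdentifier.of("db", "existing_table")
val metadataLocation =
  "s3://my-bucket/warehouse/db/existing_table/metadata/00005-abc.metadata.json"
val table = catalog.registerTable(identifier, metadataLocation)
```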
Deletes now supported for ORC Files
In Iceberg’s v2 format, delete files are used to track records that have been deleted. Previously, this wasn’t supported when the table’s underlying file format was ORC. Now, both position and equality deletes are supported for tables backed by ORC files. Read more here.
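For illustration, the sketch below creates a v2 table backed by ORC files and issues a row-level DELETE from Spark SQL (assuming a spark-shell session where `spark` is the active SparkSession). The catalog and table names are placeholders, and the write.delete.mode property is an assumed configuration; whether deletes are written as delete files or as rewritten data files depends on the engine and its version.

```scala
// Hypothetical v2 Iceberg table using ORC as the underlying file format.
spark.sql("""
  CREATE TABLE my_catalog.db.events (
    id BIGINT,
    category STRING
  )
  USING iceberg
  TBLPROPERTIES (
    'format-version' = '2',
    'write.format.default' = 'orc',
    'write.delete.mode' = 'merge-on-read'
  )
""")

spark.sql("INSERT INTO my_catalog.db.events VALUES (1, 'a'), (2, 'b'), (3, 'a')")

// Row-level delete against the ORC-backed table.
spark.sql("DELETE FROM my_catalog.db.events WHERE category = 'a'")
```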
Vendor Integrations
Along with the core updates detailed above, several vendor integrations were added in version 0.13.0, including:
The table listing API call in the Hive catalog can now return non-Iceberg tables. [#3908]
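As a hedged sketch of how this could be surfaced in Spark, the snippet below sets a list-all-tables property on a Hive-backed Iceberg catalog; the property name is an assumption based on the change above, and the catalog name and metastore URI are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-catalog-listing")
  .config("spark.sql.catalog.hive_cat", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hive_cat.type", "hive")
  .config("spark.sql.catalog.hive_cat.uri", "thrift://metastore:9083")
  // Include non-Iceberg Hive tables when listing tables in this catalog.
  .config("spark.sql.catalog.hive_cat.list-all-tables", "true")
  .getOrCreate()

// Listing now returns both Iceberg and non-Iceberg tables in the database.
spark.sql("SHOW TABLES IN hive_cat.db").show()
```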
Conclusion
Apache Iceberg is adding the features data engineers want and need, which is clear from the attention and momentum it has received just over a month into 2022.
This momentum is just getting started, with multiple announcements already in 2022. Iceberg will play a large role in the Subsurface 2022 conference, held live online March 2-3 and featuring several talks on Apache Iceberg. Register for the free conference here so you don't miss any of the Iceberg sessions.