Hiveberg: Integrating Apache Iceberg with the Hive Metastore

Apache Iceberg is an open table format for very large (petabyte-scale) datasets. This talk gives an overview of Iceberg and its main features: time travel, improved performance, snapshot isolation, schema evolution and partition spec evolution. We’ll then discuss how Iceberg can be used inside an organisation such as Expedia Group to power next-generation data lake technology. An organisation that already has a significant investment in existing technologies (in our case Hive and, specifically, the Hive metastore) faces a challenge when moving to a new table format: preventing data silos from forming, where data generated in the new format can’t be used by those who haven’t switched to it yet. We’ll discuss the solution we came up with, Hiveberg, which opens up a path to read Iceberg tables from Hive (and thus from any tooling that supports Hive). This lets more advanced users take advantage of Iceberg’s features when creating data, while keeping that data widely readable and usable by others.

Topics Covered

Apache Iceberg
Hive Metastore
Table Formats
