Apache Iceberg – An Architectural Look Under the Covers

  • Jason Hughes

Session Abstract

Data lakes have been built with a desire to democratize data: to allow more and more people, tools, and applications to make use of it. A key capability needed to achieve this is hiding the complexity of the underlying data structures and physical data storage from users. The de facto standard has been the Hive table format, released by Facebook in 2009, which addresses some of these problems but falls short at scale, whether measured in data, users, or applications. So what is the answer? Apache Iceberg. The Apache Iceberg table format is now in use at, and contributed to by, many leading tech companies such as Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.

Join Jason Hughes, Technical Director at Dremio, for this webinar to learn the architectural details of why the Hive table format falls short and how the Iceberg table format resolves those shortcomings, as well as the benefits that stem from Iceberg’s approach.

You will learn:

  • The issues that arise when using the Hive table format at scale, and why we need a new table format
  • How a straightforward, elegant change in table format structure has enormous positive effects
  • The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are performed on it
  • The resulting benefits of this architectural design

Ready to Get Started? Here Are Some Resources to Help

Tutorial

AWS

Getting Started with Apache Iceberg Using AWS Glue and Dremio

Apache Iceberg tables not only address the challenges that existed with Hive tables but bring a new set of robust features and optimizations that greatly benefit data lakes. This tutorial explores how to create an Iceberg table in an AWS-based data lake using AWS Glue.

Whitepaper

Ten Top of Mind Challenges for Data Engineering

Data engineers play a crucial role in designing, operating, and supporting the increasingly complex environments that power modern data analytics. What are their most important challenges and how can they solve them strategically?

Whitepaper

A Definitive Guide to Apache Iceberg

Apache Iceberg is an open source table format for representing database tables in huge analytic datasets. Designed to overcome the limitations of Hive, it’s quickly becoming the standard for data lakes and lakehouses. Find out why so many organizations are embracing Apache Iceberg, and how you can benefit from it.
