36 minute read · September 12, 2022

Apache Iceberg 101 – Your Guide to Learning Apache Iceberg Concepts and Practices

Alex Merced · Senior Tech Evangelist, Dremio

Apache Iceberg is an open-source data lakehouse table format that has taken the big data analytics world by storm. 

In this article, you’ll find a 101 video course along with an aggregation of all the resources you’ll need to get up to speed on Apache Iceberg in concept and practice.

What's a Data Lakehouse?

The Apache Iceberg 101 Course

Below are videos to educate you about Apache Iceberg and how to use Iceberg tables to enhance your data experience. After the course, you’ll find an index of resources from around the web to continue expanding your Iceberg knowledge.

NOTE: Each quarter I'll record a new "Overview" video that summarizes much of the material below, along with any features added since the previous video.

  1. Introduction to the course
  2. The Problem and the Solution (Iceberg’s Origin Story)
  3. Iceberg and the Data Lakehouse
  4. Overview of Apache Iceberg’s Architecture
  5. Iceberg Transactions Step by Step
  6. Iceberg Catalogs
  7. Copy-on-write and Merge-on-read
  8. Table Tuning with Table Properties
  9. Migrating to Iceberg
  10. Time-Travel
  11. Maintaining Iceberg Tables
  12. Hard-Deletions and GDPR

Tutorial: Get Hands-On with Iceberg on Your Laptop

Tutorial: Apache Iceberg Lakehouse Engineering

Directory of Additional Iceberg Resources

After watching the series of videos above, you should have a pretty good understanding of Apache Iceberg and its concepts. 

Below is a list of additional resources to continue learning more about Apache Iceberg, including hands-on exercises, articles from companies detailing their usage of Apache Iceberg and more.

Apache Iceberg Core Concepts

Below are several resources for understanding what Apache Iceberg is and how it works at a conceptual level.

Apache Iceberg Features

Below are resources to learn more about the many features of Apache Iceberg.

Hands-on Apache Iceberg Exercises

The resources below walk you through exercises and tutorials for trying Apache Iceberg with different tools.

Apache Iceberg and BI Dashboards

Iceberg Video Demos

Videos showing hands-on use of Apache Iceberg tables.

Comparison of Apache Iceberg to Other Table Formats

The resources below cover how Apache Iceberg compares to other table formats.

Companies Sharing Their Production Apache Iceberg Usage

Below are articles from companies that have documented their deployment of Apache Iceberg into production. You can read about their experiences and lessons learned.

Optimizing and Maintaining Apache Iceberg Tables

Once you have Apache Iceberg tables in place, you’ll want to optimize and maintain them. Below are articles that walk through different features for engineering tables for best performance.

Ingesting Data into Apache Iceberg Tables

How do we get data into our Iceberg tables? The following articles cover ingesting data into Iceberg tables from different sources.

Working with Cloud Object Storage

Object storage has become the standard for storing data in a data lakehouse and the resources below highlight Apache Iceberg in the context of cloud object storage.

The Java and Python APIs

Below are articles on Apache Iceberg’s Java and Python APIs.

Streaming with Apache Iceberg

Streaming data raises many considerations that don’t exist in batch processing. Below are resources on using Apache Iceberg with streaming data.

Data as Code

Take your Apache Iceberg tables to the next level with Project Nessie/Dremio Arctic catalog, which allows you to create catalog-level branches for isolating ETL, catalog rollback, multi-table transactions, and more. Here are some talks and blogs on the subject.

Apache Iceberg Office Hours

Recordings of Apache Iceberg Office Hours, held as part of the Gnarly Data Waves podcast.

Miscellaneous Blog Articles

Here is a list of other great Apache Iceberg articles you can learn from.

Gnarly Data Waves

Episodes of the Gnarly Data Waves podcast dedicated to Apache Iceberg. Subscribe on YouTube or Spotify.

  1. Migrating from Delta Lake to Apache Iceberg
  2. Managing Your Data-as-Code
  3. Building your Apache Iceberg Data Lakehouse with Fivetran and Iceberg
  4. Optimizing your data files in Apache Iceberg
  5. How to Modernize your Hive Data Lakehouse with Apache Iceberg and Dremio
  6. Automatic Apache Iceberg Table Optimization with Dremio Arctic
  7. What’s New in the Apache Iceberg Project: Version 1.2.0 Updates, PyIceberg, Compute Engines
  8. Versioning and the Data Lakehouse

Iceberg Subsurface Conference Talks

Here is a list of Subsurface conference talks on Apache Iceberg.

Even more talks from Subsurface Live 2023!
