This article has been revised and updated from its original version published in 2022 to reflect the latest developments in all three table formats.
The three major open table formats (Apache Iceberg, Delta Lake, and Apache Hudi) each solve the "open lakehouse" problem differently at the architectural level. While a high-level comparison covers features and ecosystem support, this article dives deeper into the internal architecture of each format: how each organizes metadata, handles commits, manages concurrent operations, and implements row-level changes.
Architecture determines everything in a data lakehouse: how fast queries plan (metadata structure), how efficiently data is pruned (statistics and partitioning), how safe concurrent operations are (commit protocols), and how much operational overhead your team absorbs (maintenance complexity). A format with superior architecture delivers faster queries, lower costs, and simpler operations, not just today, but as your data scales from gigabytes to petabytes.
Dremio's decision to build exclusively on Apache Iceberg's architecture reflects a deep analysis of these trade-offs. Iceberg's hierarchical metadata tree aligns naturally with Dremio's Apache Arrow-based vectorized execution engine and Columnar Cloud Cache (C3). The result is a query engine that can plan queries against tables with millions of files in under a second, prune 99%+ of data before any I/O occurs, and serve sub-second dashboard queries through live Reflections, all running on commodity cloud object storage.
Understanding these architectural differences helps you make an informed choice for your data platform and explains why Dremio built its lakehouse engine exclusively on Apache Iceberg.
Metadata Architecture: Three Approaches
Iceberg: Hierarchical Metadata Tree

Iceberg organizes metadata as a tree: a catalog pointer references the current metadata.json file, each snapshot in that file points to a manifest list, and each manifest in the list records data files along with their column statistics. This hierarchical structure is why Iceberg excels at cloud object storage: the engine never lists directories. Every file path is recorded in a manifest, and three levels of pruning eliminate most data from being read.
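The pruning flow can be sketched in a few lines. This is a hypothetical illustration with simplified stand-ins for Iceberg's manifest list, manifests, and per-file column stats, not the real Iceberg structures:

```python
# Simplified stand-ins: a manifest list with partition-range stats,
# and manifests with per-file column stats.
manifest_list = [
    {"manifest": "m1.avro", "partition_min": "2024-01-01", "partition_max": "2024-01-31"},
    {"manifest": "m2.avro", "partition_min": "2024-02-01", "partition_max": "2024-02-29"},
]
manifests = {
    "m1.avro": [
        {"file": "a.parquet", "amount_min": 10, "amount_max": 90},
        {"file": "b.parquet", "amount_min": 100, "amount_max": 500},
    ],
    "m2.avro": [
        {"file": "c.parquet", "amount_min": 5, "amount_max": 50},
    ],
}

def plan(day, amount):
    files = []
    # Level 1: prune whole manifests using partition-range stats
    # recorded in the manifest list.
    for entry in manifest_list:
        if not (entry["partition_min"] <= day <= entry["partition_max"]):
            continue
        # Levels 2-3: prune individual data files using per-file
        # column stats recorded in the manifest.
        for f in manifests[entry["manifest"]]:
            if f["amount_min"] <= amount <= f["amount_max"]:
                files.append(f["file"])
    return files

print(plan("2024-01-15", 42))  # ['a.parquet'] - two of three files pruned
```

No directory listing happens anywhere in this flow; every path the engine reads came from metadata, which is the property the section above describes.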
Delta Lake: Transaction Log
Delta Lake uses a flat transaction log stored in the _delta_log/ directory.
Each JSON file is an action list (AddFile, RemoveFile, Metadata, Protocol). To reconstruct the current table state, an engine must read the latest checkpoint plus all subsequent JSON files. Column statistics for data skipping are stored inline in AddFile actions.
Hudi: Timeline Architecture
Hudi organizes metadata around a timeline of actions stored in .hoodie/.
Hudi's timeline tracks every action (commit, compaction, clustering, cleaning) with timestamps. The timeline is append-only and provides a complete audit trail of every operation. Hudi uses a metadata table (stored as its own Hudi table) for file listings and column statistics.
Commit Protocols: How Each Format Handles Concurrency
Iceberg: Optimistic Concurrency Control (OCC)
Iceberg uses OCC with atomic compare-and-swap on the catalog metadata pointer:
Read the current metadata file
Compute changes (new snapshots, manifests)
Write new metadata file
Atomically swap the catalog pointer (compare-and-swap)
If swap fails → conflict detected → retry from step 1
This protocol works correctly on any storage system that offers an atomic pointer swap, implemented via database transactions in JDBC catalogs, server-side conditional commits in REST catalogs, or atomic rename in HDFS.
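The five steps above can be sketched as a retry loop. This is a minimal illustration, not Iceberg's implementation: the in-memory Catalog class and its compare_and_swap method are stand-ins for a real catalog backend (JDBC transaction, REST conditional commit, or HDFS rename):

```python
import threading

class Catalog:
    """Stand-in for a catalog that holds the table's metadata pointer."""
    def __init__(self, pointer):
        self.pointer = pointer
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Atomic swap: succeeds only if no other writer committed in between.
        with self._lock:
            if self.pointer != expected:
                return False
            self.pointer = new
            return True

def commit(catalog, write_metadata, max_retries=5):
    for _ in range(max_retries):
        current = catalog.pointer               # 1. read current metadata file
        new = write_metadata(current)           # 2-3. compute changes, write new metadata
        if catalog.compare_and_swap(current, new):  # 4. atomic pointer swap
            return new
        # 5. swap failed -> conflict detected -> retry from step 1
    raise RuntimeError("commit failed after retries")

catalog = Catalog("v1.metadata.json")
print(commit(catalog, lambda current: "v2.metadata.json"))  # v2.metadata.json
```

The key property is that conflicting writers never corrupt the table: the loser of the race simply re-reads the new state and retries.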
Delta Lake: Write-Ahead Log
Delta Lake uses a write-ahead log protocol:
Read the latest version from _delta_log/
Write changes to _delta_log/{next_version}.json
Commit by creating the next sequential JSON file
Concurrency control relies on the underlying file system's ability to prevent two writers from creating the same file. On S3, this requires a lock service (DynamoDB). On HDFS, atomic rename provides natural conflict resolution.
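A rough sketch of this "first writer to create the file wins" protocol, using the local filesystem's O_EXCL flag as a stand-in for the storage layer's create-if-absent guarantee (the guarantee S3 historically needed a lock service to provide):

```python
import os
import tempfile

def try_commit(log_dir, version, actions_json):
    """Attempt to claim version N by creating its sequential log file."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file already exists,
        # i.e. if another writer already committed this version.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # lost the race; caller must re-read and retry at N+1
    with os.fdopen(fd, "w") as f:
        f.write(actions_json)
    return True

log_dir = tempfile.mkdtemp()
print(try_commit(log_dir, 1, '{"add": {"path": "part-0.parquet"}}'))  # True
print(try_commit(log_dir, 1, '{"add": {"path": "part-1.parquet"}}'))  # False
```

The second call fails because version 1 is already taken, which is exactly how two concurrent Delta writers are serialized.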
Hudi: Timeline-Based Concurrency
Hudi's timeline tracks action states (REQUESTED → INFLIGHT → COMPLETED):
Create .inflight action on the timeline
Perform the operation
Transition to .commit on success
This state machine approach supports multiple concurrent writers with table-level or file-group-level locking.
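The state machine can be sketched directly. This is an illustrative model of the REQUESTED → INFLIGHT → COMPLETED flow described above, not Hudi's actual classes:

```python
# Legal transitions in the timeline state machine.
ALLOWED = {"REQUESTED": "INFLIGHT", "INFLIGHT": "COMPLETED"}

class TimelineAction:
    def __init__(self, timestamp, action):
        self.timestamp = timestamp   # e.g. "20240101120000"
        self.action = action         # e.g. "commit", "compaction", "clean"
        self.state = "REQUESTED"     # every action starts as requested

    def advance(self):
        if self.state not in ALLOWED:
            raise ValueError(f"{self.state} is a terminal state")
        self.state = ALLOWED[self.state]
        return self.state

a = TimelineAction("20240101120000", "commit")
print(a.advance())  # INFLIGHT
print(a.advance())  # COMPLETED
```

Because every action passes through explicit intermediate states, a concurrent writer can observe an in-flight action on the timeline and back off or wait, which is what enables multi-writer support.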
Row-Level Operations: Architectural Differences
How each format implements UPDATE, DELETE, and MERGE operations reveals their architectural priorities:
Iceberg
Iceberg offers three mechanisms, controlled by table properties:
Copy-on-Write (V2): Rewrites entire data files with modifications applied. Best for batch workloads.
Merge-on-Read (V2): Writes separate delete/update files. Merged at read time during query execution.
Deletion Vectors (V3): Compact bitmaps marking deleted row positions within data files. The most efficient approach for sparse deletes.
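A deletion vector is conceptually simple: a bitmap of deleted row positions applied when the file is read. The sketch below uses a Python set as a stand-in for the compressed bitmaps real formats use:

```python
def read_with_deletes(rows, deletion_vector):
    """Apply a deletion vector at read time: skip marked positions."""
    return [row for pos, row in enumerate(rows) if pos not in deletion_vector]

rows = ["alice", "bob", "carol", "dave"]
dv = {1, 3}  # rows 1 and 3 were deleted without rewriting the data file

print(read_with_deletes(rows, dv))  # ['alice', 'carol']
```

The efficiency win for sparse deletes is that a delete writes only this small bitmap, while copy-on-write would rewrite the entire data file to remove two rows.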
Delta Lake

Delta Lake supports COW (the classic approach) and Deletion Vectors (a recent addition). Delta's Deletion Vectors use a bitmap format similar to Iceberg V3's. The original Delta approach was COW-only, with the engine rewriting every affected file for each delete or update operation.
Hudi
Hudi was designed for record-level operations from the start:
COW tables: Rewrite entire file groups on updates
MOR tables: Write delta logs that are merged with base files during compaction or query time
Hudi's record-level index makes point lookups efficient: it can identify which file group contains a specific record key without scanning file-level metadata.
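Conceptually, the record-level index is a key-to-file-group mapping; the dictionary below is a simplified stand-in for the index Hudi stores in its metadata table:

```python
# Hypothetical index contents: record key -> file group that holds it.
record_index = {
    "user-001": "fg-0",
    "user-002": "fg-0",
    "user-003": "fg-1",
}

def locate(record_key):
    """Point lookup: which file group must an upsert for this key touch?"""
    return record_index.get(record_key)  # None means the key is new

print(locate("user-003"))  # fg-1
print(locate("user-999"))  # None -> route the record to a new file group
```

An upsert can therefore rewrite (or append a delta log for) exactly one file group instead of probing every file's key range.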
Performance on Cloud Storage
The architectural differences have significant performance implications on cloud object storage.
Dremio's query engine architecture aligns naturally with Iceberg's hierarchical metadata:
C3 cache: Caches Iceberg manifest data and data file column chunks on NVMe SSDs
Apache Arrow execution: Vectorized processing of Parquet column chunks identified by manifest statistics
Reflections: Stored as Iceberg tables, enabling live and incremental refresh
OPTIMIZE TABLE: Single command for data compaction, manifest compaction, and sort optimization
VACUUM TABLE: Combined snapshot expiry and orphan file cleanup
This tight integration between Dremio's engine and Iceberg's metadata architecture enables sub-second query performance on petabyte-scale datasets stored on cloud object storage, a combination that isn't achievable with Hive, Delta Lake, or Hudi at the same cost point.
The Convergence Trend
All three formats are converging toward similar feature sets. Delta Lake, for example, added UniForm, a compatibility layer that generates Iceberg-readable metadata from Delta tables.
Despite convergence, the fundamental architectural differences remain. Iceberg's specification-first, engine-independent approach provides the strongest foundation for multi-vendor, multi-engine data lakehouses.
Metadata Overhead Analysis
Understanding each format's metadata footprint helps predict operational overhead at scale.
Iceberg's Puffin statistics are a unique architectural advantage: no other format provides table-level statistical aggregates like NDV (number of distinct values) and Theta sketches that enable cost-based join optimization.
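To see why NDV matters to a cost-based optimizer, consider the classic textbook estimate for an equi-join's output size: |R| x |S| / max(NDV_R, NDV_S). With NDV available as a table-level aggregate, the planner can estimate join cardinality and pick a join order without touching data. The numbers below are purely illustrative:

```python
def join_cardinality(rows_r, rows_s, ndv_r, ndv_s):
    """Standard equi-join cardinality estimate using per-side NDV
    of the join key: |R| * |S| / max(NDV_R, NDV_S)."""
    return rows_r * rows_s // max(ndv_r, ndv_s)

# Hypothetical: orders joined to customers on customer_id.
orders, customers = 10_000_000, 1_000_000
estimate = join_cardinality(orders, customers, ndv_r=950_000, ndv_s=1_000_000)
print(estimate)  # 10000000 - roughly one customer row matched per order
```

Without NDV, the optimizer would have to guess or sample, and a bad guess on join order can change runtime by orders of magnitude.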
Catalog Architecture Comparison
The catalog layer differs significantly between formats:
Iceberg Catalogs

REST Catalog: Standard HTTP API for any catalog backend
Nessie: Git-like branching and versioning for tables
Polaris: Multi-engine governance
Glue: AWS integration
Hive Metastore: Legacy compatibility
JDBC: Catalog state stored in a relational database
Delta Lake Catalogs
Unity Catalog: Databricks' catalog (open-sourced as OSS Unity)
Hive Metastore: Basic catalog support
Glue: AWS integration
Hudi Catalogs
Hive Metastore: Primary catalog
Glue: AWS integration
No standardized REST catalog protocol
Iceberg's catalog diversity is a significant architectural advantage. Organizations can choose catalogs based on their infrastructure (Nessie for Git-like workflows, Polaris for multi-engine governance, Glue for AWS-native) without changing table format.
Frequently Asked Questions
Can I read Delta Lake tables from Dremio?
Yes. Dremio provides native Delta Lake support for tables written by V2 writers, allowing you to query Delta tables alongside Iceberg tables (for tables written with newer Delta writers, enable UniForm). However, Dremio's optimization features (Reflections, OPTIMIZE TABLE, VACUUM TABLE) are only available for Iceberg tables.
Is there a performance difference between Iceberg V2 and V3?
V3's deletion vectors are more efficient than V2's separate delete files for sparse deletes. For tables with frequent row-level changes, V3 can reduce read-time merge overhead by 30-50%.
Which format has the best GDPR compliance support?
All three formats support DELETE operations for GDPR compliance. However, Iceberg's snapshot expiry + compaction pipeline provides the most straightforward path to verifiable physical deletion, a key requirement for GDPR Article 17 compliance audits.
How do partitioning approaches differ architecturally?
Iceberg uses hidden partitioning: partition values are derived from column data through transforms (such as day(ts) or bucket(id)) recorded in table metadata, so queries do not need partition-aware predicates and the partition scheme can evolve without rewriting existing data. Delta Lake and Hudi both use Hive-style partition columns reflected in the directory layout, which couples queries and file organization more tightly to the original partitioning choice.
Are the table format differences narrowing over time?
Yes, all three formats are converging on similar capabilities such as row-level deletes, partition evolution, and Z-ordering. However, Iceberg maintains the strongest position in multi-engine adoption and governance. Databricks created UniForm to produce Iceberg-compatible metadata from Delta tables, effectively acknowledging Iceberg as the interoperability standard. The governance differences remain significant: Iceberg's Apache Software Foundation stewardship provides vendor-neutral guarantees that no single-vendor-controlled format can match. For organizations evaluating table formats today, Iceberg offers the broadest ecosystem compatibility with Dremio, Spark, Trino, Flink, Snowflake, DuckDB, and StarRocks all providing native support.
In the age of data-centric applications, storing, accessing, and managing data can significantly influence an organization's ability to derive value from a data lakehouse. At the heart of this conversation are data lakehouse table formats, which are metadata layers that allow tools to interact with data lake storage like a traditional database. But why do these formats matter? The answer lies in performance, efficiency, and ease of data operations. This blog post will help make the architecture of Apache Iceberg, Delta Lake, and Apache Hudi more accessible to better understand the high-level differences in their respective approaches to providing the lakehouse metadata layer.
The metadata layer these formats provide contains the details that query engines can use to plan efficient data operations. You can think of this metadata as serving a similar role to the Dewey Decimal System in a library. The Dewey Decimal System serves as an abstraction to help readers more quickly find the books they want to read without having to walk up and down the entire library. In the same manner, a table format’s metadata makes it possible for query engines to not have to scan every data file in a dataset.
Apache Iceberg

Originating out of Netflix and then becoming a community-run Apache project, Apache Iceberg provides a data lakehouse metadata layer that many vendors support. It works off a three-tier metadata layer.
Let’s examine the three tiers.
Metadata Files
At the heart of Apache Iceberg’s metadata is the metadata.json, which defines all the table-level information such as the table's schema, partitioning scheme, and current snapshot, along with a historical list of schemas, partition schemes, and snapshots.
Each snapshot listed in the metadata.json points to a “manifest list,” which enumerates all the manifests that make up that particular snapshot, along with manifest-level stats (such as partition value ranges) that can be used to filter out entire manifests.
Each manifest in a snapshot's manifest list records the individual data files it tracks, along with per-file column stats that engines use to skip data files.
Delta Lake

Developed by Databricks, Delta Lake provides a data lakehouse metadata layer that benefits from many features primarily available on the Databricks platform, along with those built into its own specification. Two types of files handle most of the work with Delta Lake: log files and checkpoint files.
Delta Logs
Delta Logs are very similar to Git commits mechanically. In Git, each commit captures the lines of code that are added and removed since the last commit, whereas Delta Logs capture files added and removed from the table since the last commit.
For instance, a log file (00000000000000000001.json) might record an AddFile action for each data file the commit created and a RemoveFile action for each file it replaced.
An engine can re-create the state of a table by going through each Delta Log file and constructing the list of files in the table. However, after many commits, this process can begin to introduce some latency. To deal with this, there are checkpoint files that summarize a group of log files so each individual log file doesn’t have to be read to construct the list of files in the dataset.
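A minimal sketch of that reconstruction, using simplified stand-in shapes for the checkpoint and log contents rather than the real Delta action schema:

```python
# Stand-in checkpoint: summarizes table state up to version 2.
checkpoint = {"version": 2, "files": ["part-0.parquet", "part-1.parquet"]}

# Stand-in log files for the versions after the checkpoint.
logs = {
    3: [{"add": "part-2.parquet"}],
    4: [{"remove": "part-0.parquet"}, {"add": "part-3.parquet"}],
}

def current_files(checkpoint, logs):
    """Start from the checkpoint, then replay each later log in order."""
    files = set(checkpoint["files"])
    for version in sorted(v for v in logs if v > checkpoint["version"]):
        for action in logs[version]:
            if "add" in action:
                files.add(action["add"])
            if "remove" in action:
                files.discard(action["remove"])
    return sorted(files)

print(current_files(checkpoint, logs))
# ['part-1.parquet', 'part-2.parquet', 'part-3.parquet']
```

Without the checkpoint, the replay would have to start from version 0, which is exactly the latency problem checkpoints exist to solve.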
Apache Hudi

Apache Hudi is another table format that originated at Uber. Hudi’s approach revolves around capturing the timestamp and type of different operations and creating a timeline.
Directory Structure
Each Hudi table has several directories it uses to organize the metadata it uses to track the table.
/voter_data/: The table's root folder, which houses partition folders with data files and the .hoodie folder that holds all the metadata.
/.hoodie/: This is the folder that holds all the table metadata tracking table properties and file metadata.
/hoodie.properties: List of properties on how the table is structured.
/metadata/: This is where the metadata is saved, including transactions, bloom filters, and more.
Hudi Metadata
The metadata folder in Hudi contains the metadata table which holds several indexes for improving transaction performance via data skipping and other optimizations.
Files Index: Stores file details like name, size, and status.
Column Stats Index: Contains statistics of specific columns, aiding in data skipping and speeding up queries.
Bloom Filter Index: Houses bloom filters of data files for more efficient data lookups.
Record Index: Maps record keys to locations for rapid retrieval, introduced in Hudi 0.14.0.
Hudi uses HFile, a format related to HBase, to maintain records of this metadata.
Base and Log Files: The Core Content
In Hudi’s world, there are two main types of data files:
Base Files: These are the original data files written in Parquet or ORC.
Log Files: These are files that track changes to the data in the base file to be reconciled on read.
Naming Conventions
The way Hudi names these files is:
Base Files: [File ID]_[Write Token]_[Creation Time].[Type]
Log Files: [Base File ID]_[Base Creation Time].[Log Type].[Log Version]_[Write Token]
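Small helpers that follow the naming patterns above make the conventions concrete; the component values here are made up for illustration:

```python
def base_file_name(file_id, write_token, creation_time, file_type="parquet"):
    # Pattern: [File ID]_[Write Token]_[Creation Time].[Type]
    return f"{file_id}_{write_token}_{creation_time}.{file_type}"

def log_file_name(base_file_id, base_creation_time, version, write_token,
                  log_type="log"):
    # Pattern: [Base File ID]_[Base Creation Time].[Log Type].[Log Version]_[Write Token]
    # (log files are hidden, hence the leading dot)
    return f".{base_file_id}_{base_creation_time}.{log_type}.{version}_{write_token}"

print(base_file_name("fg-0", "0-1-2", "20240101120000"))
# fg-0_0-1-2_20240101120000.parquet
print(log_file_name("fg-0", "20240101120000", "1", "0-1-2"))
# .fg-0_20240101120000.log.1_0-1-2
```

Because a log file name embeds its base file's ID and creation time, a reader can pair every log file with the base file it amends just by parsing names.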
The Timeline Mechanics
Hudi loves order. Every action or change made to a table is recorded in a timeline, allowing you to see the entire history. This timeline also ensures that multiple changes don’t clash.
Actions on this timeline go through stages like:
Planning (requested)
Doing (inflight)
Done (commit)
These steps are captured in the .hoodie folder through files named with the convention [timestamp].[transaction state (requested/inflight/commit)], and this is how Hudi identifies and reconciles concurrent transactions. If two transactions arrive at the same timestamp, one of the two will see the pending transaction and adjust accordingly.
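Because the state is encoded in the file name, spotting a pending transaction is just a matter of parsing the timeline. A minimal sketch, with made-up timestamps:

```python
def parse(timeline_file):
    """Split '[timestamp].[state]' into its two components."""
    timestamp, state = timeline_file.split(".", 1)
    return timestamp, state

def has_pending_conflict(timeline_files, timestamp):
    """Is there a not-yet-completed action at this timestamp?"""
    for f in timeline_files:
        ts, state = parse(f)
        if ts == timestamp and state in ("requested", "inflight"):
            return True
    return False

timeline = ["20240101120000.commit", "20240101130000.inflight"]
print(has_pending_conflict(timeline, "20240101130000"))  # True
print(has_pending_conflict(timeline, "20240101120000"))  # False
```

A writer that sees a pending action at its timestamp can wait or pick a later timestamp, which is the reconciliation behavior described above.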
Hudi Table Fields
Each table comes with some additional fields that assist in Hudi’s data lookup:
User Labels: These are the fields of the table the user specified when they created the table or updated the schema.
Hudi’s Own Labels: Fields created by Hudi to optimize operations which include _hoodie_commit_time, _hoodie_commit_seqno, _hoodie_record_key, _hoodie_partition_path, _hoodie_file_name.
Each format takes a very different approach to maintain metadata for enabling ACID transactions, time travel, and schema evolution in the data lakehouse. Hopefully, this helps you better understand the internal structures of these data lakehouse table formats.