Dremio Blog

7 minute read · April 6, 2026

Iceberg Won The Table Format Wars. What Does That Mean for You?

Read Maloney, CMO, Dremio

The third annual Iceberg Summit is happening this week, and it's rapidly growing into one of the must-attend data events of the year. Why? Well, Iceberg won the table format wars a couple of years ago because companies wanted to avoid lock-in and they wanted interoperability. The Iceberg lakehouse also quietly became the default data architecture for the AI era.

Agentic workloads benefit when an organization can access all of its data: structured, semi-structured, and unstructured. So the lakehouse is now the default over BI-era cloud data warehouses.   

Customers demanded openness: Apache Iceberg is open and Delta Lake is not. That's why Snowflake, Databricks, and Microsoft Fabric all adopted it. Ecosystems always benefit customers. And that's why tens of thousands of enterprises are running or planning to run their data platforms on Iceberg.

However, many teams are surprised as they adopt the flexibility of an open lakehouse: it can be harder to manage than they expect.

The Management Tax

Iceberg tables fragment over time. Every INSERT, UPDATE, and DELETE creates new data files. Small files pile up. Metadata grows. Snapshots accumulate. That dashboard that used to load in two seconds? Now it's twelve. 

So your engineers start writing compaction jobs. They schedule OPTIMIZE runs. They tune partition strategies. They build monitoring to catch when file counts get out of hand. They debate whether it's safe to rewrite manifests during business hours.
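In practice, that manual upkeep often looks like a scheduled batch of Iceberg's built-in Spark procedures. Here is a rough sketch, assuming Spark with the Iceberg extensions enabled; the catalog and table names are placeholders:

```sql
-- Hand-scheduled Iceberg table maintenance via Spark SQL procedures.
-- 'my_catalog' and 'sales.orders' are illustrative placeholders.
CALL my_catalog.system.rewrite_data_files(table => 'sales.orders');    -- compact small data files
CALL my_catalog.system.rewrite_manifests(table => 'sales.orders');     -- consolidate metadata
CALL my_catalog.system.expire_snapshots(table => 'sales.orders',
                                        older_than => TIMESTAMP '2026-03-01 00:00:00');
CALL my_catalog.system.remove_orphan_files(table => 'sales.orders');   -- delete untracked files
```

Each of these calls has to be scheduled, tuned, and monitored separately, per table.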

None of this is the work they want to do. They were hired to bring in new data sources, build data products, train models, and answer business questions. Instead, they're babysitting and optimizing tables.

And performance? Out of the box, Iceberg queries are slower than what your team remembers from the warehouse. Not because Iceberg is slow, but because nobody is optimizing the physical layout for how your team actually queries the data. In a warehouse, the vendor handled that. In a lakehouse, it can be on you if you don’t have access to the technologies that make Iceberg simple.  

What If the Lakehouse Managed Itself?

This is where I'll stop being objective and tell you what Dremio built, and why.

Dremio wasn't built by bolting Iceberg support onto an existing product. It was built for Iceberg from the ground up. Engine, catalog, planner, maintenance layer. Every component understands Iceberg natively, which means every component works together to manage and query your tables without the manual work.

Your tables optimize themselves with Iceberg Clustering. Dremio continuously optimizes the physical layout of your data based on how your team actually queries it. When new data arrives and locality degrades, Dremio targets just the degraded regions for rewriting. No full table rewrites. The table heals itself. In Dremio, it's built in. It just runs.

Your queries accelerate themselves. Most organizations maintain silver and gold layers through complex ETL pipelines that consume a huge share of their compute budget. With Dremio, you maintain those layers virtually and let the platform materialize only what's needed. Autonomous Reflections observe your actual query patterns and automatically create and maintain optimized physical representations as Iceberg Tables. When your workload shifts (new queries, new users, new AI agents), Reflections adapt. Unused ones get deprecated. New ones get created. Other platforms require you to manually create and maintain materialized views. Dremio figures out what you need and builds it for you. The result is 20x faster queries than competing lakehouses on TPC-DS benchmarks.

Your table maintenance disappears. File compaction, snapshot expiration, manifest rewriting, orphan file cleanup. Dremio handles all of it autonomously. Hot partitions get more aggressive treatment. Maintenance jobs automatically schedule around low-traffic windows.  In Databricks, you schedule and manage these jobs yourself. In Snowflake, automatic maintenance only covers Snowflake-managed Iceberg tables, and it doesn't handle orphan file cleanup. In Dremio, your engineers never write another compaction job.

Try Dremio’s Interactive Demo

Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI

Apache Polaris Delivers Interoperability and Prevents Catalog Lock-In

Dremio co-founded Apache Polaris, the open catalog standard that governs every table in the platform. Every table Dremio manages is immediately accessible to any compatible engine. Read and write from Spark, Trino, Flink, DuckDB, or Dremio, whatever your team runs.

Databricks requires Unity Catalog for their Iceberg features. Snowflake routes everything through managed tables. Both offer Iceberg support, on their terms, through their catalog, on their compute. 

With Dremio, your Iceberg tables sit in your object storage, governed by an open catalog, readable and writable by any engine. For AI workloads that need to access data from multiple engines and frameworks simultaneously, that's not a nice-to-have, it's a requirement.

Now Add V3

Apache Iceberg V3 is the biggest evolution of the standard since row-level deletes arrived in V2. Dremio shipped V3 table read/write. Here's what it unlocks.

The headline feature is binary deletion vectors. Instead of writing position delete files that readers have to reconcile at query time, V3 uses compact bitmaps. The practical impact: dramatically faster updates and deletes with less compute overhead. If you're running CDC pipelines, streaming ingestion, or anything where data changes frequently, your Iceberg tables stay fresh without the performance penalty. On Dremio, this feeds directly into our autonomous performance capabilities. As deletion vectors accumulate, Dremio's automated maintenance compacts them back into clean data files on a continuous cycle. Fast writes and fast reads with nothing to schedule.
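The difference is easy to picture with a toy model. This is a conceptual sketch only, not Iceberg's actual on-disk format; the file names and positions are made up for illustration:

```python
# Conceptual contrast: V2-style position delete records vs. a
# V3-style deletion vector (a compact bitmap per data file).

# V2: each delete writes a (data_file, row_position) record; readers
# must reconcile every delete file against the data at query time.
position_deletes = [
    ("data-001.parquet", 3),
    ("data-001.parquet", 7),
    ("data-001.parquet", 3),  # duplicates across delete files are possible
]

# V3: one bitmap per data file marks deleted row positions.
# Here an int is used as a bitset; real implementations use roaring bitmaps.
deletion_vector = 0
for _, pos in position_deletes:
    deletion_vector |= 1 << pos

def live_rows(rows, dv):
    """Return rows whose position is not flagged in the deletion vector."""
    return [row for i, row in enumerate(rows) if not (dv >> i) & 1]

rows = [f"row-{i}" for i in range(10)]
print(live_rows(rows, deletion_vector))  # positions 3 and 7 are filtered out
```

The reader does a single bit test per row instead of merging delete files, which is where the lower read and compute overhead comes from.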

V3 also introduces row-level lineage (_row_id and _last_updated_sequence_number on every row), the VARIANT type for semi-structured data that eliminates schema-on-write bottlenecks, and nanosecond-precision timestamps for financial services and IoT workloads. These aren't incremental improvements. They're the features that make Iceberg ready for the next generation of AI and real-time analytics workloads.

The Bottom Line

The Iceberg lakehouse is the default architecture for AI and analytics. That debate is over. The question now is which platform makes Iceberg simple to manage without continually trying to lock you in. That's Dremio, not Snowflake or Databricks. Dremio is the only Iceberg-native data platform: simple, low-cost, and designed for the AI era. Try it out for free to experience the difference.

Get started at dremio.com/get-started →

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.