For years, the promise of the open lakehouse was simple: store your data once, query it with any tool, and never get locked into a single vendor's ecosystem. Apache Iceberg made that promise real. It became the industry-standard table format because it worked, it was open, and it kept getting better.
Iceberg version 3 (V3) is the latest proof of that progress. And with the March release of Dremio Cloud, the most significant V3 capabilities are available today. But this is not just a version bump. V3 addresses problems that data teams have lived with in production for years, problems that workarounds and compaction jobs and schema gymnastics were never designed to truly solve.
Every data platform eventually runs into the same friction points. The data your business cares about most arrives in formats that resist clean structure. Event streams from APIs, logs from IoT devices, and webhook payloads all carry flexible, schema-varying content that forces a choice: spend engineering effort forcing the data into rigid columns, or accept the performance and usability penalties of storing it as raw JSON strings.
Neither option is good. Wide tables with hundreds of nullable columns are a maintenance burden and a storage cost. Storing JSON as text means every query has to parse it on the fly, burning compute and slowing analysts down. Teams have been accepting this tradeoff because there was no better path.
The same tension shows up in write-heavy workloads. Pipelines that handle change data capture, GDPR deletion requests, or data correction workflows need to update and delete rows regularly. In earlier versions of Iceberg, every delete created a separate delete file that had to be reconciled with the underlying data at query time. As delete files accumulated, read performance degraded, and the operational fix was frequent compaction runs that consumed compute and required careful scheduling.
Governance and auditability present a third friction point that often gets less attention until it becomes urgent. When a regulator asks which source record produced a specific output row, or when a data quality issue surfaces and the team needs to trace it back to its origin, most platforms offer no clean answer. Row-level lineage has historically required bolt-on tooling or custom logging, adding complexity that the underlying format should have handled natively.
V3 addresses all of these directly, alongside a set of capabilities that close real gaps in precision, schema management, and operational overhead.
How Dremio Is Optimizing for V3
Semi-structured data becomes a first-class citizen.
The VARIANT data type in Iceberg V3 gives teams a native way to store flexible, schema-varying data alongside structured columns. Event payloads that look completely different from one record to the next can live in a single VARIANT column, queryable with standard SQL, without requiring upfront schema decisions or runtime JSON parsing. The data is stored in an optimized binary format that the query engine understands natively, which means VARIANT is not a workaround but a performance-oriented feature built into the format itself.
This matters for any team working with APIs, event streams, or application logs. Instead of maintaining fragile ingestion pipelines that try to normalize every field, teams can ingest data as it arrives and query the structure they need when they need it. Dremio also supports variant shredding, which extracts frequently accessed fields into separate columnar chunks, so the most common query patterns get the same read performance as fully typed columns.
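To make the shredding idea concrete, here is a minimal Python sketch of the concept (not Dremio's implementation): heterogeneous event payloads live in a single VARIANT-like column, while one frequently queried field is extracted into its own columnar array so filters on it never have to parse the full payloads. The field names and payloads are invented for illustration.

```python
import json

# Hypothetical rows with schema-varying payloads, as they might arrive
# from an event stream. Each record has a different shape.
rows = [
    {"event_type": "click", "target": "buy-button", "x": 14, "y": 220},
    {"event_type": "purchase", "sku": "A-100", "amount": 29.99},
    {"event_type": "click", "target": "nav", "referrer": "email"},
]

# Ingest: store each payload as-is (no upfront schema decision), and
# "shred" the hot field into a separate columnar chunk.
variant_col = [json.dumps(r) for r in rows]                 # flexible payloads
shredded_event_type = [r.get("event_type") for r in rows]   # columnar chunk

# A filter on the shredded field scans only the narrow column,
# touching none of the full payloads.
clicks = sum(1 for t in shredded_event_type if t == "click")
print(clicks)  # 2

# Fields outside the shredded set remain reachable by parsing the variant.
skus = [json.loads(v)["sku"] for v in variant_col
        if json.loads(v).get("event_type") == "purchase"]
print(skus)  # ['A-100']
```

The tradeoff mirrors the one described above: shredded fields get columnar read performance, while everything else keeps full schema flexibility.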
Deletes and updates get faster, with less operational overhead.
Deletion vectors replace the position delete file model with a compact bitmap, associated with each data file, that marks which row positions have been deleted. The practical impact is significant: early testing shows read performance improvements of 50 to 80 percent compared to V2 positional deletes. For teams running CDC pipelines, processing frequent deletion requests, or enforcing data retention policies, this means fewer table maintenance jobs and a more manageable operational footprint. Compaction remains a best practice, but it becomes a maintenance step rather than a performance emergency.
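The difference is easiest to see in a toy model. The sketch below stands in for a real deletion vector (production implementations use a compressed roaring bitmap; a Python set plays that role here): deletes flip bits in one structure per data file, and the reader applies it in a single pass instead of reconciling a pile of delete files.

```python
# One data file with five rows, identified by position.
data_file = ["row-0", "row-1", "row-2", "row-3", "row-4"]

# The deletion vector: a set of deleted positions stands in for the
# compressed bitmap a real implementation would use.
deletion_vector = set()

def delete_rows(positions):
    # Each delete just marks positions; no new delete file is written,
    # so nothing accumulates for compaction to clean up urgently.
    deletion_vector.update(positions)

def read_file():
    # The reader skips any marked position in a single pass over the file.
    return [row for pos, row in enumerate(data_file)
            if pos not in deletion_vector]

delete_rows([1, 3])
print(read_file())  # ['row-0', 'row-2', 'row-4']
```

Because repeated deletes mutate one compact structure instead of appending new files, read cost stays flat no matter how many delete operations have run.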
Row lineage brings native auditability to the format.
V3 introduces row-level lineage tracking directly in the Iceberg specification. Every row carries metadata that identifies its origin and tracks how it has changed over time. For data governance and compliance teams, this changes what is possible. Auditing queries no longer require reconstructing row history from external logs or snapshot diffs. Teams running CDC pipelines get a clean, format-native way to trace exactly which source records drove downstream changes. Organizations subject to GDPR, CCPA, or other data regulations gain a reliable audit trail without building it themselves. Row lineage is the kind of capability that sounds like a nice-to-have until an audit or a data quality incident makes it essential.
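A rough sketch of the mechanics, modeled loosely on the V3 spec's row lineage fields (a stable row id assigned at first write, plus the sequence number of the row's last change), shows why audits get simpler. All names and values here are illustrative.

```python
# Minimal model of format-native row lineage: identity survives updates,
# and every change records when it happened.
next_row_id = 0
sequence_number = 0
table = []  # rows carry _row_id and _last_updated alongside user data

def append(email):
    global next_row_id, sequence_number
    sequence_number += 1
    table.append({"_row_id": next_row_id,
                  "_last_updated": sequence_number,
                  "email": email})
    next_row_id += 1

def update(row_id, email):
    global sequence_number
    sequence_number += 1
    for row in table:
        if row["_row_id"] == row_id:
            row["email"] = email                     # the value changes...
            row["_last_updated"] = sequence_number   # ...the identity does not

append("a@example.com")      # sequence 1
append("b@example.com")      # sequence 2
update(0, "a+new@example.com")  # sequence 3

# An auditor can now ask directly: which rows changed after snapshot 2?
changed = [r["_row_id"] for r in table if r["_last_updated"] > 2]
print(changed)  # [0]
```

That last query is the kind of question that previously required reconstructing history from external logs or snapshot diffs.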
Schema management gets more precise.
V3 also introduces default column values, a capability that sounds simple but eliminates a persistent source of friction in schema evolution. When a new column is added to an existing table, V3 allows teams to define what value that column should carry for rows written before the column existed. Previously, adding a column to a table with existing data meant accepting nulls for historical rows or running a backfill job. Default values make schema evolution cleaner, cheaper, and more predictable for any team managing tables that grow and change over time.
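The reader-side behavior can be sketched in a few lines (a conceptual model, not Iceberg's actual code): when a column is added with a default, historical data files are left untouched, and the reader substitutes the default for any row written before the column existed.

```python
# Columns and their declared defaults for rows that predate them.
columns = ["id"]
column_defaults = {}

# Two rows written before any schema change; they know nothing of "status".
old_rows = [{"id": 1}, {"id": 2}]

def add_column(name, default):
    # Schema evolution is a metadata change only; no backfill job runs.
    columns.append(name)
    column_defaults[name] = default

def read(rows):
    # The reader projects every schema column, filling in the default
    # wherever a row's file predates the column.
    return [{c: r.get(c, column_defaults.get(c)) for c in columns}
            for r in rows]

add_column("status", "active")
print(read(old_rows))
# [{'id': 1, 'status': 'active'}, {'id': 2, 'status': 'active'}]
```

The historical files are never rewritten; the default lives in table metadata and is applied at read time.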
The Full Stack of Dremio's Iceberg Advantage
Iceberg V3 support is the latest layer of a platform that has been built around Iceberg at every level. The value Dremio delivers on Iceberg is not limited to query execution. It spans catalog, optimization, and the intelligence layer that keeps lakehouse tables performing well over time without requiring constant manual intervention.
An open catalog built on Apache Polaris.
Dremio's Open Catalog is built on Apache Polaris, the open-source catalog Dremio co-created, which recently graduated to become a top-level Apache project. Polaris provides a standards-based, interoperable catalog that any Iceberg-compatible engine can connect to. Teams can manage access control, table metadata, and cross-engine interoperability through a single catalog layer without vendor lock-in. Because Polaris is open and governed by the Apache community, the catalog investment is as durable as the format itself.
Autonomous table management that removes the operational burden.
Keeping Iceberg tables healthy in production requires ongoing maintenance: compacting small files, vacuuming expired snapshots, and clustering data so that queries skip as much of the table as possible. On most platforms, this is a manual, scheduled process that teams have to design, monitor, and tune themselves. Dremio automates it. Autonomous Reflections and table optimization services continuously monitor table health and trigger compaction, vacuuming, and Iceberg clustering without requiring manual intervention. Iceberg clustering organizes data by the columns most commonly used in query filters, which directly reduces the data scanned per query. The result is that tables stay fast as they grow, without operational overhead shifting onto the data engineering team.
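Why clustering reduces scanned data comes down to file-level statistics. In this toy model (invented file ranges, not Dremio internals), data files clustered on a date column each carry min/max values, and the planner keeps only files whose range overlaps the query filter, skipping the rest before reading a byte.

```python
# Three data files clustered on an event-date column, each with the
# min/max statistics that Iceberg metadata tracks per file.
clustered_files = [
    {"min": "2024-01-01", "max": "2024-03-31", "rows": 1_000_000},
    {"min": "2024-04-01", "max": "2024-06-30", "rows": 1_000_000},
    {"min": "2024-07-01", "max": "2024-09-30", "rows": 1_000_000},
]

def files_to_scan(lo, hi):
    # Keep only files whose [min, max] range overlaps the filter range;
    # everything else is pruned from the plan entirely.
    return [f for f in clustered_files if f["max"] >= lo and f["min"] <= hi]

hit = files_to_scan("2024-05-01", "2024-05-31")
print(len(hit), "of", len(clustered_files), "files scanned")  # 1 of 3
```

Without clustering, a month of data could be scattered across every file, and the same filter would prune nothing, which is exactly the drift that autonomous re-clustering prevents as tables grow.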
A query engine built to exploit Iceberg's architecture.
Dremio's Intelligent Query Engine is designed to take full advantage of how Iceberg stores and organizes data. Iceberg's metadata layer enables aggressive partition pruning, file-level skipping, and predicate pushdown, and Dremio's planner is built to use all of it. For V3 specifically, VARIANT support in the query engine means semi-structured data is processed using the same columnar optimizations that apply to typed columns, including the performance benefits of variant shredding. Teams are not trading query performance for schema flexibility. They get both.
The Agentic Lakehouse Is Ready to Work for Your Business
The conversation around data platforms has shifted. The goal is no longer just storing and querying data efficiently. It is turning data into decisions, at scale, with the speed and intelligence that modern business demands. Dremio's Agentic Lakehouse brings that vision together: a platform where AI-powered analytics, autonomous optimization, and open interoperability combine to deliver real business outcomes. Better insight from every data type your organization produces. A healthier, higher-performing data landscape that manages itself. And a measurable return on the infrastructure investment you have already made.
Underpinning all of it is Apache Iceberg, and Dremio is the only lakehouse platform built and fully optimized for it. Not Iceberg as an afterthought. Iceberg as the foundation, with every layer of the platform, from the query engine to the catalog to the autonomous optimization layer, designed to get the most out of it.