Dremio Blog

23 minute read · May 23, 2026

Apache Iceberg V2 vs V3: What Changed and What It Means for Your Tables

Alex Merced, Head of DevRel, Dremio

Apache Iceberg is not a static format. The spec version number stamped into every table's metadata controls which features that table can use, which engines can read it, and how efficiently row-level changes are handled. The jump from V2 to V3 is not cosmetic. It introduces deletion vectors, a native semi-structured column type, and nanosecond timestamp precision, each of which solves a concrete limitation that V2 users hit in production.

If you manage Iceberg tables at scale, you need to understand what changed, what it costs to upgrade, and whether your engine stack is ready. This post walks through each version, what it added, and the practical decisions that come with upgrading.

What the Iceberg Format Version Number Actually Means

Every Iceberg table has a metadata JSON file. Inside that file is a single field:

{
  "format-version": 2
}

This number is the table format version. It is not the version of the Iceberg library you are running, and not the version of Spark or Dremio. It is a property of the table itself, recorded in the table metadata file that the catalog points to, alongside the table's schema and snapshot history.
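As a minimal sketch, the field can be read straight out of a metadata document. The sample document below is invented and heavily trimmed, and `read_format_version` is a hypothetical helper for illustration, not part of any Iceberg library.

```python
import json


def read_format_version(metadata_json: str) -> int:
    """Return the format version declared in an Iceberg metadata JSON document."""
    metadata = json.loads(metadata_json)
    if "format-version" not in metadata:
        raise ValueError("metadata has no 'format-version' field")
    return int(metadata["format-version"])


# Invented, heavily trimmed metadata document (illustration only).
sample = '{"format-version": 2, "table-uuid": "00000000-0000-0000-0000-000000000000"}'
print(read_format_version(sample))
```

Because the value lives in each table's own metadata, running the same check against two tables in one catalog can legitimately return different versions.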

Two things follow from this design. First, different tables in the same catalog can run different format versions. Your historical archive table might stay on V2 while a new event stream table uses V3. Second, format version upgrades are one-way. There is no rollback. Once you write V3 metadata, engines that only understand V2 will fail to read the table.

This makes version selection a deliberate architectural decision, not an automatic library upgrade.

The official Apache Iceberg table format specification lives at iceberg.apache.org/spec. It defines the exact metadata structures, file layouts, and encoding rules that engines must follow at each version.

A Quick Recap of V1 and V2

Understanding where V3 fits requires a clear picture of what came before it.

V1: The Foundation

V1 established the core Iceberg design: snapshot-based ACID isolation, schema evolution without rewrites, partition evolution, and hidden partitioning. Every write produces a new snapshot, and readers always see a consistent view of the table at a point in time.

The limitation in V1 was row-level mutations. If you needed to delete or update a specific row, you had to rewrite the entire Parquet file containing it. For append-heavy workloads this was fine. For any table receiving updates or deletions, the rewrite cost was prohibitive at scale. You can explore the full architectural design in the Apache Iceberg Architectural Guide.

V2: Row-Level Deletes Arrive

V2 introduced row-level deletes without requiring data file rewrites. It did this through two new file types:

Position delete files record specific (file_path, row_position) pairs. The engine reads the delete file and skips any rows at the listed positions within the listed data files.

Equality delete files record predicate-style deletes: "delete all rows where user_id = 42." The engine applies the filter at read time.

Both mechanisms implement merge-on-read (MOR) semantics. Writes are cheap because you only write a small delete file. Reads pay the merge cost: loading both the data file and any relevant delete files, then reconciling them.
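The read-side merge can be sketched in a few lines. This is a simplified model under stated assumptions: rows are plain dictionaries, the file path and sample data are invented, and the sequence-number rules that decide which delete files apply to which data files are ignored.

```python
from typing import Iterator


def merge_on_read(
    data_file: str,
    rows: list[dict],
    position_deletes: set[tuple[str, int]],
    equality_deletes: list[dict],
) -> Iterator[dict]:
    """Yield the rows of one data file that survive both kinds of delete."""
    for position, row in enumerate(rows):
        # Position delete: this exact (file_path, row_position) pair was deleted.
        if (data_file, position) in position_deletes:
            continue
        # Equality delete: the row matches every column value of some delete entry.
        if any(
            all(row.get(col) == val for col, val in entry.items())
            for entry in equality_deletes
        ):
            continue
        yield row


rows = [
    {"user_id": 41, "action": "view"},
    {"user_id": 42, "action": "click"},
    {"user_id": 43, "action": "view"},
]
position_deletes = {("data/file-a.parquet", 0)}  # delete row 0 of this file
equality_deletes = [{"user_id": 42}]             # delete all rows where user_id = 42

live = list(merge_on_read("data/file-a.parquet", rows, position_deletes, equality_deletes))
print(live)
```

The writer only had to record one position pair and one predicate; all of the filtering work happens in the loop that runs at read time.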

This model made Iceberg practical for Change Data Capture (CDC), streaming upserts from tools like Apache Flink, and GDPR-style targeted row deletions. V2 became the standard format version and today is supported by virtually every major query engine: Spark, Trino, Flink, Presto, Dremio, StarRocks, Snowflake, and more.

The tradeoff is delete file accumulation. A high-update table can generate thousands of small delete files over hours. Without regular compaction, query performance degrades because every read must scan and merge an increasing number of delete files. Compaction collapses these back into clean data files, but compaction itself is compute-intensive and introduces latency.

What Apache Iceberg V3 Adds

V3 does not replace V2's concepts wholesale. It builds on them while fixing the biggest pain points data engineers hit in production. There are four major additions to know.

Deletion Vectors: A Better Way to Delete Rows

The core problem with V2 delete files is that they are separate files. For every data file that has deletions, you potentially have one or more delete files. A table with 10,000 data files and active updates can generate tens of thousands of delete files. Every read must identify which delete files apply to which data files, load them, and merge the results. This creates two costs: additional I/O requests and the CPU work of file joins.

V3 addresses this with deletion vectors, which take over the role of position delete files (equality deletes remain part of the spec). A deletion vector is a compact bitmap (a Roaring bitmap) that marks which row positions within a specific data file are deleted. Vectors are stored as blobs inside Puffin files, and a data file has at most one deletion vector per snapshot rather than one delete file per deletion batch.

The read path changes meaningfully. An engine reads the data file and the deletion vector simultaneously. It uses the bitmap to skip flagged row positions without file joins. The overhead scales with the number of deleted rows, not the number of delete operations.
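To make the bitmap idea concrete, here is a toy deletion vector. It is a sketch only: it uses a plain Python integer as the bitmap instead of a Roaring bitmap, and the `DeletionVector` class and `scan` function are hypothetical names, not an Iceberg API.

```python
class DeletionVector:
    """Toy deletion vector: one bit per row position of a single data file."""

    def __init__(self) -> None:
        # Python ints are arbitrary precision, so one int acts as a growable bitmap.
        self._bits = 0

    def delete(self, position: int) -> None:
        self._bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self._bits >> position) & 1 == 1

    def cardinality(self) -> int:
        """Number of deleted rows, regardless of how many delete calls were made."""
        return bin(self._bits).count("1")


def scan(rows: list[str], dv: DeletionVector) -> list[str]:
    """Return the rows whose positions are not flagged in the deletion vector."""
    return [row for position, row in enumerate(rows) if not dv.is_deleted(position)]


dv = DeletionVector()
for position in (1, 3, 3):  # deleting the same row twice sets the same bit
    dv.delete(position)

rows = ["r0", "r1", "r2", "r3", "r4"]
print(scan(rows, dv))
print(dv.cardinality())
```

Three delete operations leave two set bits, which illustrates why the read cost tracks the number of deleted rows rather than the number of delete operations.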

This approach was pioneered in similar forms by Delta Lake and is now standardized in the Iceberg V3 spec. For high-mutation tables like CDC targets or event stream consumers, deletion vectors reduce merge-on-read overhead significantly. Fewer files also mean less catalog pressure and smaller manifest files.

Variant Type: Native Semi-Structured Data

Data engineers storing JSON payloads in Iceberg tables have lived with a frustrating workaround: declare the column as STRING, write the serialized JSON, and parse it at query time. This approach has three problems.

First, no predicate pushdown. The query engine cannot skip files based on a JSON field value because the field is not typed in the schema. The engine must read every file and parse every JSON string to find matching rows.

Second, CPU cost at read time. Parsing JSON is not free. For wide events with nested structures, parsing every row before filtering is wasteful when most rows will be discarded.

Third, no type safety. A field that is sometimes an integer and sometimes a string causes either casting errors or silent data quality issues.

V3 introduces the Variant type, a native column type for semi-structured data. At write time, the Variant encoder converts the JSON-like structure into a compact binary encoding in which leaf values (integers, booleans, strings) keep their types and the object's field names are stored as metadata. Writers can additionally shred frequently accessed fields into separately typed columns.

At read time, engines can access specific sub-fields without deserializing the entire payload. Predicate pushdown becomes possible for known paths. A query filtering on payload.event_type = 'click' can skip entire files and row groups that don't contain matching values.
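The file-skipping effect can be sketched as follows. This assumes the writer has shredded payload.event_type into its own typed column and recorded per-file lower and upper bounds for it; the `FileStats` structure, file names, and values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    # Hypothetical per-file bounds for the shredded sub-field payload.event_type.
    event_type_min: str
    event_type_max: str


def files_to_scan(files: list[FileStats], wanted: str) -> list[str]:
    """Keep only files whose [min, max] range could contain the wanted value."""
    return [f.path for f in files if f.event_type_min <= wanted <= f.event_type_max]


files = [
    FileStats("f1.parquet", "click", "click"),
    FileStats("f2.parquet", "purchase", "view"),
    FileStats("f3.parquet", "add_to_cart", "login"),
]

# Filter: payload.event_type = 'click'
print(files_to_scan(files, "click"))
```

With a STRING column holding serialized JSON there are no bounds for the sub-field at all, so the equivalent of `files_to_scan` would have to return every file.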

The Variant specification in Iceberg V3 is aligned with the VARIANT type introduced in Apache Spark 4.0 and the broader semi-structured data standardization effort in the data lake community. For mixed structured/semi-structured workloads, including log analytics, IoT pipelines, and API event streams, this is one of the most impactful additions in V3.

Nanosecond Timestamp Precision

V1 and V2 support microsecond precision for timestamp columns (TIMESTAMP and TIMESTAMPTZ). A microsecond is 10^-6 seconds, which is sufficient for most analytical workloads.

There are domains where it is not. High-frequency trading systems distinguish order events that differ by nanoseconds (10^-9 seconds). Network packet capture tools record timestamps at nanosecond resolution. IoT sensor streams from precision instruments can produce events faster than microsecond intervals. Linux kernel tracing and distributed systems profiling also operate at nanosecond scale.

V3 adds timestamp_ns and timestamptz_ns column types with nanosecond precision. The underlying storage uses INT64 encoding, the same physical type as microsecond timestamps, but the interpretation of the value changes to nanoseconds since epoch. This is backward-compatible at the physical layer but requires engines that understand V3 type semantics to correctly handle the values.
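A short sketch shows what the change in interpretation means for a stored value. The raw number is an arbitrary example, and `split_nanos` is a hypothetical helper; Python's `datetime` only holds microseconds, so the sub-microsecond remainder is carried separately.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MAX = 2**63 - 1


def split_nanos(value: int) -> tuple[datetime, int]:
    """Interpret an INT64 as nanoseconds since epoch.

    Returns the timestamp truncated to microseconds plus the leftover nanoseconds.
    """
    micros, leftover_nanos = divmod(value, 1_000)
    return EPOCH + timedelta(microseconds=micros), leftover_nanos


raw = 1_700_000_000_123_456_789  # a single INT64 value as stored on disk

moment, leftover_nanos = split_nanos(raw)
print(moment.isoformat(), "+", leftover_nanos, "ns")

# The same 64 bits cover a much shorter span at nanosecond resolution.
SECONDS_PER_YEAR = 365 * 24 * 3600
years_each_side_of_epoch = INT64_MAX // (1_000_000_000 * SECONDS_PER_YEAR)
print(years_each_side_of_epoch)
```

The second calculation is the practical tradeoff: a nanosecond INT64 reaches only a few hundred years on either side of 1970, so the type suits event data rather than arbitrary historical dates.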

If you are building event pipelines for financial markets, precision IoT, or high-resolution distributed tracing, nanosecond timestamps in V3 eliminate the need to store timestamps as raw INT64 outside the type system.

Other V3 Changes: Multi-Argument Transforms and Row Lineage

V3 also expands the partitioning model. In V2, every partition transform operates on a single source column: bucket(N, col), truncate(N, col), year(col), month(col), day(col), hour(col). V3 allows multi-argument transforms, enabling compound partition expressions that span multiple columns. This builds on Iceberg's existing hidden partitioning design and opens up more sophisticated data layout strategies. For background on how hidden partitioning reduces full table scans, see Apache Iceberg Hidden Partitioning.

Row lineage tracking is also part of the V3 spec. It gives each row a persistent identifier and records the sequence number of the commit that last changed it, enabling fine-grained audit trails and row-level provenance. Engine implementations are still catching up, so verify support before relying on it, but it is a significant capability for compliance-heavy use cases.

Feature Comparison: V1 vs V2 vs V3

| Feature | V1 | V2 | V3 |
| --- | --- | --- | --- |
| Row-level deletes | None (full file rewrite) | Position and equality delete files | Deletion vectors (compact bitmaps) |
| Semi-structured data | STRING + parse at query | STRING + parse at query | Native Variant type |
| Timestamp precision | Microsecond | Microsecond | Nanosecond |
| Delete overhead | Full file rewrite cost | Many small delete files accumulate | Single compact bitmap per data file |
| Partitioning | Single-column transforms | Single-column transforms | Multi-argument transforms |
| Row lineage | None | None | In the spec; engine support emerging |
| Engine support | Universal | Very broad | Growing (2025-2026) |

Checking Your Format Version and Upgrading

Before you decide whether to upgrade, you need to know what version your tables are currently running. Here is how to check in common environments.

In Spark using the Iceberg REST catalog, you can query table properties directly:

-- Check format version in Spark
SHOW TBLPROPERTIES my_catalog.schema.my_table ('format-version');

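In Dremio, the table_snapshot function returns a table's snapshot history. The query below assumes your Dremio version exposes a format_version column in that output; if it does not, read the format-version field from the table's metadata JSON instead:

-- Check your table's current format version in Dremio
-- (assumes table_snapshot exposes a format_version column; verify for your Dremio version)
SELECT format_version
FROM TABLE(table_snapshot('my_catalog.schema.my_table'))
ORDER BY committed_at DESC
LIMIT 1;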

To upgrade an existing V2 table to V3 in Spark:

-- Upgrade a table to V3 (Spark with Iceberg runtime 1.7+)
ALTER TABLE my_catalog.schema.my_table
SET TBLPROPERTIES ('format-version' = '3');

This command updates the table metadata to declare V3. Existing data files remain unchanged. They are still valid Parquet files. New writes and delete operations will use V3 semantics going forward.

To create a brand-new V3 table with a Variant column and nanosecond timestamp:

-- Create a V3 table with Variant column (Spark 4.0+)
CREATE TABLE my_catalog.schema.events (
  id BIGINT,
  event_time TIMESTAMP_NTZ,
  payload VARIANT
) USING iceberg
TBLPROPERTIES ('format-version' = '3');

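Note that TIMESTAMP_NTZ here maps to Iceberg's microsecond timestamp type, because Spark SQL's timestamp types are microsecond-precision. Nanosecond precision requires the column to use the V3 timestamp_ns type (or timestamptz_ns for zone-aware values), declared through an engine or API that exposes it. Engine-specific syntax may vary as V3 support matures.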

Engine Support for Iceberg V3 in 2025

V2 achieved broad engine support over several years. V3 is in an earlier phase. As of mid-2025, the landscape looks like this:

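Apache Spark 4.0 with a V3-capable iceberg-spark-runtime is the primary environment for V3 read and write support. The examples in this post assume runtime 1.7+, but individual V3 features landed across several Iceberg releases, so confirm the minimum version for your Spark release in the Iceberg release notes. This is where deletion vectors and Variant type are most mature.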

Apache Flink has Iceberg integration that is tracking toward V3 support, though full feature parity lags Spark.

Trino and Presto are implementing V3 reader support. Deletion vector reads are the highest priority given their direct impact on query performance.

Dremio is built natively on Apache Iceberg and is actively implementing V3 reader support. Dremio's architecture, which uses Apache Arrow for in-memory column processing and Arrow Flight for data delivery, is well-suited to take advantage of deletion vectors during scan planning.

Snowflake and BigQuery have read-path Iceberg support via external table integrations. Their V3 timelines depend on their integration layer updates.

The practical risk of upgrading too early is not data loss, since the data files themselves are unchanged. The risk is that an engine in your stack that does not yet understand V3 metadata will refuse to open the table. If your Spark cluster writes V3 tables and your Trino cluster is the primary read surface, verify Trino's V3 support before upgrading.
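The failure mode is easy to model. In this sketch the engine names and their maximum supported versions are hypothetical placeholders, not statements about any real engine release; `check_readable` mimics a reader refusing a table whose declared version is newer than it understands.

```python
class UnsupportedFormatVersion(Exception):
    """Raised when a table declares a newer format version than the reader understands."""


def check_readable(engine_max_version: int, table_format_version: int) -> None:
    if table_format_version > engine_max_version:
        raise UnsupportedFormatVersion(
            f"table is format-version {table_format_version}, "
            f"reader supports up to {engine_max_version}"
        )


def unreadable_by(table_format_version: int, engines: dict[str, int]) -> list[str]:
    """Return the engines that would refuse to open a table at this version."""
    blocked = []
    for name, max_version in sorted(engines.items()):
        try:
            check_readable(max_version, table_format_version)
        except UnsupportedFormatVersion:
            blocked.append(name)
    return blocked


# Hypothetical stack: values are placeholders, not real engine capabilities.
engines = {"writer-cluster": 3, "reader-cluster": 2}

print(unreadable_by(2, engines))  # before the upgrade
print(unreadable_by(3, engines))  # after the upgrade
```

Running the same audit before and after the planned upgrade shows exactly which read surface would break, without touching any data files.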

When to Stay on V2 and When to Move to V3

The right choice depends on your table's workload pattern and your engine stack composition.

Stay on V2 if:

You run a multi-engine environment where not every engine has confirmed V3 support. Mixed-version stacks with a V3 table that one engine cannot read will create operational incidents.

Your tables are primarily append-only or low-mutation. The benefits of deletion vectors are most visible on high-update tables. An append-only historical log table on V2 performs well and does not need V3.

Your data is fully structured. If all your columns are typed scalars, timestamps, and strings without embedded JSON, Variant type adds nothing to your workload.

Move to V3 if:

You run tables with high delete or update rates, such as CDC targets, streaming upsert sinks, or tables that receive GDPR deletion requests regularly. Deletion vectors will reduce the read overhead and compaction frequency for these tables.

You store semi-structured data (JSON, nested events, API payloads) and want query-time performance without the parse overhead. Variant type is a direct improvement for these workloads.

You work in financial markets, precision IoT, or distributed systems profiling where microsecond timestamps are not granular enough.

You are building a new table from scratch and your engine stack is Spark 4.0 or newer. Starting on V3 avoids a future migration.

Remember: the upgrade is irreversible. Test the upgrade in a non-production environment, verify that all reader engines in your stack can open the table, and confirm compaction jobs and incremental processing pipelines continue to function correctly before upgrading production tables.

How Dremio Works with Iceberg V3

Dremio's query engine is built on Apache Iceberg as its native table format. There is no proprietary translation layer converting Iceberg metadata into an internal representation. Dremio reads and writes Iceberg metadata directly. This architecture means V3 feature support maps cleanly onto Dremio's query planning and execution path.

For deletion vectors, Dremio's scan planning can incorporate bitmap-based row filtering during file pruning. Rather than loading delete files and performing row-position joins during execution, deletion vectors can be consulted at planning time to produce tighter scan ranges. Combined with Dremio's C3 (Columnar Cloud Cache) and Autonomous Reflections, deletion vector-aware planning reduces the compaction burden that currently makes Reflections refresh more expensive on high-mutation tables.

Dremio's AI Semantic Layer can expose Variant columns with named business definitions. Instead of ad-hoc GET_JSON_OBJECT queries against a string column, a Variant column with defined sub-paths can be surfaced as first-class business metrics in the semantic layer. This bridges the gap between raw semi-structured events and the governed, business-readable data model that BI tools and AI agents consume. For more on how semantic layers work in this context, see What Is a Semantic Layer.

Arrow Flight accelerates the delivery of data that has been filtered through deletion vector processing. Once Dremio's execution engine applies the bitmap mask and produces the clean result set, Arrow Flight streams it to clients at full network bandwidth without serialization overhead.

At the catalog level, Dremio's Open Catalog (based on Apache Polaris) manages table discovery and access control. Format version is a table-level concern stored in the metadata the catalog tracks. Open Catalog's fine-grained access control (FGAC) works the same way regardless of whether a table is V2 or V3.

For teams building toward an Agentic Lakehouse architecture, V3 features matter because AI agents issuing SQL through Dremio's MCP Server benefit from faster query execution on high-mutation tables. As an illustration, a deletion vector-accelerated scan that returns in 2 seconds instead of 8 can make the difference between a responsive agentic workflow and one that stalls.

The Upgrade Decision in Practice

Here is a concrete framework for evaluating an upgrade:

Step 1: Inventory your tables. Classify them by mutation rate (high/medium/low) and data type composition (structured only vs. semi-structured). High-mutation tables with semi-structured data are the strongest candidates for V3.

Step 2: Audit your engine stack. List every engine that reads or writes each table. For each engine, verify V3 support status. If any engine in the chain is V3-incompatible, that table is not ready for upgrade.

Step 3: Test in staging. Upgrade a copy of the candidate table to V3. Run your full query suite. Verify that compaction jobs, incremental ingestion pipelines, and BI reports all work correctly. Check that deletion vectors are being written: in the delete manifests, look for delete-file entries whose file format is Puffin.

Step 4: Upgrade production selectively. Start with a single high-mutation table. Monitor read performance and compaction frequency for two weeks. If the metrics improve and no engine failures occur, proceed with additional tables.

Step 5: New tables start on V3. For any new table in a V3-capable environment, start on V3 from day one rather than planning a future migration.

This phased approach minimizes risk while capturing V3's performance benefits where they matter most.
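Steps 1 and 2 of the framework can be captured as a small triage function. This is a sketch under assumptions: the `TableProfile` fields, the table and engine names, and the recommendation strings are all hypothetical, and the rules simply restate the criteria described above.

```python
from dataclasses import dataclass, field


@dataclass
class TableProfile:
    name: str
    mutation_rate: str       # "high", "medium", or "low"
    semi_structured: bool    # stores JSON-like payloads
    engines: list[str] = field(default_factory=list)  # every engine that reads or writes it


def upgrade_recommendation(table: TableProfile, v3_ready_engines: set[str]) -> str:
    # Step 2: any engine without confirmed V3 support blocks the upgrade.
    blockers = sorted(e for e in table.engines if e not in v3_ready_engines)
    if blockers:
        return "stay on V2 (blocked by: " + ", ".join(blockers) + ")"
    # Step 1: high-mutation or semi-structured tables are the strongest candidates.
    if table.mutation_rate == "high" or table.semi_structured:
        return "test V3 in staging"
    return "no urgency"


# Hypothetical inventory and audit results.
v3_ready = {"engine-a"}
tables = [
    TableProfile("cdc_orders", "high", False, ["engine-a"]),
    TableProfile("api_events", "medium", True, ["engine-a", "engine-b"]),
    TableProfile("archive_2019", "low", False, ["engine-a"]),
]

for t in tables:
    print(t.name, "->", upgrade_recommendation(t, v3_ready))
```

Checking engine readiness first mirrors the ordering of the risk: an unreadable table is an incident, while a missed optimization is only a delay.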

Conclusion

Apache Iceberg V3 is a meaningful advancement over V2, not a version bump for its own sake. Deletion vectors address the fundamental I/O cost of merge-on-read that V2 delete file accumulation creates. The Variant type eliminates one of the most common workarounds in modern data pipelines: storing JSON as strings and parsing at query time. Nanosecond timestamp precision expands the range of workloads Iceberg can handle natively.

The barrier to adoption is engine readiness, not technical soundness. V3 is well-specified and actively being implemented across the ecosystem. The practical path is to identify your highest-mutation and most semi-structured tables today, verify your engine stack's V3 support, and begin upgrading those specific tables rather than waiting for a full ecosystem migration.

Data engineers who adopt V3 selectively and intentionally over the next 12 months will operate with measurably better read performance on their most demanding tables. The teams that wait for "universal support" before acting will delay those gains unnecessarily.

Try Dremio Cloud free for 30 days and run your Iceberg V3 queries against a query engine built natively for the open lakehouse: dremio.com/get-started
