Dremio Blog

7 minute read · April 27, 2026

Iceberg Deletion Vectors: The Better Way to Delete Rows

Will Martin · Technical Evangelist

For all the many improvements data lakehouses bring to analytics, there's one uncomfortable trade-off: deleting rows is expensive. In a system built around immutable Parquet files, a delete is actually a rewrite. You read the file, filter out the rows you don't want, and write a new file. At scale those I/O costs mount up fast.

Apache Iceberg v2 offered an improvement with merge-on-read semantics: instead of rewriting a data file immediately, you write a separate "position delete file" that records which rows to treat as deleted. Reads then apply those deletes on the fly. The problem is that as position delete files accumulate, reads get progressively slower. Each query has to join the delete files against the data files to figure out which rows are actually live. Again, at scale on a popular table that overhead adds up.

Iceberg v3 replaces position delete files with deletion vectors, and it's a meaningful step forward. Dremio supports deletion vectors fully on v3 tables. Here's what has changed and why it matters.

How Deletion Vectors Work

A deletion vector is a bitmap stored in a Puffin file alongside a data file, with a direct 1:1 mapping between them. Each row position in the data file has a corresponding bit in the bitmap. When a row is deleted, its bit is flipped. It's that simple.
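The bitmap idea can be sketched in a few lines. This is a conceptual illustration only: Iceberg v3 actually stores deletion vectors as compressed roaring bitmaps inside Puffin files, whereas a plain Python integer stands in for the bitmap here, and the class name is hypothetical.

```python
# A minimal sketch of a deletion vector: one bit per row position in a
# single data file. Bit i set means the row at position i is deleted.

class DeletionVector:
    """Toy bitmap keyed by row position within one data file."""

    def __init__(self) -> None:
        self.bits = 0

    def delete(self, position: int) -> None:
        self.bits |= 1 << position      # flip the row's bit to "deleted"

    def is_deleted(self, position: int) -> bool:
        return bool((self.bits >> position) & 1)

dv = DeletionVector()
dv.delete(3)
dv.delete(7)
print(dv.is_deleted(3))  # True
print(dv.is_deleted(4))  # False
```

Deleting a row is a single bit flip, which is why the write path stays cheap regardless of file size.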

During a read, Dremio applies the deletion vector as a bitmask over the data file. Rows flagged in the bitmap are excluded from the result. There's no join with a separate delete file, no path matching, and no cross-file lookups. The bitmap is more compact, the mapping is direct, and the read overhead is minimal compared to what v2 position delete files require. Our testing shows a read performance improvement of 50-80% with deletion vectors when compared to positional deletes.
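The read path reduces to a per-row bit test, with no join against a separate delete file. A self-contained sketch, with illustrative row values and an integer standing in for the real roaring bitmap:

```python
# Sketch of the read path: the deletion vector is applied as a bitmask
# over one data file's rows; only live rows survive the scan.

rows = ["alice", "bob", "carol", "dave"]   # rows of one data file
deletion_vector = (1 << 1) | (1 << 3)      # rows at positions 1 and 3 deleted

live = [row for pos, row in enumerate(rows)
        if not (deletion_vector >> pos) & 1]   # bit test per position

print(live)  # ['alice', 'carol']
```

Because the vector maps 1:1 to the data file, there is no path matching or cross-file lookup on this path.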

When you run a DELETE, UPDATE, or MERGE on a v3 table, Dremio writes or updates the deletion vector for the affected data file rather than producing a new delete file or immediately rewriting the data. The operation completes quickly and the data file isn't touched.

However, just like position deletes, deletion vectors accumulate over time as more rows are marked deleted. Dremio's OPTIMIZE TABLE command handles this by rewriting data files to produce clean Parquet files that incorporate the deletions, removing the vectors in the process. After a compaction run, affected data files have no deletion overhead at all. Running OPTIMIZE on a regular schedule is best practice to keep read performance steady as your table evolves.
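Conceptually, compaction materializes the deletes into a clean file and drops the vector. A hedged sketch (the real OPTIMIZE TABLE rewrites Parquet files and removes Puffin blobs; the function name here is illustrative):

```python
# Sketch of what compaction does conceptually: rewrite the data file with
# deleted rows physically removed, then reset the deletion vector.

def compact(rows, deletion_vector):
    """Fold deletes into a clean file; the new file needs no vector."""
    clean = [row for pos, row in enumerate(rows)
             if not (deletion_vector >> pos) & 1]
    return clean, 0    # 0 == empty bitmap: no deletion overhead left

rows = ["a", "b", "c", "d"]
rows, dv = compact(rows, (1 << 0) | (1 << 2))  # positions 0 and 2 deleted
print(rows, dv)  # ['b', 'd'] 0
```

After this step, scans of the rewritten file pay no masking cost at all, which is why periodic compaction keeps read latency steady.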


The Upgrade Path From v2 Tables

If you're moving existing Iceberg v2 tables to v3, deletion vectors don't require any migration of your delete history. Dremio can still read v2 position delete files on upgraded tables. The first merge-on-read operation after the upgrade, whether that's a DELETE, UPDATE, or MERGE, converts existing position deletes into deletion vectors automatically. From that point, all new deletes use the v3 format.
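The conversion idea is simple: v2 position delete files are essentially lists of (data file path, row position) records, and grouping them by file while folding the positions into one bitmap per file yields deletion vectors. A sketch under those assumptions, with made-up paths; the real conversion happens inside the engine:

```python
# Sketch of folding v2 position deletes into per-file deletion vectors.
from collections import defaultdict

position_deletes = [
    ("s3://bucket/data/file_a.parquet", 2),
    ("s3://bucket/data/file_a.parquet", 5),
    ("s3://bucket/data/file_b.parquet", 0),
]

vectors = defaultdict(int)
for path, pos in position_deletes:
    vectors[path] |= 1 << pos     # one bitmap per data file

print(bin(vectors["s3://bucket/data/file_a.parquet"]))  # 0b100100
```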

Where Deletion Vectors Make a Real Difference

The most direct beneficiaries are workloads that delete or update rows frequently without wanting to pay full rewrite costs every time.

GDPR and right-to-erasure workflows are the obvious compliance case. When a data subject requests deletion, the operation needs to be fast and auditable. With deletion vectors, marking rows deleted is a bitmap write rather than a file rewrite. You can process erasure requests at high frequency without the I/O cost of rewriting data files on each request. Compaction can run later, on a schedule that suits your workflows, at which point the physical data is cleaned up.

Change data capture pipelines that land CDC events as upserts are another strong fit. A CDC stream typically produces a mix of inserts, updates, and deletes as upstream records change. On a v2 table, frequent deletes and updates drive accumulation of position delete files that degrade read performance over time. On a v3 table with deletion vectors, the overhead is lower and more predictable, and compaction is the single lever that keeps it in check.
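To make the CDC fit concrete, here is a sketch of applying one batch of events with a vector: updates and deletes mark the old row's position as deleted, while inserts and updated values are appended as new rows. The event shape, function name, and key index are all illustrative assumptions, not Dremio's implementation:

```python
# Sketch of applying a CDC batch via a deletion vector (toy int bitmap).

def apply_cdc(rows, dv, events, index):
    """events: (op, key, value) tuples; index maps key -> row position."""
    for op, key, value in events:
        if op in ("update", "delete") and key in index:
            dv |= 1 << index.pop(key)      # mark old version deleted
        if op in ("insert", "update"):
            index[key] = len(rows)
            rows.append((key, value))      # append new version
    return rows, dv

rows = [("k1", 10), ("k2", 20)]
index = {"k1": 0, "k2": 1}
rows, dv = apply_cdc(rows, 0,
                     [("update", "k1", 11), ("delete", "k2", None)], index)
live = [(k, v) for pos, (k, v) in enumerate(rows) if not (dv >> pos) & 1]
print(live)  # [('k1', 11)]
```

Note that no data file is rewritten while the batch lands; compaction later absorbs the accumulated bits.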

High-frequency MERGE operations, common in slowly changing dimension tables and deduplication pipelines, also see a meaningful improvement. Merge-on-read with v3 deletion vectors is faster than the equivalent on v2 position delete files, which means you can run merges more aggressively without degrading downstream query performance.

Getting Started

Deletion vectors are available on Iceberg v3 tables in Dremio Cloud today. Create a v3 table, run your normal DELETE, UPDATE, or MERGE operations, and the deletion vectors are handled automatically. Then run OPTIMIZE TABLE on a schedule to keep read overhead from accumulating.

If you want to test this against your own workload, a free Dremio Cloud environment at dremio.com/get-started has full Iceberg v3 support from the start. I'd recommend running a before-and-after comparison on a delete-heavy table to see for yourself how the read overhead compares.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.