15 minute read · April 30, 2025

What’s New in Apache Iceberg Format Version 3?


Alex Merced · Head of DevRel, Dremio

Pick up a free copy of Apache Iceberg: The Definitive Guide and an Early Release copy of Apache Polaris: The Definitive Guide.

Apache Iceberg has rapidly become a cornerstone of modern data lake architecture, offering robust support for large-scale analytics with features like schema evolution, ACID transactions, and time travel. Since its inception, Iceberg has evolved through different format versions, each introducing new capabilities to meet the demands of modern data workloads.

The first two format versions, V1 and V2, laid a strong foundation. V1 introduced a metadata-driven approach that eliminated reliance on file system directories, while V2 expanded functionality with row-level deletes, enabling support for incremental updates and streaming pipelines.

Now, with the introduction of format version 3, Iceberg pushes the boundaries even further. V3 is designed to support more diverse and complex data types, offer greater control over schema evolution, and deliver performance enhancements suited for large-scale, high-concurrency environments. This blog explores the key differences between V1, V2, and the new V3, highlighting what makes V3 a significant step forward in Iceberg's evolution.

A Quick Recap of Iceberg V1 and V2

Before diving into the new capabilities introduced in version 3, it’s important to understand the foundation built by the first two format versions of Apache Iceberg. Each version has progressively expanded the functionality of the table format, keeping compatibility and performance at the core.

Iceberg V1 – Foundation for Analytical Tables

The initial format version focused on building a stable, scalable structure for large analytical workloads. Key features included:

  • Immutable data files stored in open formats such as Parquet, Avro, and ORC
  • Snapshot-based isolation and time travel support
  • Schema evolution with safe column additions, drops, renames, and reordering
  • Explicit tracking of data files through manifests and metadata files

V1 allowed query engines to read large datasets consistently without relying on brittle file system conventions.

Iceberg V2 – Enabling Row-Level Deletes

Version 2 introduced the ability to handle mutable operations more efficiently, especially for use cases like change data capture and streaming ingestion.

  • Row-level deletes were supported via delete files (position and equality deletes)
  • Writers were required to follow stricter validation rules
  • The specification introduced sequence numbers to maintain operation ordering
  • Improved support for merge-on-read semantics

With these changes, V2 made Iceberg more suitable for dynamic workloads and real-time data updates.

What’s New in Iceberg V3

Apache Iceberg format version 3 introduces a set of capabilities aimed at enhancing flexibility, performance, and expressiveness in data modeling. While V1 and V2 focused on stability and row-level operations, V3 is about expanding the format to accommodate more complex use cases and data types.

New Data Types

V3 brings support for several advanced data types, allowing broader data modeling options:

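  • timestamp_ns and timestamptz_ns: nanosecond-precision timestamps, without and with time zone
  • variant for semi-structured data, similar to JSON
  • geometry and geography for geospatial analytics
  • unknown, a placeholder type for columns whose type is not yet known; it can only hold null values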

These additions help Iceberg cater to domains like IoT, analytics with location data, and systems that rely heavily on semi-structured input.

Default Column Values

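You can now define default values for columns at the schema level. The spec distinguishes two kinds: an initial-default, which readers use to fill the column for rows written before it existed, and a write-default, which writers use when an insert omits the column. This simplifies schema evolution and reduces the need for client-side logic to populate values during insert operations.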
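The sketch below models how the two defaults behave. It is a simplified illustration: `Field`, `read_row`, and `write_row` are hypothetical helpers, only the concepts `initial-default` and `write-default` come from the spec, and this is not the API of any Iceberg library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Field:
    # Hypothetical, simplified schema field; not an Iceberg library class.
    name: str
    initial_default: Any = None  # fills rows written before the column existed
    write_default: Any = None    # fills the column when a writer omits it


def read_row(schema: list, stored: dict) -> dict:
    """Project a stored row onto the current schema.

    A column absent from an older data file is read as its initial_default;
    an explicitly stored null stays null.
    """
    return {f.name: stored.get(f.name, f.initial_default) for f in schema}


def write_row(schema: list, values: dict) -> dict:
    """Fill any column the writer omitted with its write_default."""
    return {f.name: values.get(f.name, f.write_default) for f in schema}


schema = [
    Field("id"),
    Field("region", initial_default="unknown", write_default="us-east"),
]
print(read_row(schema, {"id": 1}))   # {'id': 1, 'region': 'unknown'}
print(write_row(schema, {"id": 2}))  # {'id': 2, 'region': 'us-east'}
```

Keeping the two values separate matters because existing data files depend on the initial-default, so it stays fixed once the column is added, while the write-default can be changed later through schema evolution.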

Multi-Argument Transforms

V3 allows partition and sort transforms to take more than one source column. This opens the door to more advanced layout strategies, such as bucketing on a combination of columns rather than a single key.
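To make the idea concrete, here is a hypothetical two-column bucket function. It is not defined by the spec: the name `multi_column_bucket` is made up, and it hashes with SHA-256 from Python's standard library, whereas Iceberg's single-column `bucket` transform uses a 32-bit Murmur3 hash, so the bucket numbers will not match any real engine.

```python
import hashlib
import struct


def multi_column_bucket(values: tuple, num_buckets: int) -> int:
    """Hypothetical bucket transform over several source columns.

    Uses SHA-256 (stdlib) instead of Iceberg's 32-bit Murmur3, so bucket
    numbers are illustrative only.
    """
    h = hashlib.sha256()
    for v in values:
        encoded = repr(v).encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
        h.update(struct.pack(">I", len(encoded)))
        h.update(encoded)
    # Keep 31 bits so the hash is non-negative, then take the modulus.
    hash31 = int.from_bytes(h.digest()[:4], "big") & 0x7FFFFFFF
    return hash31 % num_buckets


rows = [(101, "emea"), (101, "apac"), (101, "emea")]
buckets = [multi_column_bucket(r, 16) for r in rows]
# The first and third rows share the same key pair, so they share a bucket.
print(buckets[0] == buckets[2])  # True
```

Rows that share the same (customer ID, region) pair always land in the same bucket, so a query that filters on both columns could be limited to a single bucket.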

Row Lineage Tracking

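V3 adds row lineage: each row has a stable row ID and records the sequence number of the commit that last updated it. Engines can use this to tell which rows changed between snapshots, which is useful for incremental processing and especially valuable in regulated environments where traceability and auditability are critical.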
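A simplified sketch of how row IDs can be derived, assuming the model described in the V3 spec: each newly added data file receives a contiguous block of IDs starting at its `first_row_id`, and a row that does not store its lineage explicitly inherits `first_row_id + position` as its `_row_id` and the file's sequence number as its `_last_updated_sequence_number`. The `DataFile` class and helper functions are hypothetical and ignore details such as manifests and compaction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataFile:
    # Hypothetical, simplified data file entry.
    path: str
    record_count: int
    sequence_number: int
    first_row_id: Optional[int] = None


def assign_first_row_ids(new_files: List[DataFile], next_row_id: int) -> int:
    """Give each newly added file a contiguous block of row IDs.

    Returns the table's updated next_row_id.
    """
    for f in new_files:
        f.first_row_id = next_row_id
        next_row_id += f.record_count
    return next_row_id


def inherited_lineage(f: DataFile, pos: int) -> Tuple[int, int]:
    """(_row_id, _last_updated_sequence_number) for a row without stored lineage."""
    return f.first_row_id + pos, f.sequence_number


a = DataFile("a.parquet", record_count=100, sequence_number=7)
b = DataFile("b.parquet", record_count=50, sequence_number=7)
next_id = assign_first_row_ids([a, b], next_row_id=0)
print(next_id)                  # 150
print(inherited_lineage(b, 3))  # (103, 7)
```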

Binary Deletion Vectors

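To make row-level delete handling more space-efficient, V3 introduces binary deletion vectors. A deletion vector is a bitmap of the deleted row positions in a single data file, stored in a Puffin file. Compared with V2 position delete files, this representation is compact and can be checked with a fast bitmap lookup at read time, which is particularly beneficial when working with high-frequency updates or deletes.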
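The following sketch shows the idea: one bitmap per data file, consulted while scanning. It is an illustration only; the `DeletionVector` class and `scan` function are hypothetical and use a Python integer as the bitmap, whereas V3 stores deletion vectors as serialized Roaring bitmaps in Puffin files.

```python
from typing import Iterable, Iterator, Optional


class DeletionVector:
    """Hypothetical, simplified deletion vector for ONE data file.

    A Python int serves as the bitmap: bit N set means row position N is deleted.
    """

    def __init__(self, data_file: str):
        self.data_file = data_file
        self._bits = 0

    def delete(self, pos: int) -> None:
        self._bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return bool((self._bits >> pos) & 1)

    def cardinality(self) -> int:
        return bin(self._bits).count("1")


def scan(rows: Iterable, dv: Optional[DeletionVector] = None) -> Iterator:
    """Yield rows of a data file, skipping positions marked in its deletion vector."""
    for pos, row in enumerate(rows):
        if dv is None or not dv.is_deleted(pos):
            yield row


dv = DeletionVector("data/file-001.parquet")
dv.delete(1)
dv.delete(3)
print(list(scan(["a", "b", "c", "d"], dv)))  # ['a', 'c']
print(dv.cardinality())                      # 2
```

Because V3 expects at most one deletion vector per data file, a writer that deletes more rows from the same file merges them into the existing vector instead of adding another delete file, so readers never have to combine several delete files for one data file.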

Together, these features mark a significant upgrade in what Iceberg tables can represent and how efficiently they can perform under modern workloads.

Goals and Design Principles of V3

The enhancements in Iceberg format version 3 follow the design goals that the Iceberg specification has set out from the start, aimed at solving real-world scalability, correctness, and flexibility challenges in data engineering.

Serializable Isolation

Iceberg provides serializable isolation, and V3 keeps that guarantee. Reads always use a committed snapshot and are isolated from concurrent writes. Writers use optimistic concurrency: each writer builds new table metadata from the snapshot it started with, and the catalog atomically swaps the table's pointer to that metadata only if no other writer has committed in the meantime; otherwise the writer retries. This avoids the need for distributed locks and ensures consistent, repeatable reads.
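A minimal sketch of that optimistic commit loop, assuming a toy in-memory catalog: `InMemoryCatalog`, `compare_and_swap`, and `commit` are hypothetical names, table metadata is reduced to an integer version, and the lock stands in for whatever atomic primitive a real catalog uses.

```python
import threading


class InMemoryCatalog:
    """Toy catalog: maps a table name to its current metadata version."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the catalog's atomic primitive
        self._current = {}

    def load(self, table: str) -> int:
        with self._lock:
            return self._current.get(table, 0)

    def compare_and_swap(self, table: str, expected: int, new: int) -> bool:
        """Swap the pointer only if nobody committed since `expected` was read."""
        with self._lock:
            if self._current.get(table, 0) != expected:
                return False
            self._current[table] = new
            return True


def commit(catalog, table, apply_change, max_retries=5):
    """Optimistic commit: build on the latest version, then swap atomically."""
    for _ in range(max_retries):
        base = catalog.load(table)
        new_version = apply_change(base)
        if catalog.compare_and_swap(table, base, new_version):
            return new_version
        # Another writer won the race: reload and retry on top of its commit.
    raise RuntimeError(f"commit to {table} failed after {max_retries} attempts")


catalog = InMemoryCatalog()
attempts = []


def add_snapshot(base):
    attempts.append(base)
    if len(attempts) == 1:
        # Simulate a concurrent writer committing while we were working.
        catalog.compare_and_swap("events", base, base + 1)
    return base + 1


print(commit(catalog, "events", add_snapshot))  # 2
print(attempts)                                 # [0, 1]
```

Real Iceberg writers do more than blindly retry: before recommitting, they check that the change is still valid against the new base, for example that files being removed were not already removed by the other writer.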

Performance at Scale

The cost of planning a query should not grow with the size of the table. V3 emphasizes performance with design choices like:

  • Scan planning that needs O(1) remote calls, rather than a number of calls that grows with the count of partitions or files
  • Efficient manifest and manifest list structures that avoid redundant metadata reads
  • Metadata designed to support cost-based optimization in query engines

Client-Side Job Planning

To avoid bottlenecks at the catalog or metadata service layer, Iceberg delegates most planning tasks to clients. This approach distributes computation and reduces latency in large-scale deployments.
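A small sketch of one client-side planning step, assuming a single integer partition field: the manifest list records partition bounds for each manifest, so the client can skip whole manifests after reading just that one file. `ManifestSummary` and `plan_manifests` are hypothetical simplifications of the spec's manifest list entries and their partition summaries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ManifestSummary:
    # Hypothetical manifest-list entry with bounds for one integer partition field.
    path: str
    lower: int
    upper: int


def plan_manifests(manifest_list: List[ManifestSummary], lo: int, hi: int) -> List[str]:
    """Return manifests whose partition range may overlap the query range [lo, hi].

    Only the manifest list is consulted; skipped manifests are never opened.
    """
    return [m.path for m in manifest_list if m.upper >= lo and m.lower <= hi]


manifest_list = [
    ManifestSummary("m1.avro", lower=1, upper=10),
    ManifestSummary("m2.avro", lower=11, upper=20),
    ManifestSummary("m3.avro", lower=21, upper=30),
]
print(plan_manifests(manifest_list, 12, 18))  # ['m2.avro']
```

Because this filtering runs in the query engine's client rather than in a central service, the catalog only has to hand out the current metadata location.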

Full Schema and Partition Evolution

Like earlier versions, V3 supports full evolution of table schemas and partition specs:

  • Safe add, drop, rename, and reorder operations
  • Support for evolution within nested fields
  • Partition specs that can change over time without rewriting existing data
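Safe renames and drops work because Iceberg tracks columns by field ID rather than by name. The sketch below, with hypothetical schemas represented as plain dictionaries, shows how a reader might resolve an old data file against a newer schema.

```python
# Hypothetical schemas: field ID -> column name. IDs are permanent; names are labels.
schema_v1 = {1: "id", 2: "email", 3: "signup_date"}
# Evolution: rename `email`, drop `signup_date`, then add a new `signup_date`.
schema_v2 = {1: "id", 2: "contact_email", 4: "signup_date"}

# A data file written under schema_v1 stores values by field ID.
old_row = {1: 42, 2: "a@example.com", 3: "2024-01-15"}


def read_with_schema(schema: dict, row_by_id: dict) -> dict:
    """Resolve columns by field ID, not by name; IDs missing from the file read as null."""
    return {name: row_by_id.get(field_id) for field_id, name in schema.items()}


# The renamed column keeps its data; the re-added column (new ID 4) does not
# pick up the dropped column's old values.
print(read_with_schema(schema_v2, old_row))
# {'id': 42, 'contact_email': 'a@example.com', 'signup_date': None}
```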

Dependable Types and Format Compatibility

Iceberg tables continue to rely on well-defined data types that behave consistently across supported file formats. This ensures compatibility whether data is stored in Parquet, Avro, or ORC.

Storage Separation

Partitioning decisions are made at the table configuration level, not encoded in the file layout. This enables flexible, predicate-based planning that is not tied to directory structure and supports evolving partitioning strategies over time.
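This is often called hidden partitioning: users filter on the data column and Iceberg derives the partition filter. The sketch assumes a table partitioned by the day of a timestamp column (whole days since 1970-01-01 in UTC, the idea behind Iceberg's `day` transform); `day_transform` and `prune_partitions` are hypothetical helpers, not Iceberg library functions.

```python
from datetime import datetime, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Whole days since 1970-01-01 UTC, rounded down."""
    return (ts - EPOCH).days


def prune_partitions(partition_days: List[int], ts_lower: datetime, ts_upper: datetime) -> List[int]:
    """Keep day partitions that may contain rows with ts_lower <= ts < ts_upper."""
    lo, hi = day_transform(ts_lower), day_transform(ts_upper)
    return [d for d in partition_days if lo <= d <= hi]


partitions = [0, 1, 2, 3]
start = datetime(1970, 1, 2, 6, tzinfo=timezone.utc)
end = datetime(1970, 1, 3, 12, tzinfo=timezone.utc)
print(prune_partitions(partitions, start, end))  # [1, 2]
```

The projection is inclusive: a partition is kept whenever it might contain matching rows, so the engine still applies the original timestamp filter to the rows it reads.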

These principles make Iceberg V3 not just a feature upgrade but an architectural refinement that reinforces Iceberg’s role in the modern data stack.

Real-World Implications and Use Cases

The capabilities introduced in Iceberg format version 3 aren’t just theoretical—they’re designed to meet the evolving needs of organizations managing diverse, large-scale data workloads. Here’s how some of these new features translate into practical advantages.

Semi-Structured and Complex Data

With support for the variant data type, Iceberg V3 becomes a more suitable choice for handling semi-structured formats such as JSON or mixed-schema logs. This is especially valuable in applications like:

  • Event-driven architectures
  • Data from APIs or NoSQL stores
  • IoT data feeds with inconsistent schemas

Location and Geospatial Analytics

Geospatial support (geometry, geography) allows teams working in logistics, mapping, and environmental analytics to store and query location-based data natively within Iceberg tables.

Simplified Schema Evolution

Default column values reduce the friction of evolving schemas. If you need to add a required (non-null) column, you no longer have to backfill existing data manually: specify a default value and continue writing.

Advanced Partitioning Strategies

Multi-argument transforms give data engineers more precise control over partitioning and sorting. This enables more efficient queries, especially in cases where business logic relies on compound keys or multi-column grouping.

Improved Storage Efficiency

Binary deletion vectors offer a more compact way to track deleted rows, reducing metadata overhead in high-churn datasets. This is beneficial for use cases like:

  • Change Data Capture (CDC)
  • Frequent soft deletes in transactional systems
  • Real-time data correction pipelines

Together, these capabilities enable organizations to extend Iceberg into new domains, reduce operational complexity, and improve both query performance and data quality.

Considerations for Adopting V3

While Iceberg format version 3 brings valuable features, adoption should be approached with careful planning. Here are a few considerations to keep in mind before upgrading or starting with V3.

Engine Compatibility

Engine support for V3 is still maturing, and not all query engines support its full feature set yet. Before enabling V3 features in production, verify that your chosen engines (such as Spark, Flink, Trino, or Dremio) can read and write V3 tables. Some engines may still be limited to V1 or V2 functionality.

Gradual Migration

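One of Iceberg’s strengths is backward compatibility. Because the format version is set per table, you can keep using V1 or V2 tables while upgrading selected tables to V3 where your engines support it. Keep in mind that upgrading a table’s format version cannot be undone. This allows for a phased rollout without disrupting existing pipelines.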
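Before a phased rollout, it helps to know which tables are still on an older version. Every table metadata file records its version in the `format-version` field; the sketch below assumes you already have the metadata JSON text (for example, loaded from the location your catalog reports) and that it is not compressed. `format_version` and `tables_below` are hypothetical helpers, and the table names and trimmed metadata documents are made up.

```python
import json
from typing import Dict, List


def format_version(metadata_json: str) -> int:
    """Read the `format-version` field from table metadata JSON text."""
    return int(json.loads(metadata_json)["format-version"])


def tables_below(metadata_by_table: Dict[str, str], target: int = 3) -> List[str]:
    """List tables whose format version is lower than `target`, sorted by name."""
    return sorted(
        name for name, text in metadata_by_table.items()
        if format_version(text) < target
    )


# Hypothetical, heavily trimmed metadata documents.
metadata = {
    "sales.orders": '{"format-version": 2}',
    "sales.events": '{"format-version": 3}',
    "legacy.logs": '{"format-version": 1}',
}
print(tables_below(metadata))  # ['legacy.logs', 'sales.orders']
```

The upgrade itself is normally done through your engine, for example by setting the `format-version` table property on each table once every engine that touches it supports V3.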

Operational Readiness

New features like default column values and variant types may require adjustments to data ingestion and validation logic. It’s important to test how your tools and ETL frameworks interact with these features.

Additionally, features such as binary deletion vectors and row lineage may impact how you design compaction strategies or audit processes. Ensure that your metadata tooling and monitoring systems are ready to accommodate these changes.

Metadata and Storage Management

V3 maintains the same atomic commit model and metadata structure principles as previous versions, but with added complexity. Be prepared to manage additional metadata, such as Puffin files that hold deletion vectors and the row-lineage fields tracked in snapshots and manifests.

Taking the time to evaluate these aspects will help ensure a smooth transition to V3 and maximize its benefits in your data architecture.

Conclusion

Apache Iceberg format version 3 marks a pivotal step forward in the evolution of table formats for the data lakehouse. While V1 and V2 laid the groundwork for reliability, scalability, and transactional support, V3 brings a new level of flexibility and capability to address increasingly diverse and demanding workloads.

With support for complex data types, schema defaults, advanced partitioning, and row-level lineage, Iceberg V3 is well-positioned to meet the needs of data teams working across streaming, analytics, machine learning, and regulatory use cases. At the same time, it preserves Iceberg’s foundational principles of atomicity, compatibility, and openness.

As with any major change, adopting V3 requires careful consideration of engine support and operational implications. But for those ready to embrace it, V3 opens the door to richer data modeling and more efficient data processing.

To learn more, consult the official Iceberg specification and explore the project’s GitHub and documentation. Staying current with Iceberg’s format versions will help your data platform remain flexible, performant, and future-proof.
