March 26, 2025
Demystifying Apache Iceberg Table Services – What They Are and Why They Matter


Learn more about Apache Polaris by downloading a free early-release copy of Apache Polaris: The Definitive Guide, and learn about Dremio's Enterprise Catalog, powered by Apache Polaris.
Apache Iceberg has quickly emerged as one of the most critical technologies in the modern data lakehouse stack. Originally designed to solve the shortcomings of Hive table formats, Iceberg brings ACID transactions, schema evolution, time travel, and hidden partitioning to data lakes—features that were once the domain of traditional data warehouses. But as the project matures, it’s becoming much more than just a table format.
Today, the Iceberg ecosystem is evolving across three major fronts:
- The Table Spec – This defines the structure and metadata of each table, including schemas, partitioning, and snapshot lineage.
- The REST Catalog Spec – A standardized protocol for how clients and compute engines can interact with catalogs that track Iceberg tables.
- Table Services – A growing set of operations and tools designed to manage, optimize, and maintain the performance and reliability of Iceberg tables over time.
The first two areas—table specs and REST catalogs—are relatively well understood and increasingly supported across engines like Spark, Trino, Dremio, and Flink. However, Table Services are still emerging and often less familiar to data teams. Yet, they are arguably the most critical for keeping Iceberg tables production-ready at scale.
In this article, we’ll explore the world of Iceberg Table Services: what they are, why they matter, how vendors are implementing them, and what the future may hold. Whether you’re managing your own Iceberg tables or evaluating managed offerings, understanding Table Services is key to running fast, cost-efficient, and clean data lakehouse systems.
Iceberg’s Three Pillars of Evolution
To understand the role of Table Services in Apache Iceberg, it helps to zoom out and look at how the project is growing. Iceberg isn’t just a table format anymore—it's a platform. Its ecosystem is developing along three major dimensions, each of which addresses a different layer of managing data in a lakehouse architecture:
The Table Specification: The Foundation
At the core of Iceberg is its table specification—a detailed definition of how tables are structured and how they behave. This spec governs critical metadata such as:
- Table schemas and partitioning strategies
- Snapshot history and versioning
- Manifest and metadata file formats
- How data files are tracked and discovered
The table spec's brilliance is its engine-agnostic design. It allows Iceberg tables to be accessed consistently across various engines—Spark, Trino, Flink, Dremio, and others—without locking users into a specific compute layer. When someone says, “Iceberg is just a table format,” they’re mostly referring to this spec.
The REST Catalog Specification: Decoupling Metadata Management
While the table spec defines the layout of a single table, the catalog spec determines how clients connect to catalogs to discover and manage many tables. Iceberg’s REST Catalog Specification provides a standard interface for compute engines to interact with catalog services that register, update, and list tables.
This REST-based interface allows for:
- An open standard for connecting to Iceberg catalogs
- Engines that remain agnostic to the user's choice of catalog and the language the catalog was written in
For example, AWS Glue, Apache Polaris (incubating), and Project Nessie implement this spec to enable cataloging services that can be shared across teams and tools.
This is a major step toward composable data architectures, where storage, metadata, and compute can evolve independently.
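To make this concrete, here's a minimal sketch of connecting to a REST-spec catalog with PyIceberg. The endpoint, credentials, and namespace names are placeholders; swap in whatever your Polaris, Nessie, or Glue deployment exposes.

```python
from pyiceberg.catalog import load_catalog

# Connect to any catalog that implements the Iceberg REST Catalog Spec.
# The uri, credential, and warehouse values below are placeholders.
catalog = load_catalog(
    "my_rest_catalog",
    **{
        "type": "rest",
        "uri": "https://catalog.example.com/api/catalog",
        "credential": "<client-id>:<client-secret>",
        "warehouse": "analytics",
    },
)

# Because the protocol is standardized, the same client code works no matter
# which service is behind the endpoint.
print(catalog.list_namespaces())
print(catalog.list_tables("marketing"))
```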
Table Services: Operational Excellence for Data Tables
Finally, we come to the third and newest area: Table Services.
Where the table spec defines what a table is, and the REST catalog spec defines how it’s tracked, table services are how that table is maintained over time. These operational procedures keep your Iceberg tables performant, clean, and ready for production workloads.
Think of them as Iceberg tables' garbage collection, optimization, and lifecycle management systems. They’re not required to read or write to a table, but without them, tables can degrade in performance and cost over time.
Today, these services are primarily focused on storage optimization and cleanup.
The Iceberg Vendor Landscape – Who’s Doing What?
As Apache Iceberg matures, it isn’t just the open-source spec that’s evolving—so is the ecosystem of tools and vendors building around it. What started as a low-level table format is now at the center of a fast-growing market of lakehouse solutions. While the open-source project continues to define the specs, many vendors are building differentiated services around how those specs are used in production.
To make sense of this landscape, it’s helpful to break down vendor offerings into three categories:
Table Processing: Reading and Writing Iceberg Tables
This is the most fundamental layer—read and write operations. Nearly every data platform that touches Iceberg starts here. These services handle:
- Ingesting data into Iceberg tables (batch or streaming)
- Reading Iceberg tables through compute engines
Most engines—Apache Spark, Apache Flink, Dremio, Trino, Presto—support native or near-native Iceberg reads and writes.
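As a rough illustration, here's what a basic write-then-read looks like from PySpark with the Iceberg Spark runtime on the classpath. The catalog name, REST endpoint, and table names are assumptions for the sake of the example.

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime jar is available and the "web" namespace
# exists; the catalog name ("lake"), endpoint, and table names are illustrative.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "https://catalog.example.com/api/catalog")
    .getOrCreate()
)

# Write: create (or replace) an Iceberg table from a small DataFrame.
df = spark.createDataFrame([(1, "click"), (2, "view")], ["event_id", "event_type"])
df.writeTo("lake.web.events").createOrReplace()

# Read: any engine that speaks Iceberg can now query the same table.
spark.table("lake.web.events").show()
```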
Catalog Services: Tracking and Governing Iceberg Tables
Once you have Iceberg tables, you need a system to track them—especially across many teams, regions, and projects. That’s where catalog services come in.
Catalogs keep track of:
- What tables exist
- Their current metadata locations
- Access controls and governance rules
Most modern Iceberg catalogs are centralized services that adhere to the REST Catalog Spec.
Examples of catalog services include:
- AWS Glue: A managed catalog in the AWS Platform
- Apache Polaris: An open source catalog driven by a community that includes Snowflake, Dremio, Microsoft, Google, and AWS
- Project Nessie: An open source Git-like catalog for versioning data tables
- Dremio Catalog: A managed Polaris catalog solution built into the Dremio Platform
- Open Catalog: A managed Polaris catalog solution built into the Snowflake Platform
These catalogs serve as the metadata control plane in a modern lakehouse—decoupled from compute, sharable across users, and increasingly integrated with governance and security tooling.
Table Services: Managing Performance and Table Lifecycle
Now, we get to the third and most under-appreciated layer: Table Services.
This is where vendors go beyond table format and metadata tracking, and start offering services that actively manage your Iceberg tables over time. These services typically include:
- Optimization: Rewriting small files into larger ones, co-locating data for better scan performance, applying compression and tiering policies.
- Cleanup: Expiring old snapshots, pruning orphaned files, and managing metadata size.
- Lifecycle Management: Automating recurring tasks like compaction, retention enforcement, and even failure recovery.
Not all vendors offer Table Services yet—but it’s a growing area of focus.
For example:
- Dremio Catalog automatically compacts and optimizes tables behind the scenes.
- AWS S3 Tables is starting to expose snapshot cleanup and compaction options.
- Other cloud-native tools are building these as managed, policy-driven services.
Why This Matters
Understanding which vendors support which parts of the Iceberg stack helps data teams plan their architectures. Some teams want full control and will run open-source catalogs with manual optimizations. Others want a managed experience where table health is taken care of automatically.
As the ecosystem grows, more vendors are moving toward offering all three layers as integrated services. But there’s still a lot of variation—and innovation—happening in each.
What Are Iceberg Table Services?
By now, most data engineers working with Apache Iceberg are familiar with its role as a table format and its ability to power multi-engine analytics. Concepts like hidden partitioning, snapshot-based time travel, and schema evolution are well understood. Catalogs, too, are gaining traction as essential infrastructure for governing tables in production.
But Table Services? That’s where things get a little more obscure.
Let’s define them.
Table Services Are the Operational Backbone of Iceberg
At a high level, Table Services are tools and processes that maintain and optimize Iceberg tables over time. They aren’t strictly necessary to read from or write to an Iceberg table. You can run an Iceberg-powered pipeline for quite a while without them.
But eventually, you’ll start to notice a few things:
- Query performance degrades as file counts explode
- Storage costs increase as orphaned data piles up
- Metadata files get large and unwieldy
- Snapshots accumulate beyond your retention window
That’s where Table Services come in.
These services don’t define what an Iceberg table is—that’s the job of the table spec. And they don’t track where tables are or who owns them—that’s for the catalog.
Instead, Table Services manage the lifecycle of Iceberg tables to ensure they remain performant, cost-efficient, and production-grade.
Table Services vs. Table Format vs. Catalog
To make this clearer, think of the three layers like this:
| Layer | Purpose | Primary Focus |
| --- | --- | --- |
| Table Format | Defines how the table stores data and metadata | Structure and access |
| Catalog | Tracks table locations and metadata references | Governance and discoverability |
| Table Services | Maintains and optimizes table health over time | Performance, cost, reliability |
Without Table Services, the other two layers will still work, but tables will slowly degrade. Data lakes that lack lifecycle management quickly become data swamps.
Why Table Services Matter (Even If They’re Invisible)
In most modern data platforms—whether data warehouses or lakehouses—there’s an entire layer of operations that runs behind the scenes to keep tables fast and lean. This includes:
- Compaction of small files
- Intelligent file layout and data clustering
- Metadata pruning
- Retention enforcement
In traditional databases, these tasks are handled by background processes like autovacuum in PostgreSQL or automatic clustering in Snowflake. In Iceberg, these responsibilities are increasingly formalized as Table Services.
And unlike traditional systems, Table Services in Iceberg are designed to be:
- Engine-agnostic: they can be triggered by jobs running in Spark, Flink, or even REST-based orchestration tools
- Composable: they can be scheduled, automated, and monitored independently of the compute layer
- Pluggable: vendors can build proprietary or optimized implementations while still adhering to core Iceberg principles
Table Services currently center around optimization and cleanup, but the roadmap goes far beyond that—into territory like disaster recovery, transactional integrity, and even custom metadata.
Core Table Services Today
While the vision for Table Services is expanding, today’s implementations focus on two core areas that directly impact performance and cost: Optimization and Cleanup. These services handle the behind-the-scenes work that keeps Iceberg tables efficient, scalable, and manageable in production environments.
Let’s break each one down.
Optimization: Managing Storage for Performance and Cost
One key challenge in data lake environments is managing how data is physically laid out on disk. As data arrives in micro-batches or streaming inserts, it's often written as many small files. Over time, this leads to a phenomenon known as the small files problem, which can wreak havoc on query performance and increase storage overhead.
Iceberg’s optimization services tackle this problem head-on.
What Optimization Services Do:
- Rewrite small files into fewer, larger files for efficient scanning
- Cluster related data together (e.g., based on partition or sort keys) to enable better pruning and caching
- Apply compression and encryption policies to ensure consistency and security
- Support storage tiering, moving cold data to cheaper storage layers while keeping hot data fast and accessible
These tasks can be executed via compaction jobs—typically orchestrated by Spark, Flink, or Dremio’s built-in services.
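For example, here's a hedged sketch of what a compaction pass looks like using Iceberg's built-in Spark maintenance procedures (the "lake" catalog and "web.events" table are the illustrative names from earlier, and the Iceberg SQL extensions must be enabled):

```python
# Compact small files with Iceberg's rewrite_data_files procedure.
# Catalog and table names are illustrative.
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'web.events',
        strategy => 'binpack',
        options => map('target-file-size-bytes', '536870912')
    )
""")

# Rewriting manifests keeps the metadata layer compact as well.
spark.sql("CALL lake.system.rewrite_manifests(table => 'web.events')")
```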
Why It Matters:
Without optimization, query engines must scan thousands (or millions) of small files, each triggering metadata lookups, open/close operations, and increased shuffle during execution. This can turn even a simple filter query into a costly operation.
Optimized tables, by contrast:
- Have fewer I/O operations
- Enable faster scan planning and execution
- Reduce compute and storage costs
- Are better prepared for downstream analytics at scale
Example:
Imagine a streaming pipeline that writes event logs into an Iceberg table every 5 seconds. After a day, you’ve got 17,000+ files. Query performance plummets. An optimization job runs overnight, rewriting these into ~100 well-clustered files. Suddenly, your BI dashboards are responsive again—and your cloud storage bill is a little lower too.
Cleanup: Enforcing Retention and Removing Metadata Bloat
Iceberg is designed for versioned data: every write creates a new snapshot, and each snapshot includes metadata files, manifest lists, and references to data files. This is powerful for time travel and rollback—but if left unmanaged, it creates metadata sprawl.
That’s where cleanup services come in.
What Cleanup Services Do:
- Expire old snapshots based on retention policies (e.g., keep last 7 days of history)
- Remove orphaned data files that are no longer referenced by any snapshot
- Prune outdated metadata files, including manifests and manifest lists
- Optionally enforce data deletion policies, helping meet compliance or regulatory needs
These services are often configurable and schedulable, allowing teams to balance retention needs with performance.
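As a sketch, a routine cleanup pass built on Iceberg's Spark procedures might look like this (catalog and table names are again illustrative):

```python
from datetime import datetime, timedelta, timezone

# Expire snapshots older than a 7-day retention window, keeping at least one.
cutoff = (datetime.now(timezone.utc) - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
spark.sql(f"""
    CALL lake.system.expire_snapshots(
        table => 'web.events',
        older_than => TIMESTAMP '{cutoff}',
        retain_last => 1
    )
""")

# Delete files under the table location that no snapshot references anymore.
spark.sql("CALL lake.system.remove_orphan_files(table => 'web.events')")
```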
Why It Matters:
Without cleanup, Iceberg tables accumulate:
- Unused data files that take up space
- Metadata that slows down planning and execution
- Snapshots that make rollback more confusing and less reliable
Over time, this can cause bloated storage, increased costs, and degraded query performance—even if the table “looks” fine from the outside.
Example:
A data team wants to preserve only 14 days of time travel on a customer analytics table. They set a snapshot expiration policy and run a daily cleanup job. This keeps the table lean, while still supporting rollback in case of late-arriving data or pipeline errors. When someone mistakenly rewrites a partition, it’s easy to recover—but stale snapshots don’t pile up forever.
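One way to express that 14-day policy declaratively is through Iceberg's snapshot-retention table properties (real Iceberg property names; the table name is illustrative). A scheduled expiration job still has to run to enforce them:

```python
# 14 days = 1,209,600,000 ms. These properties set the defaults that
# snapshot expiration honors; they don't delete anything on their own.
spark.sql("""
    ALTER TABLE lake.analytics.customers SET TBLPROPERTIES (
        'history.expire.max-snapshot-age-ms' = '1209600000',
        'history.expire.min-snapshots-to-keep' = '1'
    )
""")
```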
When and How These Services Run
Table Services can be:
- Manual: Run via Spark jobs, CLI tools, or notebooks
- Scheduled: Triggered by orchestration tools like Airflow, Dagster, or native platform schedulers
- Managed: Handled entirely by platforms like Dremio, AWS, and others without user intervention
Some platforms offer policy-based automation (e.g., “optimize partitions every 24 hours”), while others expose lower-level APIs for precise control.
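For the scheduled approach, a minimal Airflow sketch (assuming Airflow 2.4+; the spark-submit command and script path are placeholders) might look like this:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Daily maintenance DAG that submits a compaction/cleanup job like the ones
# sketched above. The command and path are placeholders.
with DAG(
    dag_id="iceberg_table_maintenance",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="optimize_and_expire",
        bash_command="spark-submit /opt/jobs/iceberg_maintenance.py",
    )
```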
The Future of Table Services
While today’s Table Services focus primarily on optimization and cleanup, the future of Iceberg table management is already taking shape. As data lakehouses scale to support critical business operations, Table Services will evolve from basic maintenance tools to full-fledged resilience and lifecycle management layers.
Below are some of the capabilities we can expect—or already see early signs of—in the next generation of Iceberg Table Services.
Disaster Recovery: Making Recovery Reliable and Metadata-Aware
Data loss can happen for many reasons: a cloud bucket is accidentally wiped, a zone goes down, or a disaster recovery (DR) event triggers failover to a backup region. Recovering from such events in a traditional data lake setup is often manual and brittle, especially when metadata paths and catalog entries become stale or broken.
Future Table Services will aim to make disaster recovery more automated and Iceberg-aware.
Potential Capabilities:
- Scanning for data files and automatically rebuilding metadata manifests
- Updating catalog references when file paths change post-recovery
- Validating that snapshot and data file consistency is preserved
- Supporting metadata backups and version-aware restores
These services would bring Iceberg closer to traditional RDBMS-style recoverability—without sacrificing lakehouse flexibility.
Transaction Rollback: Multi-Table Consistency and Failure Recovery
Apache Iceberg's snapshot-based design supports atomic operations on individual tables. However, many real-world operations affect multiple tables simultaneously—think of a data pipeline that joins data from one table and writes outputs to several others.
If that pipeline fails partway through, you’re left in an inconsistent state: one table updated, others not. Rolling back these changes is complex and typically manual.
What Table Services Could Offer:
- Grouping snapshots across multiple tables into a logical transaction
- Capturing lineage between dependent table operations
- Providing rollback mechanisms that revert all affected tables to a consistent state
- Making multi-table operations auditable and traceable
This would enable more robust data pipelines with stronger correctness guarantees—a game-changer for regulated industries or critical analytics workflows.
Supplemental Metadata: Managing Information Outside the Table Spec
Right now, the Iceberg table spec is deliberately scoped: it governs technical metadata needed to read and write data. But in many organizations, teams want to store additional, human- or process-centric metadata alongside their tables.
Examples include:
- Table documentation (e.g., business definitions, intended use cases)
- Tags or labels for data classification (e.g., PII, GDPR-covered, internal-only)
- Operational metadata like SLA information, owners, or lineage links
This raises a natural question: Should this live in the catalog or as part of Table Services?
Some vendors may choose to store this supplemental metadata within catalogs. Others might offer Table Services APIs to attach, retrieve, or manage it independently of the catalog.
Either way, the key insight is this: metadata goes beyond schema and snapshots. And managing it well is essential for compliance, collaboration, and clarity.
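As a stopgap today, some teams simply stash this kind of context in Iceberg table properties. Here's a hedged sketch; the property keys below are invented for illustration, not part of any spec:

```python
# Attach supplemental, human-oriented metadata as table properties.
# 'owner-team' and 'data-classification' are invented keys, not a standard.
spark.sql("""
    ALTER TABLE lake.analytics.customers SET TBLPROPERTIES (
        'comment' = 'Customer analytics fact table, refreshed hourly',
        'owner-team' = 'growth-analytics',
        'data-classification' = 'pii'
    )
""")
```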
Why These Future Services Matter
All of these features—disaster recovery, transaction rollback, supplemental metadata—are about trust. When your data platform is handling petabytes of mission-critical data, you need to trust that it can:
- Recover gracefully from failure
- Maintain consistency across complex operations
- Provide clear, contextual metadata that teams can rely on
The next wave of Table Services in Apache Iceberg will aim to deliver just that.
And as more vendors compete to deliver “production-ready” Iceberg experiences, expect these features to become differentiators—especially in cloud-native and enterprise-grade platforms.
Conclusion: Why Table Services Are the Next Big Thing
Apache Iceberg started as a better table format—solving the deep-rooted limitations of Hive-style tables. But it’s grown into something much more powerful: the foundation of a modern, open data lakehouse architecture.
While the table spec and catalog spec laid the groundwork for interoperability and governance, it’s Table Services that will determine whether your Iceberg tables thrive or degrade in the real world. They’re the unseen engine room that keeps data performant, cost-effective, and reliable—especially at scale.
Today, we’re already seeing Table Services handle:
- Optimization, to address file layout, compression, and tiering
- Cleanup, to manage snapshot sprawl and metadata bloat
But tomorrow, they’ll play an even bigger role:
- Disaster recovery, ensuring tables can survive and recover from unexpected failure
- Multi-table rollback, allowing complex operations to be atomic and reversible
- Metadata enrichment, bridging the gap between raw data and meaningful context
As this space grows, so does the vendor ecosystem. Some platforms offer a full-stack experience, handling cataloging, table I/O, and lifecycle services in one place. Others give users the flexibility to plug and play based on specific needs.
For data teams, understanding the role of Table Services isn’t just a technical detail—it’s a strategic choice. It’s about deciding how you want to run your data platform: reactively cleaning up when things go wrong, or proactively building a system that takes care of itself.
As the Iceberg ecosystem matures, Table Services are quietly becoming the key to long-term success. They’re not the most visible part of your stack—but they might be the most important.