23 minute read · March 26, 2025
Disaster Recovery for Apache Iceberg Tables – Restoring from Backup and Getting Back Online

· Head of DevRel, Dremio

Disaster recovery isn’t just for databases and monolithic data warehouses anymore. In the world of modern data lakehouses—where Apache Iceberg is increasingly used to power large-scale, multi-engine analytics—knowing how to recover from failure is just as important as knowing how to scale.
Imagine this: your data platform experiences a serious outage. Maybe a cloud region goes offline. Maybe a migration goes wrong. Maybe a human error wipes out part of your data lake. You’ve got a backup of your files, and you restore them to storage. But when you go to query your Iceberg tables… nothing works.
What happened?
The truth is, restoring Iceberg tables from a filesystem backup isn’t as simple as putting files back in place. There are two critical questions you need to ask:
- Does your catalog metadata still reference the correct metadata.json file?
- Do your metadata files still reference valid data and manifest file paths, especially since Iceberg uses absolute paths?
If the answer to either is “no,” your tables won’t function properly—even if all your data is physically present.
The good news: Apache Iceberg provides tools for precisely this situation. Spark procedures like register_table and rewrite_table_path give you the ability to bring your tables back to life by realigning the catalog and metadata with the restored files.
In this blog, we’ll walk through:
- How Iceberg stores metadata and why that matters for recovery
- Common recovery scenarios and how to handle them
- How to use built-in Spark procedures to fix broken references
- Best practices for disaster-proofing your Iceberg tables in the future
If your backup strategy includes Iceberg tables—or should—this is the guide to keep handy.
Common Recovery Scenarios After a Filesystem Restore
Let’s say you’ve restored your Iceberg table files—metadata, manifests, and data—from a backup. That’s a good start. But unless your environment is exactly the same as it was pre-disaster, chances are, something is out of sync.
Let’s look at the two most common recovery scenarios and how to fix them using Iceberg’s built-in Spark procedures.
Scenario 1: The Catalog Is Missing or Points to the Wrong metadata.json
In this scenario, all your table files are present, but the catalog doesn’t point to the correct snapshot of the table. You might be dealing with:
- A catalog entry pointing to a newer metadata.json from after your last backup
- A table that was never registered in the current catalog environment
This is where Iceberg’s register_table procedure comes in.
✅ How to Fix It: Register the Latest Metadata File
You’ll need to:
- Locate the most recent metadata.json file in the restored table’s metadata/ directory.
- Use the register_table procedure to create a new catalog entry pointing to that file.
CALL spark_catalog.system.register_table(
  table => 'db.restored_table',
  metadata_file => 's3a://restored-bucket/db/restored_table/metadata/v17.metadata.json'
);
🔐 Important:
- Only use register_table if the table is not already registered, the current reference is incorrect, or you’re intentionally moving it to a new catalog.
- Registering the same metadata in multiple catalogs can lead to corruption or lost updates.
Scenario 2: File Paths Have Changed During Restore
This scenario is a bit trickier. Let’s say you originally stored your data in HDFS, but you restored it to S3 during recovery. Or maybe you moved the files to a different bucket, base directory, or region. In either case, your metadata files still reference the old paths.
Because Iceberg uses absolute file paths for all its references—data files, delete files, manifests, even snapshot lineage—these path mismatches break everything.
✅ How to Fix It: Rewrite Metadata Paths with rewrite_table_path
Iceberg provides the rewrite_table_path procedure to safely rewrite all metadata references from one path prefix to another.
CALL spark_catalog.system.rewrite_table_path(
  table => 'db.my_table',
  source_prefix => 'hdfs://nn:8020/warehouse/my_table',
  target_prefix => 's3a://my-bucket/warehouse/my_table'
);
This procedure:
- Rewrites the metadata.json, manifests, and manifest lists with the new prefix
- Outputs the location of a CSV mapping file, showing which files need to be physically copied to the new location
After this, you’ll still need to:
- Use a tool like DistCp or the AWS CLI to actually copy the data files based on the mapping (a quick way to inspect the mapping file is sketched below)
- Optionally, use register_table if the table has not yet been registered in the catalog
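If you want a quick look at that copy plan before kicking off the copy, Spark SQL can read the CSV directly. A minimal sketch, assuming a hypothetical file path; substitute the file_list_location value returned by your own rewrite_table_path call:

-- Inspect the copy plan emitted by rewrite_table_path.
-- Replace the path below with the file_list_location the procedure returned.
SELECT *
FROM csv.`s3a://staging-bucket/db.my_table/file-list.csv`
LIMIT 20;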
🛠 Optional: Incremental Rewrites
If you only need to update a subset of metadata versions—say, from v2.metadata.json to v20.metadata.json—you can specify a version range and a custom staging location:
CALL spark_catalog.system.rewrite_table_path(
  table => 'db.my_table',
  source_prefix => 's3a://old-bucket/db.my_table',
  target_prefix => 's3a://new-bucket/db.my_table',
  start_version => 'v2.metadata.json',
  end_version => 'v20.metadata.json',
  staging_location => 's3a://staging-bucket/db.my_table'
);
This provides more control and limits the scope of changes—ideal for partial restores or migrations.
Deep Dive – Spark Procedures for Iceberg Recovery
Once you’ve diagnosed the issue—whether it’s a missing catalog entry or outdated file paths—you’ll need to get hands-on with the Spark procedures Iceberg provides for recovery. These aren't just utility functions—they’re purpose-built tools that give you surgical control over table state.
Let’s examine each procedure in more detail, including what it does, how to use it, and where to be careful.
register_table: Reconnect the Catalog to Your Table
The register_table procedure allows you to create a new catalog entry for an Iceberg table, pointing to an existing metadata.json file. This is useful when:
- The catalog was lost, corrupted, or never existed in the restored environment
- You’ve manually restored metadata and need to reintroduce the table to the catalog
- You’re moving a table between catalogs (e.g., from Hive Metastore to AWS Glue)
Usage
CALL spark_catalog.system.register_table(
  table => 'db.restored_table',
  metadata_file => 's3a://my-bucket/db/restored_table/metadata/v17.metadata.json'
);
Arguments
Name | Required | Type | Description |
---|---|---|---|
table | ✔️ | string | The table identifier to register in the catalog |
metadata_file | ✔️ | string | Full path to the Iceberg metadata.json to register |
Output
Output Name | Type | Description |
---|---|---|
current_snapshot_id | long | ID of the current snapshot in the table |
total_records_count | long | Total record count in the restored table |
total_data_files_count | long | Number of data files in the restored table |
⚠️ Caution
Avoid using this procedure if the same table is still registered in another catalog. Having the same table registered in multiple catalogs can lead to:
- Conflicting updates
- Lost snapshots
- Corrupted metadata
Always de-register a table from one catalog before re-registering it in another.
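As a hedged sketch of that hand-off (the catalog and table names below are illustrative): with the Spark/Iceberg integration, DROP TABLE without PURGE typically removes only the catalog entry and leaves the underlying files in place, but verify that behavior for your specific catalog and Iceberg version before relying on it.

-- Remove the entry from the old catalog; data files are kept because PURGE is not specified
-- (confirm this behavior for your catalog before running it).
DROP TABLE old_catalog.db.restored_table;

-- Re-register the table in the target catalog from its metadata.json.
CALL new_catalog.system.register_table(
  table => 'db.restored_table',
  metadata_file => 's3a://my-bucket/db/restored_table/metadata/v17.metadata.json'
);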
rewrite_table_path: Update File Paths in Metadata
The rewrite_table_path procedure is used when the physical file locations have changed, and you need to update all metadata references accordingly.
This could be due to:
- Restoring from HDFS to S3
- Moving to a new cloud bucket
- Renaming directory prefixes during recovery
Usage
CALL spark_catalog.system.rewrite_table_path(
  table => 'db.my_table',
  source_prefix => 'hdfs://nn:8020/warehouse/db.my_table',
  target_prefix => 's3a://new-bucket/warehouse/db.my_table'
);
Arguments
Name | Required | Default | Type | Description |
---|---|---|---|---|
table | ✔️ | | string | Name of the Iceberg table |
source_prefix | ✔️ | | string | Prefix to be replaced in absolute file paths |
target_prefix | ✔️ | | string | Replacement prefix for the updated file paths |
start_version | ❌ | First | string | First metadata.json to include in rewrite |
end_version | ❌ | Latest | string | Last metadata.json to include in rewrite |
staging_location | ❌ | New dir | string | Directory to write new metadata files to |
Output
Output Name | Type | Description |
---|---|---|
latest_version | string | Name of the last metadata file rewritten |
file_list_location | string | Path to CSV with source-to-target file copy plan |
Modes of Operation
- Full Rewrite: Rewrites all reachable metadata files (default).
- Incremental Rewrite: Set start_version and end_version to limit the scope.
What Happens After Rewrite?
- New metadata files are written to the staging_location
- You’ll receive a CSV mapping of which data files need to be moved or copied
- You’re responsible for copying those files (e.g., with DistCp, aws s3 cp, etc.)
- Once in place, you can use register_table to reconnect the rewritten metadata to the catalog
Putting It All Together
In a real disaster recovery situation, you might use both procedures in sequence:
- Use rewrite_table_path to update all absolute paths in metadata
- Physically move or sync the files to the new location
- Use register_table to register the updated table in your target catalog
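A rough end-to-end sketch of that sequence in Spark SQL (the paths, catalog name, table identifier, and metadata file version below are illustrative, and the file copy in step 2 still happens outside of SQL):

-- 1. Rewrite all absolute paths in the metadata tree, writing new metadata to a staging location.
CALL spark_catalog.system.rewrite_table_path(
  table => 'db.my_table',
  source_prefix => 'hdfs://nn:8020/warehouse/db.my_table',
  target_prefix => 's3a://new-bucket/warehouse/db.my_table',
  staging_location => 's3a://staging-bucket/db.my_table'
);

-- 2. Copy the files listed in the returned file_list_location CSV to the target prefix
--    using DistCp, aws s3 cp, or similar (outside Spark SQL).

-- 3. Register the relocated table in the target catalog, pointing at the latest rewritten
--    metadata.json now under the target prefix (your version number will differ).
CALL spark_catalog.system.register_table(
  table => 'db.my_table',
  metadata_file => 's3a://new-bucket/warehouse/db.my_table/metadata/v20.metadata.json'
);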
Let's explore best practices for backing up and restoring Iceberg tables and how to ensure a smooth, stress-free recovery.
Best Practices for Backing Up and Restoring Iceberg Tables
Apache Iceberg’s architecture gives you fine-grained control over data recovery, but with that control comes responsibility. If you want disaster recovery to be painless (or at least manageable), it's critical to design your backup strategy and recovery plan around the unique way Iceberg handles metadata and file paths.
Here are the top best practices to follow before and after a restore.
1. Always Back Up Metadata and Data Together
Iceberg separates metadata from data, but you need both for recovery.
- Metadata includes the metadata/ directory (containing v1.metadata.json, v2.metadata.json, etc.), manifest files, and manifest lists.
- Data includes all your Parquet, ORC, or Avro files referenced by those manifests.
If you’re using tools like DistCp, rsync, or object storage replication, make sure both metadata and data directories are included in the same snapshot or backup window.
⚠️ Backing up only data files without corresponding metadata makes restoration nearly impossible.
2. Track the Latest metadata.json in Every Backup
When backing up an Iceberg table, always record the path to the latest metadata.json file. This file acts as the entry point to the entire table state.
Why this matters:
- If the catalog entry is lost, you’ll need this file to re-register the table using register_table.
- If you restore multiple tables, it can be hard to determine which snapshot was the most recent without a tracking log.
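At backup time, you can pull that latest path from the live table itself. A minimal sketch using Iceberg’s metadata_log_entries metadata table in Spark SQL (available in recent Iceberg releases; the table identifier below is illustrative):

-- Record the most recent metadata.json path for the backup log.
SELECT file AS latest_metadata, timestamp
FROM spark_catalog.db.customer_events.metadata_log_entries
ORDER BY timestamp DESC
LIMIT 1;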
Create a backup log or manifest for each table with entries like:
table_name: customer_events
latest_metadata: s3a://backups/iceberg/customer_events/metadata/v57.metadata.json
timestamp: 2025-03-24T04:00:00Z
3. Check for File Path Changes Before Recovery
One of the most common issues after a restore is that metadata files contain absolute paths that no longer match your storage layout. For example:
- You restored from hdfs:// to s3a://
- You renamed the top-level directory or changed bucket prefixes
- You used a new storage mount or alias
Before re-registering a table, inspect the metadata and manifest files to confirm whether the file paths match their new locations. If not, use rewrite_table_path to adjust them before registering.
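One hedged way to spot-check this from Spark SQL is to register the restored metadata under a throwaway identifier and look at the paths Iceberg has recorded; if the query fails or still lists the old prefixes, run rewrite_table_path first. The scratch names and paths below are illustrative, and the recovery_check namespace is assumed to already exist in the catalog.

-- Register the restored metadata under a scratch name, for inspection only.
CALL spark_catalog.system.register_table(
  table => 'recovery_check.my_table',
  metadata_file => 's3a://restored-bucket/db/my_table/metadata/v17.metadata.json'
);

-- If the recorded paths no longer exist, this query will fail or show old prefixes
-- in file_path, both signs that a path rewrite is needed before real registration.
SELECT file_path
FROM spark_catalog.recovery_check.my_table.files
LIMIT 10;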
4. Automate Validation Post-Restore
After restoring and registering a table, don’t assume everything works—validate it.
Things to check:
- Can you run a simple SELECT COUNT(*)?
- Does the snapshot history include what you expect?
- Are partition pruning and data skipping functioning normally?
- Do metadata tables like db.my_table.history and db.my_table.snapshots return expected results?
You can automate these checks as part of your recovery workflow using Spark or Trino queries to catch misconfigurations early.
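A minimal sketch of such a check in Spark SQL, assuming the restored table has been registered as db.restored_table (adapt the identifiers to your environment):

-- Basic sanity check: the table is readable end to end.
SELECT COUNT(*) FROM spark_catalog.db.restored_table;

-- Snapshot lineage: confirm the expected snapshots survived the restore.
SELECT committed_at, snapshot_id, operation
FROM spark_catalog.db.restored_table.snapshots
ORDER BY committed_at;

-- History: verify which snapshot is current.
SELECT made_current_at, snapshot_id, is_current_ancestor
FROM spark_catalog.db.restored_table.history
ORDER BY made_current_at;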
5. Dry-Run Your Recovery Plan
Don’t wait until a real disaster to test your recovery workflow.
- Restore a test table from a known-good backup to a staging catalog
- Practice using rewrite_table_path and register_table
- Measure how long the recovery takes and what tools you need
- Document the entire process in an internal runbook
Even a few dry runs will make a difference when you're under pressure in an actual outage.
Conclusion
In data lakehouses, disaster recovery is no longer just an afterthought. As Apache Iceberg powers more mission-critical workloads across cloud and hybrid environments, your ability to recover tables reliably becomes as important as your ability to query them quickly.
Unlike traditional databases, Iceberg doesn’t bundle storage, metadata, and catalog into a single system. Instead, it gives you flexibility—with the tradeoff that restoring from a backup requires understanding how those components fit together:
- If your catalog is missing or misaligned, use register_table to re-establish the link to your latest metadata.json.
- If your file paths have changed, use rewrite_table_path to rewrite metadata and create a clean recovery copy.
- And if you want to avoid firefighting in the future, implement proactive backup practices, track your metadata versions, and validate your recovery process in advance.
The truth is, Iceberg gives you everything you need to be resilient—but it doesn't hold your hand. That’s a strength, not a weakness, for teams ready to take ownership of their data systems.
Recovery doesn’t have to be a high-stakes scramble. It can be a smooth, auditable, and even routine process with a thoughtful plan, the right tools, and a few good habits.
So the next time disaster strikes your data lake—whether it’s a misconfigured job, a storage failure, or a region outage—you won’t be scrambling. You’ll be executing.