23 minute read · March 26, 2025

Disaster Recovery for Apache Iceberg Tables – Restoring from Backup and Getting Back Online

Alex Merced · Head of DevRel, Dremio

Disaster recovery isn’t just for databases and monolithic data warehouses anymore. In the world of modern data lakehouses—where Apache Iceberg is increasingly used to power large-scale, multi-engine analytics—knowing how to recover from failure is just as important as knowing how to scale.

Imagine this: your data platform experiences a serious outage. Maybe a cloud region goes offline. Maybe a migration goes wrong. Maybe a human error wipes out part of your data lake. You’ve got a backup of your files, and you restore them to storage. But when you go to query your Iceberg tables… nothing works.

What happened?

The truth is, restoring Iceberg tables from a filesystem backup isn’t as simple as putting files back in place. There are two critical questions you need to ask:

  1. Does your catalog metadata still reference the correct metadata.json file?
  2. Do your metadata files still reference valid data and manifest file paths, especially since Iceberg uses absolute paths?

If the answer to either is “no,” your tables won’t function properly—even if all your data is physically present.

The good news: Apache Iceberg provides tools for precisely this situation. Spark procedures like register_table and rewrite_table_path give you the ability to bring your tables back to life by realigning the catalog and metadata with the restored files.

In this blog, we’ll walk through:

  • How Iceberg stores metadata and why that matters for recovery
  • Common recovery scenarios and how to handle them
  • How to use built-in Spark procedures to fix broken references
  • Best practices for disaster-proofing your Iceberg tables in the future

If your backup strategy includes Iceberg tables—or should—this is the guide to keep handy.

Common Recovery Scenarios After a Filesystem Restore

Let’s say you’ve restored your Iceberg table files—metadata, manifests, and data—from a backup. That’s a good start. But unless your environment is exactly the same as it was pre-disaster, chances are, something is out of sync.

Let’s look at the two most common recovery scenarios and how to fix them using Iceberg’s built-in Spark procedures.

Scenario 1: The Catalog Is Missing or Points to the Wrong metadata.json

In this scenario, all your table files are present, but the catalog doesn’t point to the correct metadata.json for the table. You might be dealing with:

  • A catalog entry pointing to a newer metadata.json written after your last backup
  • A table that was never registered in the current catalog environment

This is where Iceberg’s register_table procedure comes in.

✅ How to Fix It: Register the Latest Metadata File

You’ll need to:

  1. Locate the most recent metadata.json file in the restored table’s metadata/ directory.
  2. Use the register_table procedure to create a new catalog entry pointing to that file.

CALL spark_catalog.system.register_table(
  table => 'db.restored_table',
  metadata_file => 's3a://restored-bucket/db/restored_table/metadata/v17.metadata.json'
);

🔐 Important:

  • Only use register_table if the table is not already registered, if the current catalog reference is incorrect, or if you’re intentionally moving the table to a new catalog.
  • Registering the same metadata in multiple catalogs can lead to corruption or lost updates.

Scenario 2: File Paths Have Changed During Restore

This scenario is a bit trickier. Let’s say you originally stored your data in HDFS, but you restored it to S3 during recovery. Or maybe you moved the files to a different bucket, base directory, or region. In either case, your metadata files still reference the old paths.

Because Iceberg uses absolute file paths for all its references—data files, delete files, manifests, even snapshot lineage—these path mismatches break everything.

✅ How to Fix It: Rewrite Metadata Paths with rewrite_table_path

Iceberg provides the rewrite_table_path procedure to safely rewrite all metadata references from one path prefix to another.

CALL spark_catalog.system.rewrite_table_path(
    table => 'db.my_table',
    source_prefix => 'hdfs://nn:8020/warehouse/my_table',
    target_prefix => 's3a://my-bucket/warehouse/my_table'
);

This procedure:

  • Rewrites the metadata.json, manifests, and manifest lists with the new prefix
  • Outputs the location of a CSV mapping file, showing which files need to be physically copied to the new location

After this, you’ll still need to:

  1. Use a tool like DistCp or the AWS CLI to actually copy the data files based on the mapping (see the sketch below)
  2. Optionally, use register_table if the table has not yet been registered in the catalog
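
For example, here is a minimal sketch of the copy step in Python, assuming the mapping CSV contains one source,target pair per line with no header and that the AWS CLI is installed and authenticated; the file name and paths are placeholders, and DistCp would be the equivalent route for HDFS:

# Hedged sketch: copy data files listed in the rewrite_table_path mapping CSV.
# Assumes each CSV row is "source,target" with no header; paths are placeholders.
import csv
import subprocess

mapping_csv = "file-list.csv"  # hypothetical local copy of the file from file_list_location

with open(mapping_csv, newline="") as f:
    for source, target in csv.reader(f):
        # The AWS CLI expects s3:// URIs, while Iceberg metadata often records s3a://
        src = source.replace("s3a://", "s3://")
        dst = target.replace("s3a://", "s3://")
        subprocess.run(["aws", "s3", "cp", src, dst], check=True)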

🛠 Optional: Incremental Rewrites

If you only need to update a subset of metadata versions—say, from v2.metadata.json to v20.metadata.json—you can specify a version range and a custom staging location:

CALL spark_catalog.system.rewrite_table_path(
    table => 'db.my_table',
    source_prefix => 's3a://old-bucket/db.my_table',
    target_prefix => 's3a://new-bucket/db.my_table',
    start_version => 'v2.metadata.json',
    end_version => 'v20.metadata.json',
    staging_location => 's3a://staging-bucket/db.my_table'
);

This provides more control and limits the scope of changes—ideal for partial restores or migrations.

Deep Dive – Spark Procedures for Iceberg Recovery

Once you’ve diagnosed the issue—whether it’s a missing catalog entry or outdated file paths—you’ll need to get hands-on with the Spark procedures Iceberg provides for recovery. These aren't just utility functions—they’re purpose-built tools that give you surgical control over table state.

Let’s examine each procedure in more detail, including what it does, how to use it, and where to be careful.

register_table: Reconnect the Catalog to Your Table

The register_table procedure allows you to create a new catalog entry for an Iceberg table, pointing to an existing metadata.json file. This is useful when:

  • The catalog was lost, corrupted, or never existed in the restored environment
  • You’ve manually restored metadata and need to reintroduce the table to the catalog
  • You’re moving a table between catalogs (e.g., from Hive Metastore to AWS Glue)

Usage

CALL spark_catalog.system.register_table(
  table => 'db.restored_table',
  metadata_file => 's3a://my-bucket/db/restored_table/metadata/v17.metadata.json'
);

Arguments

Name | Required | Type | Description
table | ✔️ | string | The table identifier to register in the catalog
metadata_file | ✔️ | string | Full path to the Iceberg metadata.json to register
Output

Output Name | Type | Description
current_snapshot_id | long | ID of the current snapshot in the table
total_records_count | long | Total record count in the restored table
total_data_files_count | long | Number of data files in the restored table

⚠️ Caution

Avoid using this procedure if the same table is still registered in another catalog. Having the same table registered in multiple catalogs can lead to:

  • Conflicting updates
  • Lost snapshots
  • Corrupted metadata

Always de-register a table from one catalog before re-registering it in another.
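
As a rough illustration, the handoff between catalogs might look like the following PySpark sketch. The catalog names are hypothetical, and it assumes that for Iceberg tables DROP TABLE without PURGE removes only the catalog entry while leaving metadata and data files in place; confirm that behavior for your specific catalog before running it.

# Hedged sketch: de-register a table from one catalog, then register it in another.
# Catalog and table names are placeholders; verify DROP TABLE semantics in your setup.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Remove the catalog entry only (no PURGE), leaving files on storage.
spark.sql("DROP TABLE old_catalog.db.restored_table")

# Register the same metadata.json in the new catalog.
spark.sql("""
  CALL new_catalog.system.register_table(
    table => 'db.restored_table',
    metadata_file => 's3a://my-bucket/db/restored_table/metadata/v17.metadata.json'
  )
""")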

rewrite_table_path: Update File Paths in Metadata

The rewrite_table_path procedure is used when the physical file locations have changed, and you need to update all metadata references accordingly.

This could be due to:

  • Restoring from HDFS to S3
  • Moving to a new cloud bucket
  • Renaming directory prefixes during recovery

Usage

CALL spark_catalog.system.rewrite_table_path(
    table => 'db.my_table',
    source_prefix => 'hdfs://nn:8020/warehouse/db.my_table',
    target_prefix => 's3a://new-bucket/warehouse/db.my_table'
);

Arguments

Name | Required | Default | Type | Description
table | ✔️ | | string | Name of the Iceberg table
source_prefix | ✔️ | | string | Prefix to be replaced in absolute file paths
target_prefix | ✔️ | | string | Replacement prefix for the updated file paths
start_version | | First metadata.json | string | First metadata.json to include in rewrite
end_version | | Latest metadata.json | string | Last metadata.json to include in rewrite
staging_location | | New directory | string | Directory to write new metadata files to

Output

Output Name | Type | Description
latest_version | string | Name of the last metadata file rewritten
file_list_location | string | Path to a CSV with the source-to-target file copy plan

Modes of Operation

  • Full Rewrite: Rewrites all reachable metadata files (default).
  • Incremental Rewrite: Set start_version and end_version to limit the scope.

What Happens After Rewrite?

  • New metadata files are written to the staging_location
  • You’ll receive a CSV mapping of which data files need to be moved or copied
  • You’re responsible for copying those files (e.g., with DistCp, aws s3 cp, etc.)
  • Once in place, you can use register_table to reconnect the rewritten metadata to the catalog

Putting It All Together

In a real disaster recovery situation, you might use both procedures in sequence (a combined sketch follows the list):

  1. Use rewrite_table_path to update all absolute paths in metadata
  2. Physically move or sync the files to the new location
  3. Use register_table to register the updated table in your target catalog
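
Here is a minimal end-to-end sketch in PySpark under the same assumptions as the earlier examples; the bucket names, the rewritten metadata file name, and the copy step are placeholders, not output from a real run:

# Hedged sketch of the three-step recovery sequence above.
# Prefixes, table names, and the final metadata.json name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Rewrite absolute paths in metadata and capture the copy plan.
result = spark.sql("""
  CALL spark_catalog.system.rewrite_table_path(
    table => 'db.my_table',
    source_prefix => 'hdfs://nn:8020/warehouse/db.my_table',
    target_prefix => 's3a://new-bucket/warehouse/db.my_table'
  )
""").collect()[0]
print("Copy plan CSV:", result.file_list_location)

# 2. Physically copy the files listed in the CSV (DistCp, aws s3 cp, etc.),
#    as in the copy sketch earlier in this post.

# 3. Register the rewritten metadata in the target catalog.
spark.sql("""
  CALL spark_catalog.system.register_table(
    table => 'db.my_table',
    metadata_file => 's3a://new-bucket/warehouse/db.my_table/metadata/v20.metadata.json'
  )
""")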

Let's explore best practices for backing up and restoring Iceberg tables and how to ensure a smooth, stress-free recovery.

Best Practices for Backing Up and Restoring Iceberg Tables

Apache Iceberg’s architecture gives you fine-grained control over data recovery, but with that control comes responsibility. If you want disaster recovery to be painless (or at least manageable), it's critical to design your backup strategy and recovery plan around the unique way Iceberg handles metadata and file paths.

Here are the top best practices to follow before and after a restore.

1. Always Backup Metadata and Data Together

Iceberg separates metadata from data, but you need both for recovery.

  • Metadata includes the metadata/ directory (containing v1.metadata.json, v2.metadata.json, etc.), manifest files, and manifest lists.
  • Data includes all your Parquet, ORC, or Avro files referenced by those manifests.

If you’re using tools like DistCp, rsync, or object storage replication, make sure both metadata and data directories are included in the same snapshot or backup window (a sketch follows below).

⚠️ Backing up only data files without corresponding metadata makes restoration nearly impossible.
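
For example, a minimal sketch of capturing both directories in one pass, assuming the table lives under a single S3 prefix, the AWS CLI is available, and the bucket names shown are placeholders:

# Hedged sketch: back up a table's metadata/ and data/ directories together.
# Bucket and prefix names are placeholders; syncing the table's root prefix
# captures both subdirectories in the same backup window.
import subprocess

table_root = "s3://prod-bucket/warehouse/db/customer_events"
backup_root = "s3://backups/iceberg/customer_events/2025-03-24"

subprocess.run(["aws", "s3", "sync", table_root, backup_root], check=True)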

2. Track the Latest metadata.json in Every Backup

When backing up an Iceberg table, always record the path to the latest metadata.json file. This file acts as the entry point to the entire table state.

Why this matters:

  • If the catalog entry is lost, you'll need this file to re-register the table using register_table.
  • If you restore multiple tables, it can be hard to determine which snapshot was the most recent without a tracking log.

Create a backup log or manifest for each table with entries like:

table_name: customer_events
latest_metadata: s3a://backups/iceberg/customer_events/metadata/v57.metadata.json
timestamp: 2025-03-24T04:00:00Z
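
A minimal sketch of generating such a log with PySpark, assuming Iceberg's metadata_log_entries metadata table is queryable for each table; the table list and output file are placeholders:

# Hedged sketch: record the current metadata.json for each backed-up table.
# Table list and output file are placeholders; adjust the query to your catalog.
import json
from datetime import datetime, timezone
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

tables = ["db.customer_events"]  # hypothetical list of tables in this backup

entries = []
for table in tables:
    latest = spark.sql(
        f"SELECT file FROM {table}.metadata_log_entries ORDER BY timestamp DESC LIMIT 1"
    ).collect()[0]["file"]
    entries.append({
        "table_name": table,
        "latest_metadata": latest,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

with open("backup_manifest.json", "w") as f:
    json.dump(entries, f, indent=2)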

3. Check for File Path Changes Before Recovery

One of the most common issues after a restore is that metadata files contain absolute paths that no longer match your storage layout. For example:

  • You restored from hdfs:// to s3a://
  • You renamed the top-level directory or changed bucket prefixes
  • You used a new storage mount or alias

Before re-registering a table, inspect the metadata and manifest files to confirm whether the file paths match their new locations. If not, use rewrite_table_path to adjust them before registering.
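
A quick way to spot-check this is to open the restored metadata.json and compare its recorded location against the prefix you expect; in this hedged sketch, the local path and prefix are placeholders:

# Hedged sketch: spot-check the table location recorded in a restored metadata.json.
# The local path and expected prefix are placeholders.
import json

with open("/restore/db/my_table/metadata/v17.metadata.json") as f:
    metadata = json.load(f)

expected_prefix = "s3a://new-bucket/warehouse/db/my_table"
print("table location:", metadata["location"])
if not metadata["location"].startswith(expected_prefix):
    print("Path mismatch: run rewrite_table_path before registering this table.")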

4. Automate Validation Post-Restore

After restoring and registering a table, don’t assume everything works—validate it.

Things to check:

  • Can you run a simple SELECT COUNT(*)?
  • Does the snapshot history include what you expect?
  • Are partition pruning and data skipping functioning normally?
  • Do queries against the table’s history and snapshots metadata tables return expected results?

You can automate these checks as part of your recovery workflow using Spark or Trino queries to catch misconfigurations early.
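
A minimal PySpark sketch of such a check, using Iceberg's history and files metadata tables; the table name is a placeholder:

# Hedged sketch: basic post-restore validation checks.
# The table name is a placeholder; extend with assertions for your own expectations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

table = "spark_catalog.db.restored_table"  # hypothetical restored table

# 1. Can the table be scanned at all?
row_count = spark.sql(f"SELECT COUNT(*) AS c FROM {table}").collect()[0]["c"]
print("row count:", row_count)

# 2. Does the snapshot history look right?
spark.sql(f"SELECT made_current_at, snapshot_id, is_current_ancestor FROM {table}.history").show()

# 3. Do the data files and partitions match expectations?
spark.sql(f"SELECT partition, record_count, file_path FROM {table}.files").show(5, truncate=False)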

5. Dry-Run Your Recovery Plan

Don’t wait until a real disaster to test your recovery workflow.

  • Restore a test table from a known-good backup to a staging catalog
  • Practice using rewrite_table_path and register_table
  • Measure how long the recovery takes and what tools you need
  • Document the entire process in an internal runbook

Even a few dry runs will make a difference when you're under pressure in an actual outage.

Conclusion

In data lakehouses, disaster recovery can no longer be an afterthought. As Apache Iceberg powers more mission-critical workloads across cloud and hybrid environments, your ability to recover tables reliably becomes as important as your ability to query them quickly.

Unlike traditional databases, Iceberg doesn’t bundle storage, metadata, and catalog into a single system. Instead, it gives you flexibility—with the tradeoff that restoring from a backup requires understanding how those components fit together:

  • If your catalog is missing or misaligned, use register_table to re-establish the link to your latest metadata.json.
  • If your file paths have changed, use rewrite_table_path to rewrite metadata and create a clean recovery copy.
  • And if you want to avoid firefighting in the future, implement proactive backup practices, track your metadata versions, and validate your recovery process in advance.

The truth is, Iceberg gives you everything you need to be resilient—but it doesn't hold your hand. That’s a strength, not a weakness, for teams ready to take ownership of their data systems.

Recovery doesn’t have to be a high-stakes scramble. It can be a smooth, auditable, and even routine process with a thoughtful plan, the right tools, and a few good habits.

So the next time disaster strikes your data lake—whether it’s a misconfigured job, a storage failure, or a region outage—you won’t be scrambling. You’ll be executing.
