30 minute read · August 8, 2025

Optimizing Apache Iceberg Tables – Manual and Automatic

Alex Merced · Head of DevRel, Dremio

Even the most well-designed Apache Iceberg table won’t stay fast forever. Over time, the simple act of ingesting, updating, and deleting data can quietly erode performance. Small files pile up. Metadata grows unwieldy. Partitions drift out of alignment with query patterns. Old snapshots linger, bloating storage and slowing planning.

Left unchecked, these issues can turn sub-second queries into sluggish waits—wasting compute, inflating cloud bills, and frustrating both human analysts and AI-driven applications.

That’s where Dremio’s OPTIMIZE and VACUUM commands come in. Together, they give you the power to compact data, streamline metadata, and clear out stale history, restoring your Iceberg tables to peak efficiency. Whether you’re fine-tuning performance for a critical dashboard, cleaning up after a large data load, or running regular maintenance, these commands are your go-to toolkit for keeping Iceberg lean and lightning-fast.

And if you’d rather not think about optimization at all? With Dremio Enterprise Catalog, your tables can optimize and cluster themselves automatically, so you get all the speed without the operational overhead.

The Problem: How Iceberg Tables Get Slower Over Time

When you first create an Apache Iceberg table, it often feels like everything just works. Queries are fast, metadata is small, and files are neatly organized. But as the table evolves—through streaming ingests, micro-batches, updates, and deletes—the underlying file and metadata structure starts to drift from its optimal state. This drift doesn’t happen overnight, and you might not notice it until performance issues appear. By then, the root causes are buried deep in the physical layout of the data.

One of the most common culprits is the accumulation of small files. Frequent inserts from streaming jobs or low-volume micro-batches create hundreds or thousands of tiny data files instead of a smaller number of well-sized ones. Query engines like Dremio, Spark, or Trino have to open each file individually, which adds network and metadata overhead for every read. Even if each file is only a few megabytes, the cumulative planning and I/O cost can be huge. Worse, Iceberg must track every file in its metadata, so small-file sprawl inflates manifest files and slows query planning before a single row is read.

Poor data colocation is another silent performance killer. Iceberg partitions or clusters data to improve pruning and reduce unnecessary reads. But if your partitioning strategy doesn’t match your query patterns, the engine will be forced to scan many irrelevant files. For example, a sales table partitioned by region might perform well for region-based analytics, but if most queries filter by product category, they’ll still hit multiple partitions. Over time, evolving query patterns, schema changes, or new data sources can make the original partition strategy less effective, leaving data scattered and inefficient to scan.

Metadata sprawl can also degrade performance without touching the data files themselves. Iceberg uses manifest files to track data file locations, statistics, and partition information. Each snapshot can reference many manifests, and high-frequency writes or incremental loads often create a lot of small or redundant manifests. This adds more layers for the query engine to read and parse before it even decides which data files to open. Even if your data files are perfectly sized, too many manifests can turn query planning into a bottleneck.

Merge-on-read delete file buildup is a more specialized but equally important issue. In Iceberg’s merge-on-read (MOR) mode, updates and deletes are recorded in separate delete files (position or equality deletes) rather than rewriting the original data files. This approach is efficient for writes, but it shifts the cost to reads. Every time a query runs, the engine has to load the base data files and then apply all associated delete files in memory to reconstruct the current state. As delete files accumulate, read times increase, and the overhead can become severe—especially if a single data file has dozens or even hundreds of delete files linked to it. Without intervention, this pattern can quietly drag down performance in high-churn datasets.

Old snapshots present a different kind of problem. One of Iceberg’s strengths is its support for time travel and snapshot isolation, allowing you to query data as it existed at any previous point in time. But retaining too many snapshots comes at a cost. Each snapshot keeps references to its own data and metadata files, even if they are no longer relevant to current workloads. Over time, the table grows heavier with files that serve no operational purpose, inflating storage bills and forcing query planners to sift through extra metadata. The larger the historical footprint, the more work every maintenance operation—like schema evolution or partition changes—has to perform.

These problems don’t happen because Iceberg is flawed; they’re a natural byproduct of its flexibility and transactional design. The same features that make it a robust, open table format also mean that, without maintenance, its performance will slowly degrade. The good news is that Iceberg provides built-in ways to address these issues, and Dremio makes them easier to execute with powerful commands for compaction, metadata cleanup, and snapshot expiration. That’s where OPTIMIZE and VACUUM come in.

With that context, we can move into the solution: how these two commands work and how to tune them for different optimization strategies.

The Solution: Reclaiming Performance with OPTIMIZE

The first and most versatile tool for restoring Iceberg performance in Dremio is the OPTIMIZE command. At its core, OPTIMIZE rewrites data and metadata files into a more efficient layout—merging small files, splitting oversized files, applying clustering for sorted tables, and cleaning up fragmented manifest files. By bringing files back into the optimal size range and aligning them with the current partitioning scheme, OPTIMIZE reduces query planning time, lowers storage overhead, and improves scan efficiency.

In practice, OPTIMIZE is your primary compaction mechanism. It addresses the small-file problem by grouping small files into larger ones, splits oversized files that would otherwise limit parallelism, and can even remove position delete files in merge-on-read tables by rewriting the affected data files. When paired with optional manifest rewrites, it also keeps Iceberg’s metadata layer lean, which can have just as much impact on query speed as the data layout itself.

The basic syntax is straightforward:

OPTIMIZE TABLE <table_name>
   [ REWRITE DATA USING BIN_PACK
      [ ( { TARGET_FILE_SIZE_MB | MIN_FILE_SIZE_MB | MAX_FILE_SIZE_MB | MIN_INPUT_FILES } = <value> [, ... ] ) ]
   ]
   [ FOR PARTITIONS <predicate> ]
   [ REWRITE MANIFESTS ]

You can run OPTIMIZE on an entire table or target specific partitions. The REWRITE DATA USING BIN_PACK clause is the default and most common strategy, aiming for balanced file sizes based on the TARGET_FILE_SIZE_MB setting. Other parameters let you fine-tune which files are included, how aggressively compaction runs, and whether metadata manifests are consolidated as part of the job.
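
For example, a single job that compacts data toward the default target and also consolidates manifests (following the syntax above) might look like the sketch below; the table name and parameter values are illustrative:

OPTIMIZE TABLE s3.sales
  REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB = 512, MIN_INPUT_FILES = 5)
  REWRITE MANIFESTS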

Next, we’ll break down each of these parameters in detail—starting with file size targets—so you can tailor OPTIMIZE to your data’s size, shape, and workload patterns.

TARGET_FILE_SIZE_MB – Setting the Goal for File Size

The TARGET_FILE_SIZE_MB parameter controls the ideal output size for files produced during an OPTIMIZE job. By default, Dremio sets this to 512 MB, which strikes a balance between reducing the number of files the engine must open and maintaining enough parallelism for distributed query execution.

Choosing the right value isn’t about finding one universal number—it’s about matching file size to your query patterns, data characteristics, and cluster resources. Larger target file sizes reduce file open and metadata overhead, making them ideal for workloads that scan large portions of a dataset. However, very large files can limit parallelism and increase memory pressure, especially if queries often filter on narrow ranges of data.

Smaller target file sizes can help in scenarios where:

  • Your queries are highly selective and benefit from parallel reads across many smaller files.
  • Your cluster has limited executor memory, making large files risky for processing.
  • Your data contains wide rows or complex nested structures that inflate in-memory size during scans.

On the other hand, sticking with or increasing beyond the default can be beneficial when:

  • You run frequent full-table scans, such as for analytics dashboards or training machine learning models.
  • You have many small files from streaming ingestion and want to minimize the total number of files for planning efficiency.
  • Your storage system has high latency per file open, making fewer, larger files more efficient.

Optimizing for different goals often means adjusting this value in response to observed performance; example invocations follow this list:

  • Maximize throughput: Increase TARGET_FILE_SIZE_MB toward 1 GB or more, reducing file count and metadata size.
  • Maximize parallelism: Decrease toward 256 MB or lower, ensuring more splits for the query planner to distribute.
  • Balanced optimization: Start with the default 512 MB, monitor query times and resource usage, and adjust incrementally.
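
As a rough sketch, the throughput-oriented and parallelism-oriented extremes might be expressed like this (the table name and sizes are illustrative starting points, not recommendations):

-- Fewer, larger files for workloads dominated by full-table scans
OPTIMIZE TABLE s3.sales
  REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB = 1024)

-- More, smaller files for highly selective, parallel reads
OPTIMIZE TABLE s3.sales
  REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB = 256)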

Ultimately, TARGET_FILE_SIZE_MB is your compass for compaction—it points the OPTIMIZE process toward the file layout that best fits your workloads. In the next section, we’ll look at MIN_FILE_SIZE_MB and MAX_FILE_SIZE_MB, which define the range of file sizes eligible for rewriting.

MIN_FILE_SIZE_MB and MAX_FILE_SIZE_MB – Defining the Rewrite Boundaries

While TARGET_FILE_SIZE_MB sets the ideal size for output files, MIN_FILE_SIZE_MB and MAX_FILE_SIZE_MB determine which existing files qualify for rewriting during an OPTIMIZE job. Together, they act as the guardrails that keep compaction focused on files that will actually benefit from resizing.

MIN_FILE_SIZE_MB defines the smallest file size that OPTIMIZE will consider for compaction. Files smaller than this threshold are merged into larger files to reduce metadata bloat and file open costs. The default is roughly 24% of the target size—so with the default target of 512 MB, anything under about 123 MB is eligible. Increasing this value will make OPTIMIZE more aggressive in merging smaller files, which is useful in streaming-heavy environments where small files accumulate quickly. Lowering it can help avoid unnecessary rewrites when smaller files are acceptable or when compute resources are limited.

MAX_FILE_SIZE_MB sets the largest file size that OPTIMIZE will consider splitting. Files larger than this threshold are broken down into smaller pieces, which improves parallelism and reduces the memory needed to process a single file. By default, this is 180% of the target—about 922 MB if the target is 512 MB. Increasing this threshold allows larger files to persist, which can be fine for sequential scan workloads. Decreasing it forces OPTIMIZE to break down oversized files sooner, which can benefit selective queries and clusters with many executors.

Tuning these values is about focusing your resources; a brief example follows this list:

  • For high-ingest, streaming workloads: Raise MIN_FILE_SIZE_MB so more small files get merged, but keep MAX_FILE_SIZE_MB at default or higher to avoid splitting recently ingested large files unnecessarily.
  • For mixed workloads: Keep both thresholds near defaults for a balanced approach, targeting the most impactful file size extremes without rewriting too much data.
  • For highly selective queries: Lower MAX_FILE_SIZE_MB to encourage smaller, more parallel-friendly files, while keeping MIN_FILE_SIZE_MB low enough to skip rewrites of moderately sized files.
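
For instance, a streaming-heavy table might raise the merge threshold while leaving the split threshold generous; the table name and values below are illustrative and should be adjusted against your own file-size distribution:

OPTIMIZE TABLE s3.events
  REWRITE DATA USING BIN_PACK (MIN_FILE_SIZE_MB = 256, MAX_FILE_SIZE_MB = 1024)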

These boundaries help ensure that OPTIMIZE focuses its effort on the files that will yield the biggest performance gains, without wasting compute on rewriting files that are already efficient. Next, we’ll explore MIN_INPUT_FILES, which controls when OPTIMIZE decides there’s enough work to justify running at all.

MIN_INPUT_FILES – Setting the Threshold for When to Compact

MIN_INPUT_FILES controls the minimum number of eligible files that must be present before OPTIMIZE will run a compaction job. This parameter acts as a safeguard against spending compute resources on jobs that won’t deliver meaningful performance improvements.

By default, Dremio sets this to 5, meaning that OPTIMIZE won’t trigger unless at least five files meet the rewrite criteria defined by MIN_FILE_SIZE_MB and MAX_FILE_SIZE_MB. This prevents unnecessary jobs when there are only a few slightly undersized or oversized files—situations where the performance gain might be negligible compared to the cost of the operation.

Tuning MIN_INPUT_FILES lets you control how opportunistic or conservative OPTIMIZE should be:

  • Lower values (e.g., 2–3) make compaction more eager, running even when only a few files are out of range. This can be useful for high-priority tables that must remain in peak condition, such as those powering real-time dashboards or AI applications.
  • Higher values (e.g., 10–20) make compaction more conservative, triggering only when a substantial batch of files can be rewritten together. This is ideal for large datasets where each OPTIMIZE job has significant resource and runtime costs.

The optimal setting depends on your balance between performance freshness and operational efficiency (see the sketches after this list):

  • In streaming ingestion pipelines, where small files appear constantly, lowering MIN_INPUT_FILES ensures frequent incremental cleanup and prevents backlog.
  • In batch-oriented workloads, raising the threshold reduces the number of compaction jobs, consolidating work into larger, more efficient runs.
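
Two hedged sketches, one eager and one conservative, with hypothetical table names:

-- Eager: compact a hot, streaming-fed table whenever even a couple of files qualify
OPTIMIZE TABLE s3.clickstream
  REWRITE DATA USING BIN_PACK (MIN_INPUT_FILES = 2)

-- Conservative: wait for a larger batch of eligible files on a big, batch-loaded table
OPTIMIZE TABLE s3.order_history
  REWRITE DATA USING BIN_PACK (MIN_INPUT_FILES = 20)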

Used in conjunction with file size parameters, MIN_INPUT_FILES helps fine-tune OPTIMIZE’s responsiveness—ensuring you’re not overcompacting, but also not letting performance degrade for too long.

Next, we’ll look at FOR PARTITIONS, which allows you to target optimization to only the most critical or high-traffic sections of a table.

FOR PARTITIONS – Targeting Optimization Where It Matters Most

The FOR PARTITIONS clause lets you focus OPTIMIZE on specific partitions of a table rather than processing the entire dataset. This is especially valuable for large, partitioned Iceberg tables where only certain partitions are actively changing or queried frequently. Instead of rewriting thousands of files across cold, historical partitions, you can direct compaction efforts toward the “hot” areas where it will have the biggest impact.

The syntax is simple and uses SQL-style predicates:

OPTIMIZE TABLE sales
  REWRITE DATA USING BIN_PACK
  FOR PARTITIONS region = 'us-west'

You can filter on one or more partition columns, and even combine conditions with logical operators. For example:

FOR PARTITIONS region = 'us-west' AND sale_date >= '2024-01-01'

This ability to scope optimization offers several advantages:

  • Faster runtime – Smaller jobs complete quickly, which means you can run them more frequently without disrupting workloads.
  • Lower resource usage – Only the relevant partitions are read, rewritten, and committed, reducing compute and I/O costs.
  • Minimal interference – Targeted jobs are less likely to compete with other workloads for cluster resources, making them safer to run during business hours.

When to use FOR PARTITIONS:

  • High-churn partitions – If certain partitions see constant updates or streaming inserts, optimizing them more frequently prevents small-file buildup and delete file accumulation.
  • Critical query paths – If business-critical queries focus on specific time windows or regions, optimizing those partitions ensures consistently fast response times.
  • Staged maintenance – For very large tables, you can rotate through partitions over time, optimizing different slices on a schedule rather than attempting a massive full-table rewrite.

By combining FOR PARTITIONS with file size thresholds and MIN_INPUT_FILES, you can build an incremental optimization strategy that keeps your most important data in peak condition without overwhelming your system.
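
For example, an incremental job that touches only recent, high-traffic partitions of a hypothetical sales table, and only when enough out-of-range files have accumulated, might look like this:

OPTIMIZE TABLE s3.sales
  REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB = 512, MIN_INPUT_FILES = 10)
  FOR PARTITIONS region = 'us-west' AND sale_date >= '2024-01-01'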

Next, we’ll cover REWRITE MANIFESTS, which focuses on cleaning up the metadata layer that Iceberg uses to track data files.

REWRITE MANIFESTS – Cleaning Up the Metadata Layer

While most OPTIMIZE operations focus on data files, REWRITE MANIFESTS targets a different but equally important part of an Iceberg table: its metadata. Manifest files act like an index, listing the data files in a table along with their statistics, partition information, and other details. Over time, frequent writes—especially small streaming batches—can create a large number of small or redundant manifests. This condition, known as metadata sprawl, increases query planning time because the engine must read and process every relevant manifest before it can decide which data files to scan.

REWRITE MANIFESTS consolidates small manifests and splits oversized ones so that each is close to the optimal size. By default, Dremio aims for around 8 MB per manifest file, with anything between 0.75x and 1.8x of that size considered acceptable. Manifests smaller than this range are combined, and those larger than it are divided into smaller pieces. The result is a leaner metadata structure that reduces planning overhead for every query.
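
A manifest-only pass can also be issued on its own, without touching any data files; a minimal sketch against a hypothetical table:

OPTIMIZE TABLE s3.sales
  REWRITE MANIFESTS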

Running REWRITE MANIFESTS is typically much faster and less resource-intensive than rewriting data files because it only touches metadata. This makes it a good choice for quick tune-ups between full compaction runs. You can also pair it with a data rewrite in a single OPTIMIZE job:

OPTIMIZE TABLE sales
  REWRITE DATA USING BIN_PACK (TARGET_FILE_SIZE_MB = 512)
  REWRITE MANIFESTS

When to prioritize REWRITE MANIFESTS:

  • High-frequency write environments – Streaming or near-real-time ingestion often creates many small manifests that benefit from consolidation.
  • Query planning slowdowns – If queries spend significant time in the “planning” phase, cleaning up manifests can yield immediate improvements.
  • Metadata-heavy tables – Very large tables with deep snapshot histories can accumulate redundant manifests, making rewrites an efficient maintenance step.

By keeping the metadata layer tidy, REWRITE MANIFESTS ensures that even before a single row is read, your queries start as quickly as possible.

VACUUM TABLE – Expiring Old Snapshots and Reclaiming Storage

While OPTIMIZE focuses on improving query performance by restructuring data and metadata files, VACUUM TABLE tackles a different challenge: controlling table growth by removing old snapshots and the files they reference.

Apache Iceberg’s snapshot feature is one of its biggest strengths, enabling time travel and rollback to previous table states. Each snapshot records the data and metadata files that existed at a specific point in time, preserving a complete history of changes. However, retaining too many snapshots can bloat storage, slow down metadata operations, and increase planning costs for queries that need to navigate through a deep historical log.

VACUUM TABLE removes snapshots that are no longer needed and deletes the files (data files, manifest files, manifest lists, and partition stats) that are uniquely associated with them. This operation keeps only the history you choose to retain while freeing up space and reducing metadata size.

The syntax is straightforward:

VACUUM TABLE <table_name> 
    EXPIRE SNAPSHOTS [older_than '<timestamp>'] [retain_last <value>]

  • older_than sets a cutoff timestamp. Any snapshot created before this date will be removed unless protected by retain_last.
  • retain_last specifies the minimum number of most recent snapshots to keep, regardless of age.

If no parameters are provided, Dremio defaults to keeping only the most recent snapshot and removing those older than five days.

Examples:

  • Remove snapshots older than April 20, 2023, but keep at least 20 recent snapshots:

    VACUUM TABLE s3.sales EXPIRE SNAPSHOTS older_than '2023-04-20 00:00:00.000' retain_last 20

  • Keep the most recent 100 snapshots, regardless of age:

    VACUUM TABLE s3.sales EXPIRE SNAPSHOTS retain_last 100

When to run VACUUM:

  • After heavy ingestion or schema evolution – Large operational changes can leave behind obsolete files that are no longer referenced by active snapshots.
  • On a set schedule – Periodic cleanup prevents storage creep and maintains metadata agility.
  • To meet compliance or retention policies – Expiring old snapshots ensures sensitive data is fully removed from both active storage and historical states.

VACUUM is safe—Iceberg ensures that files referenced by active snapshots are never deleted—but it does mark the point at which older historical states are permanently gone. Choosing your retention window is a balance between auditability and operational efficiency.

Next, we’ll cover best practices for combining OPTIMIZE and VACUUM in a maintenance strategy that keeps Iceberg tables both fast and lean.

Best Practices for Combining OPTIMIZE and VACUUM

Keeping an Iceberg table healthy isn’t about running OPTIMIZE or VACUUM once in a while—it’s about establishing a maintenance rhythm that matches your workload. The right approach depends heavily on how data is ingested, how often it’s queried, and what your operational constraints look like.

For streaming or micro-batch ingestion, the primary challenge is constant small-file creation. If these aren’t addressed quickly, query performance will start to suffer, but running a full-table OPTIMIZE job too often can disrupt ingestion. The key is to adopt an incremental optimization strategy:

  • Use FOR PARTITIONS to target only the most active partitions (for example, the current day or week).
  • Keep MIN_FILE_SIZE_MB and MIN_INPUT_FILES tuned for quick, small jobs—these can run more frequently without consuming excessive cluster resources.
  • Save full-table OPTIMIZE runs for low-traffic periods, such as overnight or during weekend maintenance windows.

For batch ingestion or periodically updated datasets, you can take a more conservative approach:

  • Run full OPTIMIZE less frequently, but allow jobs to process a larger scope each time.
  • Combine REWRITE MANIFESTS with full compaction to ensure metadata stays as clean as the data files.
  • Align these jobs with other ETL processes to minimize the number of concurrent heavy operations.

VACUUM TABLE should also be part of the schedule, but it generally doesn’t need to run as often as OPTIMIZE. Snapshot expiration is less about day-to-day query performance and more about long-term storage efficiency and metadata manageability. A common pattern, sketched after this list, is to:

  • Run OPTIMIZE daily or weekly (depending on ingestion rate).
  • Run VACUUM weekly or monthly, expiring snapshots beyond your required retention window.
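
Expressed as scheduled statements, that pattern might look like the following sketch; the table name, partition predicate, and retention values are placeholders to adapt to your own ingestion and retention windows:

-- Daily or weekly: incremental compaction scoped to hot partitions
OPTIMIZE TABLE s3.sales
  REWRITE DATA USING BIN_PACK (MIN_INPUT_FILES = 5)
  FOR PARTITIONS sale_date >= '2025-08-01'

-- Weekly or monthly: expire snapshots beyond the retention window
VACUUM TABLE s3.sales
  EXPIRE SNAPSHOTS older_than '2025-07-01 00:00:00.000' retain_last 30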

When managing both commands, watch for two pitfalls:

  1. Jobs that run too long – If OPTIMIZE starts overlapping with ingestion windows, queries may slow or even fail due to resource contention. Keep runtime short for frequent jobs by scoping to partitions or tightening rewrite thresholds.
  2. Under-maintenance – Being too conservative can allow performance debt to accumulate, making the eventual cleanup job larger and more disruptive. Monitor file counts and average file size to avoid falling behind.

A well-tuned combination of incremental and full maintenance jobs ensures that your tables stay fast without starving ingestion or overwhelming your cluster. And if you’d prefer to avoid scheduling and tuning entirely, there’s a way to let Dremio handle it automatically.

Next, we’ll wrap up by looking at how Dremio Enterprise Catalog can perform these optimizations for you—continuously and without manual intervention.

The Easy Button: Automatic Optimization with Dremio Enterprise Catalog

Manual tuning works, but it comes with trade-offs. You have to monitor file sizes, watch for metadata growth, schedule jobs around ingestion windows, and constantly balance the cost of optimization with its performance benefits. For teams managing dozens or hundreds of Iceberg tables, that operational overhead can quickly become its own full-time job.

Dremio Enterprise Catalog, powered by Apache Polaris, removes that burden by handling optimization automatically. It continuously analyzes your tables—watching for small-file buildup, oversized files, outdated partitions, and metadata sprawl—and then runs the necessary optimizations in the background. There’s no need to manually choose parameters or schedule jobs; the system determines when and how to compact, cluster, and clean up snapshots for you.

Automatic optimization isn’t just about running OPTIMIZE on a schedule. It’s intelligent:

  • Partition-aware – Targets hot or frequently queried partitions more often, while leaving cold partitions untouched until needed.
  • Workload-sensitive – Times optimization to avoid competing with ingestion and high query loads, minimizing disruption to production workloads.
  • Comprehensive – Handles both data file compaction and manifest rewrites, ensuring performance gains at every stage of query planning.
  • Snapshot management – Expiration policies can be applied automatically, keeping storage lean without sacrificing compliance requirements.

With autonomous optimization, you get the benefits of a constantly tuned Iceberg lakehouse without the complexity of designing and maintaining a manual job strategy. Your data teams can spend less time worrying about file sizes and more time delivering insights, while your queries—both human- and AI-driven—continue to run at top speed.

When combined with Dremio’s query acceleration, unified semantic layer, and zero-ETL data federation, Enterprise Catalog creates a truly self-managing data platform—one where optimization is just something that happens, not something you have to think about.
