Dremio is a lakehouse platform that enables companies to run enterprise analytics workloads directly on data lake storage. As part of their data lifecycle, companies can ensure optimal query performance with Dremio’s optimization capabilities for Apache Iceberg tables.
Today, we’re excited to announce Dremio’s support for automated table cleanup, which helps companies easily minimize storage utilization and adhere to data retention policies by removing snapshot, metadata, and data files that are no longer needed.
Why Table Cleanup?
Snapshots are a fundamental concept in Iceberg. They help query engines quickly determine which data files make up a table at a given point in time, and they enable time travel and rollback. Because every write to an Iceberg table creates a new snapshot (version) of that table, snapshots accumulate over time and need regular cleanup to keep table metadata small.
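To make the snapshot model concrete, here is a toy, in-memory sketch. It is not Iceberg's actual metadata layout: `Snapshot` and `ToyTable` are hypothetical names, and each snapshot is reduced to the full set of data files the table contained at commit time. It shows why every write adds a snapshot and how a time-travel lookup picks the latest snapshot at or before a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import AbstractSet, FrozenSet, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One table version: the complete set of data files at commit time."""
    snapshot_id: int
    committed_at: datetime
    data_files: FrozenSet[str]


class ToyTable:
    """Hypothetical in-memory stand-in for an Iceberg table's snapshot history."""

    def __init__(self) -> None:
        self.snapshots: List[Snapshot] = []

    def write(self, committed_at: datetime, added: AbstractSet[str],
              removed: AbstractSet[str] = frozenset()) -> Snapshot:
        # Every write creates a new snapshot; older snapshots are left untouched,
        # which is why they accumulate until something expires them.
        current = self.snapshots[-1].data_files if self.snapshots else frozenset()
        snap = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            committed_at=committed_at,
            data_files=frozenset((current - removed) | added),
        )
        self.snapshots.append(snap)
        return snap

    def as_of(self, ts: datetime) -> Optional[Snapshot]:
        # Time travel: the latest snapshot committed at or before ts.
        # Assumes writes are appended in commit-time order.
        matches = [s for s in self.snapshots if s.committed_at <= ts]
        return matches[-1] if matches else None


t = ToyTable()
t.write(datetime(2023, 9, 1), {"a.parquet"})
t.write(datetime(2023, 9, 2), {"b.parquet"})
t.write(datetime(2023, 9, 3), {"c.parquet"}, removed={"a.parquet"})
print(len(t.snapshots))                                       # 3
print(sorted(t.as_of(datetime(2023, 9, 2, 12)).data_files))   # ['a.parquet', 'b.parquet']
```

Note that the third write drops a.parquet from the current table state, yet the file is still needed by the first two snapshots; it can only be deleted once those snapshots are expired.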
Companies can manually expire snapshots and delete unused data files for individual tables using Dremio’s VACUUM TABLE SQL command. For companies with hundreds or thousands of tables, however, this is an arduous task: they must either run VACUUM TABLE manually against each table or write custom schedulers to run it programmatically.
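For a sense of what the per-table approach involves, here is a minimal sketch of the statement-generation half of such a custom scheduler. `build_vacuum_statements` and the table names are hypothetical. The statements follow the VACUUM TABLE ... EXPIRE SNAPSHOTS older_than form, but the exact timestamp literal format is an assumption, so check the Dremio SQL reference for the options your version accepts. Submitting the statements to Dremio and running the job on a timer are left out.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def build_vacuum_statements(tables: Iterable[str], now: datetime,
                            retention: timedelta) -> List[str]:
    """Build one VACUUM TABLE ... EXPIRE SNAPSHOTS statement per table."""
    # Assumption: older_than takes a 'YYYY-MM-DD HH:MM:SS.mmm' timestamp literal.
    cutoff = (now - retention).isoformat(sep=" ", timespec="milliseconds")
    return [f"VACUUM TABLE {name} EXPIRE SNAPSHOTS older_than '{cutoff}'"
            for name in tables]


# Hypothetical table names; a real job would enumerate every table it manages.
stmts = build_vacuum_statements(
    ["nessie.sales.orders", "nessie.sales.customers"],
    now=datetime(2023, 10, 1),
    retention=timedelta(days=7),
)
for s in stmts:
    print(s)
# VACUUM TABLE nessie.sales.orders EXPIRE SNAPSHOTS older_than '2023-09-24 00:00:00.000'
# VACUUM TABLE nessie.sales.customers EXPIRE SNAPSHOTS older_than '2023-09-24 00:00:00.000'
```

Even in this reduced form, the list of tables has to be maintained and every statement executed and monitored, which is the overhead a catalog-level command removes.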
What's New?
Dremio Software: Companies can now use the VACUUM CATALOG SQL command to expire snapshots and remove orphaned metadata and data files for all Iceberg tables in a specified catalog, eliminating the time and effort of running VACUUM TABLE against each table individually. VACUUM CATALOG is supported for Nessie catalogs as of Dremio Software version 24.3.
Dremio Cloud: Automatic table cleanup is now enabled by default for any Dremio Cloud organization using Dremio Arctic as its Iceberg catalog. Once a day, Arctic deletes orphaned Iceberg metadata files, as well as Iceberg snapshots (i.e., versions) older than the customer-defined retention period. When a snapshot is deleted, Arctic removes its metadata along with every Parquet data file that is no longer referenced by any remaining snapshot.
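The retention rule described above can be modeled in a few lines. This is a simplified sketch, not Arctic's implementation: `Snapshot` and `plan_cleanup` are hypothetical names, each snapshot is reduced to the set of Parquet files it references, the retention period is a plain `timedelta`, and manifest and metadata files as well as orphan-file scanning are left out. It also assumes, as Iceberg does, that the current snapshot is never expired.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime
    data_files: FrozenSet[str]


def plan_cleanup(snapshots: List[Snapshot], now: datetime,
                 retention: timedelta) -> Tuple[List[int], Set[str]]:
    """Return (ids of expired snapshots, data files that are safe to delete)."""
    if not snapshots:
        return [], set()
    cutoff = now - retention
    # Assumption (standard Iceberg behavior): the current snapshot is never expired.
    current = max(snapshots, key=lambda s: s.committed_at)
    expired = [s for s in snapshots
               if s.committed_at < cutoff and s.snapshot_id != current.snapshot_id]
    expired_ids = {s.snapshot_id for s in expired}
    # Files still referenced by any remaining snapshot must be kept.
    still_referenced: Set[str] = set().union(
        *(s.data_files for s in snapshots if s.snapshot_id not in expired_ids))
    # Only files that appeared in an expired snapshot are candidates for deletion.
    candidates: Set[str] = set().union(*(s.data_files for s in expired))
    return sorted(expired_ids), candidates - still_referenced


history = [
    Snapshot(1, datetime(2023, 9, 1), frozenset({"a.parquet"})),
    Snapshot(2, datetime(2023, 9, 2), frozenset({"a.parquet", "b.parquet"})),
    Snapshot(3, datetime(2023, 9, 10), frozenset({"b.parquet", "c.parquet"})),
]
expired, deletable = plan_cleanup(history, now=datetime(2023, 9, 12),
                                  retention=timedelta(days=7))
print(expired)            # [1, 2]
print(sorted(deletable))  # ['a.parquet']
```

With a 7-day retention period, snapshots 1 and 2 fall before the cutoff and expire. Both referenced b.parquet, but snapshot 3 still uses it, so only a.parquet is deleted.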
Questions or Feedback?
If you have any questions or feedback, please post to the Dremio Community page or contact your Dremio account team and we’ll be happy to assist.