Dremio Blog

10 minute read · January 12, 2026

Get a Supercharged Iceberg Catalog: Introducing Dremio and Apache Polaris

Alex Merced, Head of DevRel, Dremio

Key Takeaways

  • Dremio Catalog is a fully managed and free Iceberg catalog that simplifies metadata management for diverse query engines.
  • It enhances the Apache Polaris project with features like automatic table optimization to improve query performance.
  • Dremio Catalog incorporates enterprise-grade governance features, including role-based access control and fine-grained access control.
  • It creates a universal semantic layer, connecting various data sources for a cohesive business view.
  • Users can still self-host the Apache Polaris project while leveraging Dremio's features for seamless integration.

As organizations embrace the modern data lakehouse, managing metadata has become a critical challenge. Apache Iceberg is rapidly becoming the standard for organizing huge analytic datasets, but its rise has created a new problem: how do you manage Iceberg tables across a growing ecosystem of different query engines and tools? The answer is a universal, open catalog that can serve as a central source of truth for all your Iceberg metadata.

Apache Polaris is the open-source incubating Apache project designed to be that universal catalog. It provides a REST-based catalog service for Iceberg that any engine can use. While Polaris is a powerful project, setting up and managing it requires infrastructure and expertise.

This is where Dremio comes in. Dremio offers a fully managed version of Polaris called Dremio Catalog. It provides all the power of the open-source project, adds critical enterprise features, and makes adoption incredibly easy. Best of all, it’s completely free. This article covers the key reasons why Dremio Catalog is the best way to leverage Apache Polaris for your data lakehouse.

Takeaway 1: It's a Fully Managed, and Completely Free, Iceberg Catalog

Dremio Catalog is a fully managed service powered by the open-source Apache Polaris project. The most significant benefit is that the catalog is completely free. There are no costs associated with the catalog itself and no charges for the number of API requests you make to it. You get an enterprise-grade, multi-engine Iceberg catalog without any operational overhead or direct cost.

Dremio’s pricing model is straightforward: you pay only for optional Dremio services. These include Dremio's high-performance compute engines, Dremio-managed storage for your tables, or Dremio’s LLM-based AI features. However, none of these are required to use the free Dremio Catalog with your own compute engines and storage.

Takeaway 2: It Supercharges Polaris with Automatic Table Optimization

Dremio Catalog enhances the core functionality of Polaris with built-in, automated table maintenance. As data is written to and updated in Apache Iceberg tables, especially by streaming or frequent ingestion jobs, small data and metadata files accumulate. Left unchecked, they can severely degrade query performance, a common challenge known as the 'small file problem'.

Dremio Catalog automatically runs background maintenance processes to optimize your tables for better query performance and reduced storage costs. This service handles several key tasks:

  • Compacting: Merges small data and metadata files into larger, more efficient ones, which speeds up queries by reducing the number of files the engine needs to read.
  • Partitioning: Physically organizes data based on column values, allowing queries to skip irrelevant data (a technique known as partition pruning).
  • Rewriting: Optimizes manifest files for better organization and faster metadata lookups.
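  • Removing: Applies position delete files by rewriting the affected data files, which physically removes the deleted rows, reclaims storage, and spares engines from merging deletes at read time.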
  • Clustering: Groups related data together within files to improve the speed of queries that filter on specific columns.
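To make the compaction step concrete, here is a minimal sketch of how small files might be grouped into rewrite jobs. It is not Dremio's actual algorithm: the `DataFile` type, the `plan_compaction` helper, the 256 MB target, and the first-fit-decreasing strategy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for an Iceberg data file entry.
    path: str
    size_mb: int


def plan_compaction(files, target_mb=256):
    """Group small files into bins of at most target_mb (first-fit decreasing)."""
    # Files already at or above the target size are left alone.
    small = sorted(
        (f for f in files if f.size_mb < target_mb),
        key=lambda f: f.size_mb,
        reverse=True,
    )
    bins = []  # each bin is a list of files to merge into one output file
    for f in small:
        for b in bins:
            if sum(x.size_mb for x in b) + f.size_mb <= target_mb:
                b.append(f)
                break
        else:
            bins.append([f])
    # A bin holding a single file gains nothing from being rewritten.
    return [b for b in bins if len(b) > 1]


sizes = [8, 120, 300, 16, 100, 4, 32]
files = [DataFile(f"part-{i}.parquet", s) for i, s in enumerate(sizes)]
plan = plan_compaction(files)

for group in plan:
    total = sum(f.size_mb for f in group)
    print(f"merge {len(group)} files -> 1 file of {total} MB")
```

In this toy input, six small files collapse into two, so a query that previously opened seven files would open three. A managed service makes this kind of decision continuously in the background rather than leaving it to a scheduled job you maintain.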

Takeaway 3: It Adds Enterprise-Grade Governance Out of the Box

Dremio Catalog provides robust, built-in data governance features that are essential for any enterprise. Instead of managing access control separately for each tool, Dremio allows you to define policies once in the catalog and have them apply universally. Key features include:

  • Role-Based Access Control (RBAC): Dremio implements a comprehensive RBAC system. Privileges define the actions that can be performed (e.g., SELECT, ALTER) on specific resources like catalogs, tables, and views, and users receive those privileges through the roles they hold. This allows administrators to manage who can view, create, or modify data objects.
  • Fine-Grained Access Control (FGAC): Dremio supports advanced security through policies that control access at a more granular level. This includes row-level access (e.g., a sales manager can only see data for their region) and column masking (e.g., hiding sensitive PII from certain users).

Note that Fine-Grained Access Control policies currently apply only to queries executed through a Dremio compute engine.
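The two FGAC policy types above can be illustrated with a small model. This is a conceptual sketch only, not Dremio's policy syntax or engine: the `apply_policy` helper, the sample rows, and the `****` mask value are assumptions made for the example.

```python
def apply_policy(rows, row_filter, masked_columns):
    """Return the rows a user may see, with sensitive columns masked."""
    visible = []
    for row in rows:
        # Row-level access: drop rows the user is not entitled to see.
        if not row_filter(row):
            continue
        # Column masking: keep the column, hide its value.
        visible.append(
            {k: ("****" if k in masked_columns else v) for k, v in row.items()}
        )
    return visible


orders = [
    {"region": "EMEA", "customer_email": "a@example.com", "amount": 120},
    {"region": "APAC", "customer_email": "b@example.com", "amount": 75},
    {"region": "EMEA", "customer_email": "c@example.com", "amount": 310},
]

# A hypothetical EMEA sales manager: only EMEA rows, email addresses masked.
emea_manager_view = apply_policy(
    orders,
    row_filter=lambda r: r["region"] == "EMEA",
    masked_columns={"customer_email"},
)

for row in emea_manager_view:
    print(row)
```

The point of defining such rules in the catalog is that the filter and the mask travel with the table instead of being re-implemented in each tool.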

Takeaway 4: It Creates a Universal Semantic Layer Across All Your Data

Dremio Catalog serves as the foundation for a universal semantic layer, providing a unified and business-friendly view of all your data. Dremio can connect to a wide range of data sources, including object storage, data lakes, data warehouses, and databases, and lets you query data in place without making copies. On top of this connected data, you can build a powerful semantic layer.

Users can create virtual views using standard SQL to transform, join, and aggregate data from these disparate sources. These views don't move or duplicate the underlying data. Instead, they provide a layer of business logic and consistent definitions. This transforms the catalog into a single source of truth for key metrics and business definitions, regardless of whether the data lives in an Iceberg table, a PostgreSQL database, or a Snowflake warehouse.
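The idea of a view as a shared business definition can be sketched with any SQL engine. The example below uses Python's built-in SQLite module purely as a stand-in; in Dremio the two tables could live in different sources. The table names, columns, and the `revenue_by_region` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 250.0), (3, 10, 50.0);
    INSERT INTO customers VALUES (10, 'EMEA'), (11, 'APAC');

    -- The view stores only the query, not a copy of the data.
    CREATE VIEW revenue_by_region AS
    SELECT c.region AS region, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region;
    """
)

query = "SELECT region, revenue FROM revenue_by_region ORDER BY region"
before = conn.execute(query).fetchall()

# New data in the base table is visible through the view immediately.
conn.execute("INSERT INTO orders VALUES (4, 11, 25.0)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

Every consumer that queries `revenue_by_region` gets the same definition of revenue, which is the property a semantic layer is meant to provide.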

What if I Want to Self-Host? You Still Can.

For teams that prefer to manage their own infrastructure, it is absolutely possible to self-deploy the open-source Apache Polaris project.

Even with a self-hosted Polaris instance, you can still connect to it as a source within Dremio. This interoperability is a core benefit of open standards. The connection is made possible by Dremio's native support for any catalog that implements the Iceberg REST Catalog specification, which Polaris is built on. This gives you the flexibility to choose the deployment model that best fits your organization's needs.
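Because Polaris implements the Iceberg REST Catalog specification, a REST-capable client needs only a few connection properties to reach it. The sketch below builds such a set using the property names PyIceberg accepts for REST catalogs (`type`, `uri`, `warehouse`, `credential`). The `rest_catalog_properties` helper, the URI, the warehouse name, and the credentials are placeholders, and Dremio's own source configuration is not shown here.

```python
def rest_catalog_properties(uri, warehouse, client_id, client_secret):
    """Build connection properties for an Iceberg REST catalog client."""
    return {
        "type": "rest",
        # Normalize the endpoint so paths can be appended consistently.
        "uri": uri.rstrip("/"),
        "warehouse": warehouse,
        # Client credentials in "id:secret" form.
        "credential": f"{client_id}:{client_secret}",
    }


# All values below are hypothetical; substitute those of your own deployment.
props = rest_catalog_properties(
    uri="http://localhost:8181/api/catalog/",
    warehouse="analytics",
    client_id="my-client-id",
    client_secret="my-client-secret",
)
print(props)

# With PyIceberg installed, these properties could be passed to its loader:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("polaris", **props)
#   print(catalog.list_namespaces())
```

The same endpoint and credentials would serve any other engine that speaks the REST catalog protocol, which is what makes the self-hosted and managed deployments interchangeable from a client's point of view.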

Conclusion: The Future of the Open Lakehouse is Here

Dremio Catalog lowers the cost of adopting a data lakehouse. By providing a free, fully managed, and feature-rich service built on the open Apache Polaris project, Dremio removes the main barriers to standing up a universal Iceberg catalog: infrastructure, operational expertise, and direct cost. It delivers a secure, optimized, and unified platform that accelerates your journey to an open and flexible data architecture.

With a free, enterprise-grade Iceberg catalog now available to everyone, how will it change the way your team builds its next-generation data platform?
