Open data catalog for the modern lakehouse

The open standard for Iceberg catalogs

Built on Apache Polaris, Open Catalog gives every engine an entry point to your Iceberg tables, with autonomous table maintenance, unified governance, and Iceberg Clustering built on top, without locking you in.

Any engine, full read-write access

Any Iceberg REST-compatible engine, tool, or agent (Spark, Flink, Trino, DuckDB, PyIceberg, and more) connects through the same interface. Open by standard, fully governed.

Governance across your entire lakehouse

Fine-grained access control is enforced consistently across every connected engine, source, and agent. Role-based permissions, row filters, column masking, and credential vending are configured once at the catalog layer and applied everywhere.

HOW IT WORKS

One catalog layer for every engine, every agent

Open Catalog sits at the center of your lakehouse. Every engine connects through the Iceberg REST API: Dremio, Spark, Flink, Trino, DuckDB, and more. Tables live in open format on your object storage. Governance and table maintenance are managed once at the catalog layer, and every dataset is discoverable by any connected engine or agent.
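As a rough illustration of what "connects through the Iceberg REST API" means on the wire, the sketch below builds a few of the route URLs defined by the Iceberg REST Catalog specification. The base URL, prefix, and table names are placeholders chosen for this example, not actual Open Catalog endpoints.

```python
from urllib.parse import quote

# Hypothetical base URL; substitute the endpoint your catalog actually exposes.
BASE_URL = "https://catalog.example.com/api/catalog"


def _encode_namespace(levels):
    # The REST spec joins multi-level namespaces with the 0x1F unit separator,
    # which is then percent-encoded in the path.
    return quote("\x1f".join(levels), safe="")


def config_url(base=BASE_URL):
    # First call a client makes: fetch catalog defaults and overrides.
    return f"{base}/v1/config"


def list_tables_url(prefix, namespace, base=BASE_URL):
    # Lists the tables registered under one namespace.
    return f"{base}/v1/{quote(prefix, safe='')}/namespaces/{_encode_namespace(namespace)}/tables"


def load_table_url(prefix, namespace, table, base=BASE_URL):
    # Returns table metadata (and, where enabled, vended storage credentials).
    return f"{list_tables_url(prefix, namespace, base)}/{quote(table, safe='')}"


print(load_table_url("lake", ["sales", "daily"], "orders"))
```

Because every engine issues the same requests against the same routes, the catalog can apply governance in one place regardless of which client is calling.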

CAPABILITIES

What Open Catalog brings to your data environment

Metadata management and agent discoverability

Metadata covers Iceberg tables and any source Dremio can connect to, making your full data domain cataloged and agent-queryable
AI-generated wikis, labels, and semantic descriptions make every table discoverable by Dremio's AI Agent and any MCP-connected agent
Dataset lineage traces data flow across connected sources so agents understand what a table is, where it came from, and what changed

Multi-engine read and write via Iceberg REST

Full read-write access for any Iceberg REST-compatible engine: Spark, Flink, Trino, DuckDB, PyIceberg, StarRocks, and more
Engine-neutral: no proprietary connectors or SDKs required. Any engine implementing the Iceberg REST Catalog spec connects through the same standard interface with no lock-in
Connects to external catalogs too: AWS Glue, Databricks Unity, Snowflake Open Catalog, Nessie, S3 Tables, Confluent Tableflow
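To make the "same standard interface" point concrete, here is a minimal sketch that derives connection settings for two engines from one set of catalog coordinates. The property keys follow the publicly documented PyIceberg and Iceberg Spark REST catalog options; the URI, warehouse name, and credential are placeholders, and the authentication scheme your deployment requires may differ.

```python
def pyiceberg_properties(uri, warehouse, credential):
    # Properties in the shape accepted by pyiceberg.catalog.load_catalog(name, **props).
    return {
        "type": "rest",
        "uri": uri,
        "warehouse": warehouse,
        "credential": credential,
    }


def spark_conf(catalog_name, uri, warehouse, credential):
    # Equivalent Spark session settings for the Iceberg Spark runtime.
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.credential": credential,
        # Asks the catalog to vend short-lived storage credentials per table.
        f"{prefix}.header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


# Placeholder values, assumed for illustration only.
URI = "https://catalog.example.com/api/catalog"
WAREHOUSE = "my_warehouse"
CREDENTIAL = "client-id:client-secret"

py_props = pyiceberg_properties(URI, WAREHOUSE, CREDENTIAL)
spark_props = spark_conf("lake", URI, WAREHOUSE, CREDENTIAL)
```

Both dictionaries point at the same endpoint and warehouse; only the key naming differs per engine, which is the practical meaning of connecting without a proprietary connector.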

Unified access control

Least-privilege access at every query via credential vending, Privilege Grants, and Impersonated Access at the storage, table and source level
Policies travel with your data. Column masking and row filters apply whether access comes from Dremio's engine or any MCP-connected AI agent
SQL-native RBAC via GRANT/REVOKE governs catalogs, schemas, tables, and views across all connected engines
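The idea that a policy is defined once and applied to every reader can be illustrated with a small toy model. This is a conceptual sketch only: the `TablePolicy` class and `read` function are hypothetical names invented for this example and do not reflect how Open Catalog implements enforcement.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class TablePolicy:
    select_roles: Set[str]                    # roles granted SELECT
    row_filter: Callable[[Row, str], bool]    # (row, role) -> is the row visible?
    column_masks: Dict[str, Callable[[object, str], object]] = field(default_factory=dict)


def _mask(policy: TablePolicy, column: str, value: object, role: str) -> object:
    # Apply the column's mask if one is defined, otherwise pass the value through.
    mask = policy.column_masks.get(column)
    return mask(value, role) if mask else value


def read(rows: List[Row], policy: TablePolicy, role: str) -> List[Row]:
    # Same code path for every caller: check the grant, filter rows, mask columns.
    if role not in policy.select_roles:
        raise PermissionError(f"role {role!r} lacks SELECT")
    visible = [r for r in rows if policy.row_filter(r, role)]
    return [{c: _mask(policy, c, v, role) for c, v in r.items()} for r in visible]


rows = [
    {"region": "EU", "email": "a@x.io"},
    {"region": "US", "email": "b@y.io"},
]
policy = TablePolicy(
    select_roles={"analyst", "admin"},
    row_filter=lambda r, role: role == "admin" or r["region"] == "EU",
    column_masks={"email": lambda v, role: v if role == "admin" else "***"},
)

analyst_view = read(rows, policy, "analyst")
admin_view = read(rows, policy, "admin")
```

Here the analyst sees only EU rows with the email masked, while the admin sees everything. The caller's engine never enters the decision, which is the property the bullets above describe.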

Autonomous table maintenance and Iceberg clustering

Auto-compaction and auto-cleanup run continuously: small files merged, orphaned files removed, no manual scheduling required
Native Iceberg Clustering uses Z-order via space-filling curves to optimize query performance directly in the open Iceberg format
Snapshot retention is managed automatically, keeping storage lean. Iceberg-native time travel via AT SNAPSHOT and AT TIMESTAMP remains available throughout the retention window. Works with Apache Polaris and other data catalogs: open source, with no proprietary metastore lock-in
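For readers unfamiliar with Z-ordering, the following sketch shows the general technique of mapping two column values onto a space-filling curve by interleaving their bits. It is a textbook illustration of the Z-order (Morton) curve, assuming small non-negative integer keys; it is not Dremio's clustering implementation.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    # Build a Z-order key: bit i of x goes to position 2i, bit i of y to 2i + 1.
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sort a 4x4 grid of (x, y) records by their Z-order key.
records = [(x, y) for x in range(4) for y in range(4)]
clustered = sorted(records, key=lambda r: interleave_bits(r[0], r[1]))
```

After sorting, records that are close in both dimensions tend to sit near each other, so a filter on either column can skip more data files than a sort on a single column would allow.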

COMPARE

How Dremio stacks up

See how Dremio's Intelligent Query Engine compares across the capabilities that matter most.

Platform capability comparison
Capability	Dremio	Snowflake	Databricks
Query Engine Architecture	Arrow-native. LLVM codegen. Iceberg-native. No format conversion required.	Proprietary columnar format with query compilation layer.	Photon engine with Delta Lake optimization.
Autonomous Query Acceleration	Reflections auto-optimize queries. ML-driven acceleration without manual tuning.	Materialized views require manual definition.	Delta caching with cluster-level optimization.
Federated Query	Query across any source without data movement. True federation with the semantic layer.	Limited federation via external tables.	Unity Catalog with lakehouse federation.
Open Catalog	Native Iceberg, Hive, Delta and Hudi support. Open catalog with no vendor lock-in.	Proprietary catalog with limited export.	Unity Catalog with Delta Lake focus.
Native Iceberg Support	Built for Iceberg from the ground up. Full spec compliance with time travel and schema evolution.	Iceberg tables are supported via external tables.	UniForm for Iceberg interoperability.
Query Caching	Multi-layer intelligent caching. Result, metadata, and reflection caching with predictive prefetch.	Result set caching at the warehouse level.	Delta cache and Spark cache layers.

FAQs

Frequently asked questions about Open Catalog

Common questions about Dremio's Apache Polaris-based catalog, multi-engine interoperability, governance and table management.

Open Catalog is Dremio’s robust catalog built on Apache Polaris, the open standard for Iceberg catalog interoperability. Dremio co-created Polaris and donated it to the Apache Software Foundation to keep it vendor-neutral.

Open Catalog delivers the core Polaris metadata management capabilities and adds autonomous table maintenance, Iceberg Clustering, fine-grained governance, and deep integration with Dremio’s Intelligent Query Engine and AI Agent.

RESOURCES

Discover More

Product Insights from the Dremio Blog

How Apache Polaris Powers Dremio’s Open Catalog

Apache Polaris provides the open foundation, but enterprises running enterprise-wide, AI-driven platforms need more than a standards-compliant catalog service. They need enterprise governance, automation, and business context that…

Mark Shainman

The Brain of the Agentic Lakehouse: Inside Dremio’s Open Catalog Architecture

The Dremio Open Catalog is the vital bridge between open-source flexibility and enterprise-grade AI readiness. By moving from a "static mirror" of Iceberg files to a semantic, agent-ready architecture, Dremio allows organizations to finally realize the promise of a self-managing data lakehouse.

Alex Merced

Get a Supercharged Iceberg Catalog: Introducing Dremio and Apache Polaris

Dremio Catalog fundamentally changes the game for data lakehouse adoption. By providing a free, fully managed, and feature-rich service built on the open Apache Polaris project, Dremio removes the significant barriers to standing up a universal Iceberg catalog.

Alex Merced

Data catalog platform built on Apache Polaris. Open by design.

Query your Iceberg data from any engine, with unified governance features and no vendor lock-in.