Dremio Catalog is a fully managed and free Iceberg catalog that simplifies metadata management for diverse query engines.
It enhances the Apache Polaris project with features like automatic table optimization to improve query performance.
Dremio Catalog incorporates enterprise-grade governance features, including role-based access control and fine-grained access control.
It creates a universal semantic layer, connecting various data sources for a cohesive business view.
Users can still self-host the Apache Polaris project while leveraging Dremio's features for seamless integration.
As organizations embrace the modern data lakehouse, managing metadata has become a critical challenge. Apache Iceberg is rapidly becoming the standard for organizing huge analytic datasets, but its rise has created a new problem: how do you manage Iceberg tables across a growing ecosystem of different query engines and tools? The answer is a universal, open catalog that can serve as a central source of truth for all your Iceberg metadata.
Apache Polaris is an open-source project, currently incubating at the Apache Software Foundation, designed to be that universal catalog. It provides a REST-based catalog service for Iceberg that any engine can use. While Polaris is a powerful project, setting up and managing it yourself requires infrastructure and expertise.
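To make "REST-based" concrete, here is a minimal sketch of how a client might build request URLs for two endpoints defined by the Iceberg REST Catalog specification: `GET /v1/config` and `GET /v1/{prefix}/namespaces/{namespace}/tables`. The base URL, warehouse name, and helper function names are assumptions made for illustration; check your own deployment's documentation for the real values and for authentication details, which are omitted here.

```python
from urllib.parse import quote

# Hypothetical base URL of a catalog service; replace with your deployment's.
BASE_URL = "https://catalog.example.com/api/catalog"


def config_url(base_url: str, warehouse: str) -> str:
    """URL for GET /v1/config, typically the first call a client makes."""
    return f"{base_url.rstrip('/')}/v1/config?warehouse={quote(warehouse, safe='')}"


def tables_url(base_url: str, prefix: str, namespace: tuple) -> str:
    """URL for GET /v1/{prefix}/namespaces/{namespace}/tables."""
    # The spec joins multi-level namespace parts with the unit separator
    # (0x1F), which must be URL-encoded in the path.
    encoded_ns = quote("\x1f".join(namespace), safe="")
    encoded_prefix = quote(prefix, safe="")
    return f"{base_url.rstrip('/')}/v1/{encoded_prefix}/namespaces/{encoded_ns}/tables"


if __name__ == "__main__":
    # "my_wh" and the namespace parts are made-up example values.
    print(config_url(BASE_URL, "my_wh"))
    print(tables_url(BASE_URL, "my_wh", ("sales", "emea")))
```

Because every engine speaks the same endpoints, the catalog can sit behind Spark, Flink, Trino, Dremio, or a small script like this one without engine-specific adapters.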
This is where Dremio comes in. Dremio offers a fully managed version of Polaris called Dremio Catalog. It provides all the power of the open-source project, adds critical enterprise features, and makes adoption incredibly easy. Best of all, it’s completely free. This article covers the key reasons why Dremio Catalog is the best way to leverage Apache Polaris for your data lakehouse.
Takeaway 1: It's a Fully Managed, and Completely Free, Iceberg Catalog
Dremio Catalog is a fully managed service powered by the open-source Apache Polaris project. The most significant benefit is that the catalog is completely free. There are no costs associated with the catalog itself and no charges for the number of API requests you make to it. You get an enterprise-grade, multi-engine Iceberg catalog without any operational overhead or direct cost.
Dremio’s pricing model is straightforward: you pay only for optional Dremio services. These include Dremio's high-performance compute engines, Dremio-managed storage for your tables, or Dremio’s LLM-based AI features. However, none of these are required to use the free Dremio Catalog with your own compute engines and storage.
Takeaway 2: It Supercharges Polaris with Automatic Table Optimization
Dremio Catalog enhances the core functionality of Polaris with built-in, automated table maintenance. As data is written to and updated in Apache Iceberg tables, especially from streaming or frequent ingestion jobs, the accumulation of small data and metadata files can severely degrade query performance, a common challenge known as the 'small file problem'.
Dremio Catalog automatically runs background maintenance processes to optimize your tables for better query performance and reduced storage costs. This service handles several key tasks:
Compacting: Merges small data and metadata files into larger, more efficient ones, which speeds up queries by reducing the number of files the engine needs to read.
Partitioning: Physically organizes data based on column values, allowing queries to skip irrelevant data (a technique known as partition pruning).
Rewriting: Optimizes manifest files for better organization and faster metadata lookups.
Removing: Applies and cleans up position delete files, so that deleted rows are physically removed from data files and storage is reclaimed.
Clustering: Groups related data together within files to improve the speed of queries that filter on specific columns.
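The idea behind compaction can be sketched with a simple bin-packing routine. This is an illustration only, not Dremio's actual algorithm: the function names, the first-fit strategy, and the size thresholds are all assumptions chosen to show why merging small files reduces the number of files a query has to open.

```python
def plan_compaction(file_sizes_mb, target_mb=256, small_threshold_mb=64):
    """Group small files into bins whose combined size stays within target_mb.

    Hypothetical planner: files at or above small_threshold_mb are left alone.
    """
    small = sorted((s for s in file_sizes_mb if s < small_threshold_mb), reverse=True)
    bins = []
    for size in small:
        # First-fit decreasing: place the file in the first bin with room.
        for group in bins:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            bins.append([size])
    # Rewriting a bin that holds a single file gains nothing, so drop those.
    return [group for group in bins if len(group) > 1]


def file_count_after(file_sizes_mb, plan):
    """Number of files left once every planned group is merged into one file."""
    merged_inputs = sum(len(group) for group in plan)
    return len(file_sizes_mb) - merged_inputs + len(plan)


if __name__ == "__main__":
    sizes = [10, 20, 300, 50, 5]  # made-up file sizes in MB
    plan = plan_compaction(sizes, target_mb=64, small_threshold_mb=64)
    print(plan)
    print(file_count_after(sizes, plan))
```

A real optimizer also has to weigh partition boundaries, delete files, and the cost of rewriting data, which is exactly the bookkeeping a managed service takes off your hands.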
Takeaway 3: It Adds Enterprise-Grade Governance Out of the Box
Dremio Catalog provides robust, built-in data governance features that are essential for any enterprise. Instead of managing access control separately for each tool, Dremio allows you to define policies once in the catalog and have them apply universally. Key features include:
Role-Based Access Control (RBAC): Dremio implements a comprehensive RBAC system. Roles define what actions users can perform (e.g., SELECT, ALTER), while permissions control access to specific resources like catalogs, tables, and views. This allows administrators to manage who can view, create, or modify data objects.
Fine-Grained Access Control (FGAC): Dremio supports advanced security through policies that control access at a more granular level. This includes row-level access (e.g., a sales manager can only see data for their region) and column masking (e.g., hiding sensitive PII from certain users).
Note that Fine-Grained Access Control policies currently apply only to queries executed through a Dremio compute engine.
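To illustrate what row-level access and column masking mean in practice, the following sketch applies both kinds of policy to a handful of in-memory rows. It is a conceptual model only: it does not use Dremio's policy syntax or APIs, and the function names, user attributes, and sample data are all hypothetical.

```python
def region_filter(row, user):
    """Row-level policy: admins see every row, others only their own region."""
    return user["role"] == "admin" or row["region"] == user["region"]


def mask_email(value, user):
    """Column-masking policy: hide the email unless the user is an admin."""
    return value if user["role"] == "admin" else "***"


def apply_policies(rows, user, row_filter, column_masks):
    """Return the rows this user may see, with masked columns rewritten."""
    visible = []
    for row in rows:
        if not row_filter(row, user):
            continue  # the row is filtered out entirely for this user
        visible.append({
            col: column_masks[col](val, user) if col in column_masks else val
            for col, val in row.items()
        })
    return visible


if __name__ == "__main__":
    # Made-up sample data and users.
    rows = [
        {"region": "EMEA", "email": "a@example.com", "amount": 120},
        {"region": "APAC", "email": "b@example.com", "amount": 80},
    ]
    manager = {"role": "sales_manager", "region": "EMEA"}
    print(apply_policies(rows, manager, region_filter, {"email": mask_email}))
```

The value of defining such rules in the catalog rather than in each tool is that the same filter and mask are evaluated wherever the governed query runs, instead of being reimplemented per application.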
Takeaway 4: It Creates a Universal Semantic Layer Across All Your Data
Dremio Catalog serves as the foundation for a universal semantic layer, providing a unified and business-friendly view of all your data. Dremio can connect to a wide range of data sources, including object storage, data lakes, data warehouses, and databases, and lets you query data in place without making copies. On top of this connected data, you can build a powerful semantic layer.
Users can create virtual Views using standard SQL to transform, join, and aggregate data from these disparate sources. These views don't move or duplicate the underlying data. Instead, they provide a layer of business logic and consistent definitions. This transforms the catalog into a single source of truth for key metrics and business definitions, regardless of whether the data lives in an Iceberg table, a PostgreSQL database, or a Snowflake warehouse.
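The behavior of a virtual view can be demonstrated with any SQL engine. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for Dremio, with made-up table and view names, to show that a view stores a definition rather than a copy of the data: when the underlying table changes, the view's result changes with it.

```python
import sqlite3

# In-memory database standing in for the connected sources (an assumption
# for illustration; in Dremio the tables could live in different systems).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0), (3, 10, 25.0);
    INSERT INTO customers VALUES (10, 'EMEA'), (11, 'APAC');

    -- The view holds business logic only; no rows are copied.
    CREATE VIEW revenue_by_region AS
        SELECT c.region, SUM(o.amount) AS revenue
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        GROUP BY c.region;
""")


def revenue_by_region(connection):
    """Query the shared metric definition instead of re-deriving it."""
    query = "SELECT region, revenue FROM revenue_by_region ORDER BY region"
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(conn))
    # New data in the base table is visible through the view immediately.
    conn.execute("INSERT INTO orders VALUES (4, 11, 10.0)")
    print(revenue_by_region(conn))
```

Every consumer that queries `revenue_by_region` gets the same definition of revenue, which is the property that makes a semantic layer a single source of truth.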
What if I Want to Self-Host? You Still Can.
For teams that prefer to manage their own infrastructure, it is absolutely possible to self-deploy the open-source Apache Polaris project.
Even with a self-hosted Polaris instance, you can still connect to it as a source within Dremio. This interoperability is a core benefit of open standards. The connection is made possible by Dremio's native support for any catalog that implements the Iceberg REST Catalog specification, which Polaris is built on. This gives you the flexibility to choose the deployment model that best fits your organization's needs.
Conclusion: The Future of the Open Lakehouse is Here
Dremio Catalog fundamentally changes the game for data lakehouse adoption. By providing a free, fully managed, and feature-rich service built on the open Apache Polaris project, Dremio removes the significant barriers to standing up a universal Iceberg catalog. It delivers a secure, optimized, and unified platform that accelerates your journey to an open and flexible data architecture.
With a free, enterprise-grade Iceberg catalog now available to everyone, how will it change the way your team builds its next-generation data platform?