June 25, 2025
Why Dremio co-created Apache Polaris, and where it’s headed

As the data lake evolves into the lakehouse, one component has become more critical than ever: the metadata catalog. It's the centerpiece of the system: it tracks where data lives and who can access it, and it enables countless engines and tools to reach that data. And while early catalogs, such as Hive Metastore (HMS) and AWS Glue, got us started, they were designed for a different era, when data lakes were homogeneous and weren't expected to deliver enterprise security or serve AI agents.
In today’s world, that’s not enough.
Modern data environments are multi-engine, multi-cloud, and increasingly include both structured and unstructured data. Organizations need a catalog that’s open, flexible, and built to handle the complexities of the lakehouse era.
That’s why Dremio has been at the heart of the creation and development of Apache Polaris.
Together with Snowflake, Dremio launched Apache Polaris under the Apache Software Foundation to set a new standard for metadata catalogs, one that, like Apache Iceberg before it, is designed from day one to be vendor-neutral, cloud-agnostic, and built for the entire ecosystem.
Polaris isn’t just a metadata catalog; it’s the next big leap in how data is discovered, secured, and shared across the modern enterprise.
Why Dremio Co-Created Apache Polaris
The first step in building the open lakehouse was introducing a table format. We were the first vendor to back Apache Iceberg, a project born at Netflix and designed for the scale and flexibility the modern data world demands, and we played a key role in building the technology and evangelizing it in the market. Our work alongside the rest of the nascent Iceberg community ultimately resulted in Iceberg becoming the standard table format for the lakehouse, with companies like Snowflake, AWS, Google, Microsoft, Confluent, and even Databricks adopting it.
Now, the focus shifts to the next piece of the puzzle: the metadata catalog.
Just like we needed a table format that was open source, vendor-neutral, and compatible with the broadest ecosystem, we now need a metadata catalog that does the same. Legacy solutions, such as Hive Metastore and AWS Glue, were a start. They helped teams manage tables across multiple engines. But they weren't built with today's scale, complexity, or governance needs in mind.
Meanwhile, closed solutions like Databricks Unity Catalog are built to serve a single vendor’s ecosystem (with many of its most appealing features not being available in its open source version). That’s not how open data infrastructure should work. And it’s not how the winning technologies evolve. In data infrastructure, the standards are managed as Apache projects. That includes Apache Arrow (co-created by Dremio, with over 100M monthly downloads), Apache Parquet, Apache Iceberg, and even Apache Spark.
So, Dremio partnered with Snowflake to build the new standard: Apache Polaris.
Polaris is a next-generation metadata catalog, born from real-world needs, designed for interoperability, and open-sourced from day one. It’s built for the lakehouse era, and it’s rapidly gaining momentum as the new standard for how data is managed in open, multi-engine environments.
The Architecture of Apache Polaris
At its core, Apache Polaris (Incubating) is built to bring structure, governance, and security to the modern Iceberg-powered lakehouse. It provides a centralized, REST-based metadata catalog for Apache Iceberg™ tables—allowing dozens of engines and tools to discover, read, and write data consistently, regardless of where that data physically resides.
Polaris speaks Iceberg's native language through its support of the Apache Iceberg REST Catalog (IRC) protocol, enabling seamless interoperability with compatible engines such as Spark, Flink, Athena, BigQuery, Dremio, DuckDB, Presto, Trino, Snowflake, and many more.
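As a concrete illustration, attaching a Polaris catalog to an engine like Spark comes down to standard Iceberg REST catalog properties. The sketch below builds that property set in Python; the endpoint URI, catalog name, and credential are placeholders for your own deployment's values.

```python
# Illustrative Spark SQL properties for attaching a Polaris catalog over the
# Iceberg REST protocol. The URI, warehouse, and credential are placeholders.
POLARIS_URI = "https://polaris.example.com/api/catalog"  # hypothetical endpoint

def rest_catalog_conf(name: str, uri: str, warehouse: str, credential: str) -> dict:
    """Build the Spark SQL properties that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",          # use the Iceberg REST catalog implementation
        f"{prefix}.uri": uri,              # Polaris IRC endpoint
        f"{prefix}.warehouse": warehouse,  # the Polaris catalog to attach
        f"{prefix}.credential": credential,  # service principal "client_id:client_secret"
    }

conf = rest_catalog_conf("polaris", POLARIS_URI, "analytics", "CLIENT_ID:CLIENT_SECRET")
for key, value in conf.items():
    print(f"{key}={value}")
```

Because the properties follow the Iceberg REST spec rather than anything Polaris-specific, the same shape works for other IRC-aware engines; only the property-name prefix changes per engine.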
Let’s break down the key building blocks:
Catalogs and Namespaces
A Polaris catalog is the top-level container that organizes Iceberg tables. It defines where and how your metadata and data files are stored—whether on S3, Azure, or GCS. You can create:
- Internal catalogs, fully managed by Polaris and writable from any engine.
- External catalogs, synced from other IRC-compliant catalogs, and read-only within Polaris.
Within each catalog, you define namespaces: logical groupings that act like folders for your tables. Namespaces can be nested to reflect your org structure, data domains, or environments.
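Under the hood, nested namespaces are still addressable through the Iceberg REST protocol: to my understanding, the spec joins namespace levels with the unit separator byte (0x1F), which shows up as `%1F` in URL paths. A small sketch, with hypothetical namespace names:

```python
from urllib.parse import quote

def namespace_path(levels: tuple) -> str:
    """Encode a (possibly nested) namespace as one Iceberg REST path segment.

    The Iceberg REST spec joins namespace levels with the unit separator
    byte (0x1F), which appears as %1F once URL-encoded.
    """
    return quote("\x1f".join(levels), safe="")

# A nested namespace like sales.emea becomes a single path segment:
print(namespace_path(("sales", "emea")))  # sales%1Femea
print(namespace_path(("prod",)))          # prod
```

This is why a multi-level namespace still maps cleanly onto REST routes like `/v1/{prefix}/namespaces/{namespace}/tables`, regardless of nesting depth.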
Service Principals and Service Connections
To connect query engines securely to Polaris, you use service principals—entities with credentials (Client ID and Secret) that authenticate access. Each connection from Dremio, Spark, Flink, Trino, or Snowflake runs through a service connection, which links to a service principal and defines what actions it can perform.
Polaris uses role-based access control (RBAC) to assign and enforce user privileges. With fine-grained roles, you can define exactly who can read, write, or manage what—across catalogs, namespaces, and tables.
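To make the chain of indirection concrete, here is a conceptual sketch of how Polaris-style RBAC resolves: a service principal holds principal roles, principal roles are granted catalog roles, and catalog roles carry privileges on securables (catalogs, namespaces, tables). This models the idea only; it is not the Polaris management API, and all names and privilege strings below are hypothetical.

```python
# Conceptual model of the RBAC chain (all names and privileges are illustrative):
# principal -> principal roles -> catalog roles -> privileges on securables.
grants = {
    "principals":      {"etl_spark": ["data_engineer"]},
    "principal_roles": {"data_engineer": ["analytics_writer"]},
    "catalog_roles": {
        "analytics_writer": [
            ("NAMESPACE", "analytics.sales", "TABLE_WRITE_DATA"),
            ("CATALOG",   "analytics",       "TABLE_LIST"),
        ]
    },
}

def privileges_for(principal: str) -> set:
    """Resolve a principal's effective privileges by walking the grant chain."""
    out = set()
    for prole in grants["principals"].get(principal, []):
        for crole in grants["principal_roles"].get(prole, []):
            out.update(grants["catalog_roles"].get(crole, []))
    return out

print(sorted(privileges_for("etl_spark")))
```

The indirection through roles is what makes the model scale: you grant a catalog role once, and every principal that eventually maps to it inherits the same privileges.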
Storage Configuration and IAM Integration
Polaris integrates directly with your cloud storage, no matter the provider. During catalog creation, Polaris generates secure IAM entities and trust relationships, ensuring it can access only what it needs, and nothing more.
And with credential vending, query engines get temporary credentials during query execution, eliminating the need to manage cloud access keys manually, while maintaining airtight security at the storage level.
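From an engine's point of view, credential vending rides on the Iceberg REST loadTable call: the engine sends the access-delegation header, and the catalog returns short-lived storage credentials in the table's config, so no long-lived cloud keys are ever handed out. A sketch of the request shape, with a placeholder endpoint and hypothetical catalog, namespace, and table names:

```python
# Sketch of the loadTable request an engine would make with vending enabled.
# The endpoint and object names are placeholders, not a real deployment.
def load_table_request(base_uri: str, catalog_prefix: str,
                       namespace: str, table: str) -> dict:
    """Describe the Iceberg REST loadTable call with credential vending requested."""
    return {
        "method": "GET",
        "url": f"{base_uri}/v1/{catalog_prefix}/namespaces/{namespace}/tables/{table}",
        "headers": {
            # Ask the catalog to vend temporary storage credentials.
            "X-Iceberg-Access-Delegation": "vended-credentials",
            "Authorization": "Bearer <token from the service principal's OAuth flow>",
        },
    }

req = load_table_request("https://polaris.example.com/api/catalog",
                         "analytics", "sales", "orders")
print(req["url"])
```

The vended credentials are scoped to the table's storage location and expire quickly, which is what keeps storage-level security airtight even as many engines share the catalog.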
Security That Scales
Polaris doesn't just provide access control; it enforces it consistently across all connected engines. Whether you're running Spark ETL, Flink streams, or Dremio queries, Polaris governs all interactions with a unified security model.
That means better compliance, fewer gaps, and more confidence as your data footprint grows.
This architecture is what enables Polaris to serve as the central control plane of the open lakehouse: managing, securing, and exposing Iceberg tables to every engine that needs them, without locking you into any single platform.
The Road Ahead: A Platform for Unified Data Governance
Polaris isn’t just solving today’s problems, it’s laying the foundation for what’s next in data architecture. As more organizations adopt Iceberg, the need for a unified, open, and intelligent catalog is only growing. That’s exactly where Polaris is headed.
Here’s what’s on the horizon:
Fine-Grained Access Control Across All Engines
One of the most ambitious goals for Polaris is to enable fine-grained, cross-engine access control. Imagine defining policies once, at the catalog level, and having them enforced consistently across Dremio, Spark, Flink, Snowflake, Trino, Presto, and beyond. Read-only access to a specific column, or masking it? Write access to a subset of rows? Polaris will make that possible, centrally and securely.
This level of control isn’t just a convenience—it’s foundational for meeting enterprise security, compliance, and data governance requirements at scale.
Managing All Data—Structured and Unstructured
As AI and analytics converge, the lakehouse must evolve to support all types of data, not just structured tables (Iceberg and “Generic Tables”), but unstructured content like documents, images, and audio files (through a “Volumes” feature). Polaris is expanding to support this broader spectrum of data, bringing visibility, governance, and access control to everything in your lakehouse.
This means your AI teams can train models on governed datasets, your analysts can explore semi-structured content, and your data leaders can confidently manage it all from one place.
NoSQL and Cloud-Native Persistence
To serve diverse use cases and deployment scenarios, Polaris is evolving its backend options, extending them with NoSQL persistence. This will offer more scalability and flexibility for large-scale or real-time workloads, while maintaining Polaris's clean separation of compute and metadata storage.
A First-Class Developer Experience
Polaris is becoming more than just a metadata catalog; it's a platform. A sleek browser-based UI, robust APIs, and integration-friendly tooling are all on the roadmap. This ensures developers, data engineers, and analysts can easily manage, secure, and interact with data through the interface that suits them best.
Together, these innovations position Polaris as the control plane for the open lakehouse, powering secure, multi-modal data access across every engine and every cloud. This is where the lakehouse is going, and Polaris is how we get there.
Polaris is Becoming the Standard
When we at Dremio helped bring Apache Iceberg into the mainstream, we knew it had the potential to redefine how open data lakes are structured. Today, that's no longer just a vision: Iceberg has become the industry standard, with backing from nearly every major cloud and data platform.
Dremio co-created Apache Polaris with that objective in mind, and it’s now clearly on that trajectory.
Already, Polaris is being adopted by leading companies across the ecosystem. It’s not a closed solution locked behind one vendor’s walls. It’s an incubating Apache Software Foundation project, built in the open, shaped by real-world needs, and designed to grow with the community.
Just like Iceberg unlocked the power of open tables, Polaris is unlocking open, centralized metadata and data governance, without sacrificing interoperability or control.
Whether you're processing data in Dremio, Snowflake, Spark, Flink, or any other Iceberg-compatible engine or tool, Polaris makes it possible to govern everything with a single, consistent model.
And it’s not just about technical alignment, it’s about industry alignment. Vendors, users, and contributors are rallying around Polaris to establish it as the next-generation standard for Iceberg catalogs, the same way Iceberg redefined the table format.
This is the moment where metadata becomes modern. And Polaris is leading the way.
Join the Movement
Apache Polaris is more than a new project; it's a movement toward open, scalable, and secure data infrastructure that reflects the realities of today's lakehouse architectures. It's about breaking down silos, eliminating vendor lock-in, and creating a universal control plane for your data.
Dremio is proud to have co-created Polaris and to be spearheading its development in the Apache community. But the future of Polaris isn’t ours alone—it belongs to every engineer, architect, analyst, and vendor who believes in the power of open data architectures.
Whether you’re building pipelines, enforcing data policies, powering dashboards, creating AI agents, or training AI models, Polaris is designed to make it easier, safer, and more interoperable.
If you’re excited about the future of open lakehouse architecture, now is the time to get involved:
- Try Polaris in your environment.
- Contribute to the project.
- Join the conversation.
Let’s build the next standard, together.