Iceberg 1.9 includes the Auth Manager API, a new API enabling pluggable authentication for Iceberg REST. This enhancement allows for the integration of various authentication protocols and expanded OAuth2 functionalities.
Building on this API, Dremio has open-sourced Dremio Auth Manager, a new OAuth2 manager for Apache Iceberg REST catalogs. The project originated from limitations we encountered while integrating Iceberg REST into our products; we are now open-sourcing the implementation for the broader community.
Dremio Auth Manager is intended as an alternative to Iceberg’s built-in OAuth2 manager, offering greater functionality and flexibility while complying with OAuth2 standards. It streamlines authentication by handling token acquisition and renewal transparently, so users never need to deal with tokens directly and never hit failures caused by token expiration.
One last caveat: the examples below use circled numbers, e.g. ①②③, to pinpoint aspects of interest in the code. Make sure to remove those numbers before executing the commands in your terminal.
Dremio Auth Manager with Apache Polaris (incubating)
Our first example will use Apache Polaris (incubating) to illustrate how Dremio Auth Manager can seamlessly replace Iceberg’s built-in manager, facilitating migration from the latter to the former.
Add the Dremio Auth Manager runtime jar to the Spark classpath
Enable the Dremio Auth Manager with the rest.auth.type property
Enable the “Iceberg REST” dialect
Provide the client credentials
Activate all roles for the authenticated principal
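The five steps above can be sketched as a single spark-sql invocation. The `rest.auth.type` and `rest.auth.oauth2.*` property names follow the Dremio Auth Manager conventions, but the catalog name, jar path, Polaris URI, and client credentials below are placeholders — adapt them to your setup and verify the exact option names against the project README:

```shell
# Illustrative spark-sql invocation for Polaris (property names and values
# are a sketch; check the Dremio Auth Manager README for your version):
# ① runtime jar on the classpath, ② enable the manager, ③ "Iceberg REST"
# dialect, ④ client credentials, ⑤ activate all principal roles.
spark-sql \
  --jars /path/to/authmgr-oauth2-runtime.jar \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.dialect=iceberg_rest \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-id=root \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.client-secret=s3cr3t \
  --conf spark.sql.catalog.polaris.rest.auth.oauth2.scope=PRINCIPAL_ROLE:ALL
```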
In this example Apache Polaris (incubating) acts as the identity provider, and since Dremio Auth Manager is configured with the “Iceberg REST” dialect, it will use Polaris’ internal token endpoint to authenticate, effectively behaving like Iceberg’s built-in manager.
At the prompt, type use polaris; to activate the catalog and trigger the authentication; then you are ready to go!
Dremio Auth Manager with Nessie
Our second example demonstrates using Dremio Auth Manager to authenticate against a Nessie catalog server, leveraging Keycloak as an external identity provider.
Unlike the initial scenario, in this example Dremio Auth Manager will adhere strictly to OAuth2 standards for Keycloak interactions, a key advantage over Iceberg’s built-in manager. This ensures automatic background token refreshing, addressing a known limitation of the built-in manager (which often fails to refresh tokens with external identity providers due to its non-compliance with OAuth2 standards).
To further illustrate other interesting features, we will use the “Authorization Code” grant type, a prevalent OAuth2 flow that is not natively supported by Iceberg’s built-in manager.
The identity provider (issuer) URL; for Keycloak, this is the realm root URL
Enable the “Authorization Code” grant
Provide the client credentials
The scopes to activate will depend on how Keycloak is configured (you may need to also request offline_access)
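Putting the four steps together, the invocation might look like the sketch below. The Nessie URI, Keycloak realm, client credentials, and scopes are placeholders tied to this demo setup, and the `rest.auth.oauth2.*` property names should be verified against the Dremio Auth Manager README:

```shell
# Illustrative spark-sql invocation for Nessie with Keycloak as the identity
# provider. ① issuer URL (Keycloak realm root), ② "Authorization Code" grant,
# ③ client credentials, ④ scopes (offline_access may be required depending
# on your Keycloak configuration).
spark-sql \
  --jars /path/to/authmgr-oauth2-runtime.jar \
  --conf spark.sql.catalog.nessie=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.nessie.type=rest \
  --conf spark.sql.catalog.nessie.uri=http://localhost:19120/iceberg \
  --conf spark.sql.catalog.nessie.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
  --conf spark.sql.catalog.nessie.rest.auth.oauth2.issuer-url=http://127.0.0.1:8080/realms/iceberg \
  --conf spark.sql.catalog.nessie.rest.auth.oauth2.grant-type=authorization_code \
  --conf spark.sql.catalog.nessie.rest.auth.oauth2.client-id=iceberg-client \
  --conf spark.sql.catalog.nessie.rest.auth.oauth2.client-secret=s3cr3t \
  --conf "spark.sql.catalog.nessie.rest.auth.oauth2.scope=openid offline_access"
```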
Note how the configuration includes the Keycloak realm root URL. Dremio Auth Manager uses this URL to automatically retrieve the OpenID Provider metadata from Keycloak and to identify necessary endpoints, such as the token and authorization endpoints, which simplifies configuration.
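You can inspect that same metadata document yourself: OpenID Connect Discovery standardizes its location at the issuer URL plus `/.well-known/openid-configuration`. Assuming the Keycloak instance from this example is running locally, something like the following shows the endpoints the manager discovers (`jq` is used here only for readability):

```shell
# Fetch the OpenID Provider metadata that Dremio Auth Manager retrieves
# automatically, and extract the two endpoints it needs.
curl -s http://127.0.0.1:8080/realms/iceberg/.well-known/openid-configuration \
  | jq '{token_endpoint, authorization_endpoint}'
```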
At the prompt, type use nessie; to activate the catalog. You should see output like the following:
[iceberg-auth-manager] ======== Authentication Required ========
[iceberg-auth-manager] Please open the following URL to continue:
[iceberg-auth-manager] http://127.0.0.1:8080/realms/iceberg/protocol/openid-connect/...
Authenticate with Keycloak by clicking the provided link and using the username “Alice” and password “s3cr3t”. Upon successful authentication, return to the Spark SQL shell to resume normal operations. You won’t need to do this again if the access token expires: Dremio Auth Manager will refresh it for you in the background!
Dremio Auth Manager with Dremio Enterprise Catalog
To interact with a Dremio Enterprise Catalog using Spark SQL, you need a Dremio Kubernetes cluster running version 26 or later with the Dremio Enterprise Catalog feature activated.
As a prerequisite, it is assumed that you have already generated a Dremio Personal Access Token (PAT) for the user account that Spark will use to connect to the Dremio cluster.
Make sure you have saved your PAT in the DREMIO_PAT variable.
Store the Dremio cluster IP address in the DREMIO_ADDRESS variable.
Then run the following command to start the Spark SQL shell:
Provide your Dremio PAT and Dremio cluster IP address
The warehouse must be default
The address of the Dremio Enterprise Catalog service
The address of the Dremio cluster token endpoint
Enable the Token Exchange grant
Provide the client ID (no client secret required)
The scope must be dremio.all
Configure the subject token to use your Dremio PAT
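Assembled, the command might look like the sketch below. The catalog and token-endpoint URLs are hypothetical placeholders (consult your Dremio deployment for the actual addresses), and the property names should again be checked against the Dremio Auth Manager README:

```shell
# Illustrative spark-sql invocation for Dremio Enterprise Catalog.
# The two URLs below are hypothetical; substitute the catalog service and
# token endpoint addresses from your own Dremio cluster.
# ① warehouse must be "default", ② catalog service address, ③ token endpoint,
# ④ "Token Exchange" grant, ⑤ client ID (no secret), ⑥ scope dremio.all,
# ⑦ your PAT as the subject token.
spark-sql \
  --jars /path/to/authmgr-oauth2-runtime.jar \
  --conf spark.sql.catalog.dremio=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.dremio.type=rest \
  --conf spark.sql.catalog.dremio.warehouse=default \
  --conf spark.sql.catalog.dremio.uri=https://$DREMIO_ADDRESS/catalog/api \
  --conf spark.sql.catalog.dremio.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
  --conf spark.sql.catalog.dremio.rest.auth.oauth2.token-endpoint=https://$DREMIO_ADDRESS/oauth/token \
  --conf spark.sql.catalog.dremio.rest.auth.oauth2.grant-type=token_exchange \
  --conf spark.sql.catalog.dremio.rest.auth.oauth2.client-id=dremio \
  --conf spark.sql.catalog.dremio.rest.auth.oauth2.scope=dremio.all \
  --conf spark.sql.catalog.dremio.rest.auth.oauth2.token-exchange.subject-token=$DREMIO_PAT
```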
In this demonstration, the Dremio cluster acts as the identity provider. The client exchanges the user’s Personal Access Token (PAT) for an access token issued by Dremio, then uses it to authenticate with Dremio Enterprise Catalog. This process illustrates one of the many ways to use the “Token Exchange” grant type.
Conclusion
The preceding examples offer only a brief overview of Dremio Auth Manager’s capabilities. It offers many more features, such as:
Multiple token exchange possibilities, such as impersonating an externally managed identity.
More features are coming, such as client JWT assertions. Follow the GitHub repository for the latest additions, and feel free to submit issues with feedback, feature suggestions, or bug reports!