Dremio Blog

36 minute read · May 27, 2026

Apache Polaris 1.5.0: Deep-Dive Into the Future of Open Data Catalogs

Alex Merced Head of DevRel, Dremio


Why Polaris Matters: The Core Philosophy

1. Security & Access Control Hardening

2. Metastore Federation & Catalog Uniformity

3. Advanced Credential Vending & Access Delegation

4. CLI & Admin User Experience Upgrades

5. Performance, Concurrency & Persistence

6. Governance & Community Standards

Technical Summary of Key Pull Requests

Apache Polaris and Dremio: The Open Catalog Integration

Technical Configuration Guides for Multi-Engine Setup

7. Open REST Catalogs vs. Proprietary Lock-In

Troubleshooting Polaris 1.5.0 Deployments

Evolving Your Data Architecture

Catalog governance is the biggest bottleneck in building a multi-engine lakehouse. When you query the same Apache Iceberg tables with Spark, Flink, and Dremio, synchronizing permissions and access credentials across different engines is traditionally a manual, error-prone chore.

Apache Polaris solves this by providing a centralized, open-source REST catalog for Apache Iceberg tables. Instead of duplicating access control policies across every query engine, you manage them once in Polaris.

The release of Apache Polaris 1.5.0 marks a significant step forward in the project's evolution. This release introduces enterprise-grade security integrations, expanded catalog federation, advanced credential vending, and key performance optimizations.

This deep dive examines the pull requests, code changes, and architectural updates in Polaris 1.5.0, and explains what they mean for your data operations.

Why Polaris Matters: The Core Philosophy

Data lakehouses succeed when they separate compute from storage. If your data is stored in open formats like Apache Iceberg, you should not be locked into a single query engine. You might use Apache Spark for batch ETL, Apache Flink for real-time streaming, and Dremio for interactive business intelligence.

However, sharing storage requires sharing metadata. Without a shared catalog, different engines cannot agree on table schemas, partitions, or transaction state.

The Iceberg REST Catalog specification defines how engines communicate with a catalog service. Apache Polaris is an open-source implementation of this spec, but it goes further by managing access delegation and credential vending.
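To make the "how engines communicate" part concrete, here is a minimal sketch of how a client could build the load-table URL defined by the Iceberg REST spec. The base URI, prefix, and table names are placeholders chosen for illustration, not Polaris defaults. Multi-level namespaces are joined with the unit separator character, which is then percent-encoded.

```python
from urllib.parse import quote

# Unit separator used by the Iceberg REST spec to join multi-level namespaces.
NAMESPACE_SEPARATOR = "\x1f"


def table_url(base_uri: str, prefix: str, namespace: list[str], table: str) -> str:
    """Build the load-table URL for an Iceberg REST catalog.

    base_uri and prefix are deployment-specific; the values used in the
    example below are placeholders.
    """
    encoded_ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return (
        f"{base_uri.rstrip('/')}/v1/{quote(prefix, safe='')}"
        f"/namespaces/{encoded_ns}/tables/{quote(table, safe='')}"
    )


url = table_url(
    "https://catalog.example.com/api/catalog",  # hypothetical endpoint
    "main_warehouse",
    ["finance", "payroll"],
    "salaries",
)
print(url)
```

Every engine that speaks the REST spec issues requests of this shape, which is why a single catalog service can sit behind Spark, Flink, and Dremio at the same time.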

In Dremio's ecosystem, Polaris serves as the open-source foundation for the Dremio Open Catalog. One Dremio Organization maps to a Polaris Realm, and individual Dremio Projects map to Polaris Catalogs. By building on Polaris, Dremio ensures your metadata remains in an open, standardized catalog, preventing vendor lock-in.

1. Security & Access Control Hardening

Enterprise data lakes require strict, auditable access controls. Polaris 1.5.0 introduces architectural changes to enforce security policies cleanly and integrate with existing enterprise policy engines.

Integrating Apache Ranger for Enterprise Authorization (PR #3928)

For years, Apache Ranger has been the standard for managing data security policies in enterprise environments. Many organizations still maintain extensive Ranger policies that control access to Hadoop, Hive, or Trino.

Pull request #3928, contributed by @sneethiraj, implements the Apache Ranger Authorization Plugin for Apache Polaris. This allows Polaris to delegate authorization decisions directly to an external Ranger service.

When a query engine requests schema access or table metadata, a Polaris server with the Ranger authorizer enabled does not rely only on its internal RBAC rules. It asks the Ranger plugin whether the requesting user holds the required privilege and allows or denies the request based on the answer.

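// Simplified illustration of the Ranger Authorizer integration.
// The Ranger classes below are real; the Polaris-side types (PolarisAuthorizer,
// PolarisAuthorizableContext, AuthorizationResponse, RangerPolarisResource)
// are simplified stand-ins for this example.
import org.apache.ranger.plugin.policyengine.RangerAccessRequest;
import org.apache.ranger.plugin.policyengine.RangerAccessRequestImpl;
import org.apache.ranger.plugin.policyengine.RangerAccessResult;
import org.apache.ranger.plugin.service.RangerBasePlugin;

public class RangerPolarisAuthorizer implements PolarisAuthorizer {
    private final RangerBasePlugin rangerPlugin;

    public RangerPolarisAuthorizer(RangerBasePlugin plugin) {
        this.rangerPlugin = plugin;
    }

    @Override
    public AuthorizationResponse authorize(PolarisAuthorizableContext context) {
        RangerAccessRequest request = createRangerRequest(context);
        RangerAccessResult result = rangerPlugin.isAccessAllowed(request);

        // Deny by default: a null result means no policy matched
        if (result != null && result.getIsAllowed()) {
            return AuthorizationResponse.allow();
        }
        return AuthorizationResponse.deny("Access denied by Apache Ranger policy.");
    }

    // Translate the Polaris request context into a Ranger access request
    private RangerAccessRequest createRangerRequest(PolarisAuthorizableContext context) {
        RangerAccessRequestImpl request = new RangerAccessRequestImpl();
        request.setResource(new RangerPolarisResource(context.getTargetEntity()));
        request.setUser(context.getPrincipalName());
        request.setAccessType(context.getRequiredAction().name());
        request.setAction(context.getRequiredAction().name());
        request.setRequestData(context.getRequestPayload());
        return request;
    }
}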

Ranger policy administrators can define rules using familiar resources. They map policies to Polaris realms, catalogs, namespaces, and tables.

Here is an example of an Apache Ranger policy definition in JSON, showing how permissions are mapped:

{
  "id": 105,
  "name": "polaris-finance-read-policy",
  "service": "polaris-prod-service",
  "resources": {
    "realm": { "values": ["prod_realm"] },
    "catalog": { "values": ["finance_catalog"] },
    "namespace": { "values": ["tax_records", "payroll"] },
    "table": { "values": ["*"] }
  },
  "policyItems": [
    {
      "accesses": [
        { "type": "READ", "isAllowed": true },
        { "type": "LIST_TABLES", "isAllowed": true }
      ],
      "users": ["finance_analyst_01"],
      "groups": ["finance_readers"],
      "delegateAdmin": false
    }
  ]
}

This integration allows you to centralize policy management. When you migrate from a legacy Hadoop environment to a cloud-native Iceberg lakehouse, you keep the same Ranger instance, tooling, and audit trail, and define policies for Polaris realms, catalogs, namespaces, and tables alongside your existing ones.

Decoupling Permissions from Internals (PR #4006)

In earlier versions of Polaris, authorization logic was tightly coupled to the internal PolarisAuthorizableOperations enum. Every permission check was directly bound to a specific Java enum value representing a catalog operation.

Pull request #4006, contributed by @sungwy, decouples RBAC privileges and operation semantics from these internal enums. This refactoring introduces a cleaner separation between what an operation does and what permission it requires.

By isolating the authorization checks from the catalog operation logic, the codebase becomes more modular. Developers can write custom authorization plugins, such as the Ranger plugin or an Open Policy Agent (OPA) authorizer, without modifying the core catalog service codebase.

Consider how this looks in practice for OPA rule declarations. With the decoupled architecture, OPA can query semantic actions directly. Here is an example of OPA Rego rules checking table actions in Polaris:

package polaris.authz

# Opt in to the `if` keyword syntax required by OPA 1.0 and later
import rego.v1

default allow := false

# Allow read access to tables if user belongs to reader group
allow if {
    input.action == "READ_TABLE"
    input.user_groups[_] == "data_analysts"
    input.resource.catalog == "shared_catalog"
}

# Restrict table deletion to administrator group
allow if {
    input.action == "DROP_TABLE"
    input.user_groups[_] == "platform_admins"
}

This decoupled approach reduces system maintenance costs, prevents code regressions, and makes auditing security rules much simpler.
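The separation between "what an operation does" and "what permission it requires" can be sketched as a lookup table that sits between the two. The operation and privilege names below are hypothetical and chosen only for illustration; they are not the actual Polaris enums.

```python
from enum import Enum


class Operation(Enum):
    # Hypothetical catalog operations
    LOAD_TABLE = "LOAD_TABLE"
    LIST_TABLES = "LIST_TABLES"
    DROP_TABLE = "DROP_TABLE"


class Privilege(Enum):
    # Hypothetical RBAC privileges
    TABLE_READ = "TABLE_READ"
    TABLE_LIST = "TABLE_LIST"
    TABLE_DROP = "TABLE_DROP"


# The mapping lives outside the operation logic, so an authorizer plugin
# can consult or replace it without touching the catalog code.
REQUIRED_PRIVILEGES = {
    Operation.LOAD_TABLE: frozenset({Privilege.TABLE_READ}),
    Operation.LIST_TABLES: frozenset({Privilege.TABLE_LIST}),
    Operation.DROP_TABLE: frozenset({Privilege.TABLE_DROP}),
}


def is_allowed(operation: Operation, granted: set) -> bool:
    """Return True when every privilege the operation requires is granted."""
    return REQUIRED_PRIVILEGES[operation] <= granted


analyst_grants = {Privilege.TABLE_READ, Privilege.TABLE_LIST}
print(is_allowed(Operation.LOAD_TABLE, analyst_grants))
print(is_allowed(Operation.DROP_TABLE, analyst_grants))
```

An external engine such as Ranger or OPA would replace `is_allowed` with its own decision call while the operation definitions stay unchanged.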

Cleaning Up the Grant Lifecycle (PR #4059, #4234)

When principal roles or users are dropped from a security system, orphaned grant records can remain in the metadata database. These "phantom grants" pose a security risk and clutter the catalog's persistence layer.

Polaris 1.5.0 addresses this with pull requests #4059 and #4234. These updates refactor how grant operations are verified and stored. Polaris now automatically filters out stale grants associated with deleted grantees whenever it loads or resolves authorization entities.

Before applying a new grant or revoking an existing privilege, the catalog service revalidates the target entities. If a role or principal no longer exists, its associated grants are cleaned up, ensuring the authorization state is clean and secure.
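As a rough illustration of the filtering idea, the sketch below drops any grant whose grantee no longer exists. The `Grant` record and the set of live entity IDs are hypothetical structures for this example and do not mirror the Polaris persistence model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    # Hypothetical, simplified grant record
    grantee_id: int
    securable_id: int
    privilege: str


def live_grants(grants, existing_entity_ids):
    """Keep only grants whose grantee still exists."""
    return [g for g in grants if g.grantee_id in existing_entity_ids]


grants = [
    Grant(grantee_id=1, securable_id=100, privilege="TABLE_READ"),
    Grant(grantee_id=2, securable_id=100, privilege="TABLE_DROP"),  # role 2 was dropped
    Grant(grantee_id=3, securable_id=200, privilege="TABLE_READ"),
]
existing_roles = {1, 3}

resolved = live_grants(grants, existing_roles)
print([g.grantee_id for g in resolved])
```

Filtering at load time means a stale record can never contribute to an authorization decision, even before it is physically deleted.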

2. Metastore Federation & Catalog Uniformity

Data teams rarely start with a clean slate. They often have historical datasets stored in Hive Metastores or managed by cloud services like Google Cloud BigQuery. Polaris 1.5.0 introduces federation capabilities to help unify these disparate metadata sources.

Google Cloud BigQuery Metastore Federation (PR #4050)

Google Cloud Platform users often rely on the BigQuery Metastore (an HMS-compatible service) to manage tables. Pull request #4050, contributed by @joyhaldar, adds Google Cloud BigQuery Metastore federation support to Polaris.

This feature allows Polaris to federate tables from BigQuery. Polaris reads the schema and location details from the BigQuery Metastore and projects them as standard Iceberg REST endpoints.

Query engines can join tables in Google Cloud Storage with tables in Google BigQuery through a single Polaris catalog. This eliminates the need to copy metadata or synchronize schemas manually between GCP services and your lakehouse catalog.

To configure BigQuery Metastore federation in Polaris, you set up a federated catalog using the following configuration model:

{
  "name": "gcp_federated_catalog",
  "type": "FEDERATED",
  "properties": {
    "catalog-impl": "org.apache.polaris.catalog.gcp.BigQueryMetastoreCatalog",
    "gcp.project-id": "prod-data-lake-102",
    "gcp.metastore.uri": "thrift://metastore.us-central1.gke.gcp.internal:9083",
    "warehouse": "gs://prod-lakehouse-bucket-01/warehouse/"
  }
}

Once configured, Polaris acts as a bridge, reading BigQuery Metastore entries and exposing them to engines as namespaces and tables through the Iceberg REST API.

Hive Metastore Federation and Migration (PR #4315)

Migrating away from legacy Hadoop architectures is a priority for many organizations, but rewriting millions of files is not feasible. Polaris 1.5.0 improves support for Hive Metastore (HMS) federation, validated by new integration tests in pull request #4315.

Polaris federates legacy Hive tables by reading their metadata and presenting them as readable endpoints to modern engines. This allows you to adopt a gradual migration strategy. You can keep your historical data in place while writing new datasets as native Apache Iceberg tables, accessing both through the same Polaris interface.

A typical Spark configuration pointing to the federated Hive catalog inside Polaris looks like this:

# Spark configuration properties for Hive Metastore federation
# The Iceberg REST endpoint in Polaris is served under /api/catalog;
# Polaris-Realm is the default realm header name and can be changed server-side
spark.sql.catalog.polaris_hms = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.polaris_hms.type = rest
spark.sql.catalog.polaris_hms.uri = http://polaris-server:8181/api/catalog
spark.sql.catalog.polaris_hms.warehouse = hms_federated_catalog
spark.sql.catalog.polaris_hms.header.Polaris-Realm = prod_realm

This configuration allows Spark to query the federated HMS schemas as if they were native Iceberg tables.

Terminology Uniformity: Federated Catalog Factory (PR #4116)

As federation became a core feature, the naming convention in the codebase became inconsistent. Some components referred to federated sources as "External Catalogs," while others used "Federated Catalogs."

Pull request #4116, contributed by @flyrain, renames ExternalCatalogFactory to FederatedCatalogFactory throughout the codebase.

This change matches Dremio's terminology system. In the Agentic Lakehouse, a catalog is not just a local metadata store. It is a federated gateway that connects to databases, warehouses, and other catalogs. The Dremio Open Catalog combines a Polaris catalog with federated sources, exposing a single, governed namespace.

3. Advanced Credential Vending & Access Delegation

Credential vending is a key feature of Apache Polaris. In traditional data architectures, you had to distribute cloud storage credentials (like AWS IAM keys) to every client machine running a query. This created a large security risk.

Polaris reduces this risk by using access delegation. Query engines no longer hold long-lived cloud storage credentials. Instead, they authenticate with Polaris.

When an engine requests a table read or write, Polaris verifies the user's permissions, requests short-lived, limited-privilege tokens from the cloud provider, and sends them back to the engine. The engine uses these temporary credentials to read or write the Parquet data files.
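One common way to obtain "limited-privilege" tokens on AWS is to attach a session policy scoped to the table's storage prefix when assuming a role. The sketch below only builds such a policy document. It is an assumption about the general technique, not the policy Polaris actually generates, and the bucket and prefix values are placeholders.

```python
import json


def scoped_s3_policy(bucket: str, table_prefix: str, allow_write: bool) -> dict:
    """Build an IAM session policy limited to one table location."""
    actions = ["s3:GetObject"]
    if allow_write:
        actions += ["s3:PutObject", "s3:DeleteObject"]
    prefix = table_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access only under the table prefix
                "Effect": "Allow",
                "Action": actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is a bucket-level action, so restrict it by prefix
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}/*"}},
            },
        ],
    }


read_policy = scoped_s3_policy("prod-bucket-01", "warehouse/db/table", allow_write=False)
print(json.dumps(read_policy, indent=2))
```

A JSON string like this can be passed as the `Policy` parameter of an STS AssumeRole call, so the temporary credentials are valid only for that table's files.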

Regional STS Client Configurations (PR #4161)

In global, multi-region cloud deployments, physical distance introduces latency and connection failures. When vending credentials for S3 buckets, Polaris requests temporary tokens from the AWS Security Token Service (STS).

In previous versions, the internal STS client did not always configure its regional endpoints dynamically. A Polaris server in one region might try to fetch tokens from a distant global STS endpoint, causing network timeouts or region-mismatch errors.

Pull request #4161, contributed by @yushesp, fixes this by passing the signing region parameter to the STS client builder.

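// Passing region explicitly to the STS client builder (AWS SDK for Java v2)
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sts.StsClient;

StsClient stsClient = StsClient.builder()
    .region(Region.of(configuredSigningRegion))
    .credentialsProvider(DefaultCredentialsProvider.create())
    .build();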

By explicitly configuring the regional STS client, Polaris ensures token requests are routed to the nearest regional endpoint. This prevents query failures caused by network latency or regional policy restrictions in multi-region deployments.

Single Expiration Timestamp per Bundle (PR #4173)

A table write operation in Apache Iceberg can involve writing hundreds of data files and manifest files across different storage locations. Previously, Polaris could return a bundle of vended credentials where different storage paths had different token expiration times.

If some credentials expired before others during a long-running write job, the query engine would hit path-specific write failures. Because Iceberg commits are atomic, the table itself stays consistent, but the job fails partway through and can leave orphaned data files behind.

Pull request #4173, contributed by @yushesp, addresses this by enforcing a single, unified expiration timestamp across the entire credential bundle.

This update simplifies token cache management. Query engines like Spark or Dremio can track a single time-to-live (TTL) for the vended credential package, refreshing all tokens at once before any write path expires.
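On the client side, tracking one TTL can be sketched as follows. Picking the earliest expiry as the bundle-wide value and using a five-minute safety margin are assumptions made for this example; the source does not state how Polaris selects the unified timestamp.

```python
from datetime import datetime, timedelta, timezone


def unified_expiration(expirations):
    """Return one expiry for the whole bundle (assumed here: the earliest)."""
    if not expirations:
        raise ValueError("credential bundle is empty")
    return min(expirations)


def needs_refresh(expiration, now, safety_margin=timedelta(minutes=5)):
    """Refresh the whole bundle shortly before its single expiry."""
    return now >= expiration - safety_margin


data_expiry = datetime(2026, 5, 28, 14, 45, tzinfo=timezone.utc)
metadata_expiry = datetime(2026, 5, 28, 15, 0, tzinfo=timezone.utc)

bundle_expiry = unified_expiration([data_expiry, metadata_expiry])
print(bundle_expiry.isoformat())
print(needs_refresh(bundle_expiry, datetime(2026, 5, 28, 14, 41, tzinfo=timezone.utc)))
```

With a single timestamp, the engine needs one timer per table instead of one per storage path.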

Here is an illustrative structure of a vended credentials payload, showing the unified expiration field. The field names are simplified for readability and may not match the exact REST response:

{
  "tokenType": "AWS_IAM_ROLE",
  "expiration": "2026-05-28T14:45:00Z",
  "credentials": [
    {
      "path": "s3://prod-bucket-01/warehouse/db/table/data/",
      "accessKeyId": "ASIA4SDFGHJKLOPIUY",
      "secretAccessKey": "qwert12345/yuiop67890/zxcvbnm",
      "sessionToken": "IQoJb3JpZ2luX2VjEGM...[TRUNCATED]"
    },
    {
      "path": "s3://prod-bucket-01/warehouse/db/table/metadata/",
      "accessKeyId": "ASIA4SDFGHJKLOPIUY",
      "secretAccessKey": "qwert12345/yuiop67890/zxcvbnm",
      "sessionToken": "IQoJb3JpZ2luX2VjEGM...[TRUNCATED]"
    }
  ]
}

With one expiration for the whole bundle, every task in a write job sees its credentials expire at the same moment, so no single storage path fails ahead of the others.

KMS Credential Mocking and Generic Table API (PR #4034, #4043)

Polaris 1.5.0 improves support for local testing and developer environments. Pull request #4034 adds support for AWS-shaped Key Management Service (KMS) credentials when using local storage mocks like MinIO. This allows developers to test encrypted credential vending locally before deploying to production.

Additionally, pull request #4043 introduces the Polaris-Generic-Table-Access-Delegation header in the Generic Table API. This allows client applications to request specific token delegation scopes when accessing non-Iceberg metadata formats managed by Polaris.

4. CLI & Admin User Experience Upgrades

Managing a catalog programmatically is essential for platform teams. Polaris 1.5.0 updates the Python-based CLI client to make administration easier.

The New `summarize` Subcommand (PR #4003)

Platform administrators need to know the state of their catalogs. How many tables exist? What is the size distribution?

Pull request #4003, contributed by @MonkeyCanCode, adds the summarize subcommand to the Polaris CLI.

# Example command using the new summarize subcommand
polaris catalogs summarize --name my_catalog

Executing this command returns a JSON summary of the catalog's structure, including counts of tables, namespaces, and views, along with basic storage details. This provides a quick way to audit catalog use without scraping logs or writing custom SQL scripts.

Locating Tables Across Deep Namespaces (PR #4075)

In large organizations, tables are organized into nested namespaces representing different business units, teams, and environments. Finding a specific table in a deep directory structure is difficult.

Pull request #4075, also contributed by @MonkeyCanCode, introduces the tables/find command to search for tables.

# Locating tables across the entire namespace tree
polaris tables find --name web_logs

This command searches the entire namespace hierarchy and returns a table listing matches, their schema details, and parent namespaces. This makes it easier to locate datasets and verify catalog structure.
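The search itself amounts to a depth-first walk of the namespace tree. The sketch below runs that walk over an in-memory dictionary; the tree layout and names are hypothetical, and a real client would list namespaces and tables through the catalog API instead.

```python
def find_tables(tree, name, path=()):
    """Yield the fully qualified name of every table called `name`."""
    for table in tree.get("tables", []):
        if table == name:
            yield ".".join(path + (table,))
    # Recurse into child namespaces, extending the path as we descend
    for child, subtree in tree.get("namespaces", {}).items():
        yield from find_tables(subtree, name, path + (child,))


# Hypothetical catalog layout
catalog = {
    "namespaces": {
        "marketing": {
            "namespaces": {
                "web": {"tables": ["web_logs", "clicks"]},
            },
        },
        "ops": {"tables": ["web_logs"]},
    },
}

matches = list(find_tables(catalog, "web_logs"))
print(matches)
```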

5. Performance, Concurrency & Persistence

Under heavy analytical workloads, a catalog must handle thousands of concurrent metadata requests. Latency in the catalog translates to latency in query planning. Polaris 1.5.0 optimizes the storage engines and persistence layers.

Per-Realm JDBC Locking (PR #4054)

Many Polaris deployments use relational databases (like PostgreSQL) via JDBC to persist catalog metadata. In previous versions, the JDBC store used coarse-grained synchronized methods to coordinate updates.

This created a concurrency bottleneck. A write operation in one tenant realm would block metadata reads in another realm, causing query planning delays in multi-tenant environments.

Pull request #4054, contributed by @singhpk234, replaces these coarse-grained synchronized blocks with per-realm database locks.

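// Conceptual sketch of per-realm locking
// (EntityMetadata and executeDbWrite are placeholders for this example)
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

public class PolarisJdbcStore {
    // One lock per realm, created lazily on first use
    private final ConcurrentMap<String, ReentrantLock> realmLocks = new ConcurrentHashMap<>();

    public void updateEntity(String realm, EntityMetadata entity) {
        ReentrantLock lock = realmLocks.computeIfAbsent(realm, r -> new ReentrantLock());
        lock.lock();
        try {
            // Only writers in the same realm contend for this lock
            executeDbWrite(realm, entity);
        } finally {
            lock.unlock();
        }
    }
}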

This ensures write operations in one realm do not block database access in other realms. Multi-tenant deployments experience lower latency and higher throughput, especially during peak load times.

NoSQL Persistence and GC Reduction (PR #4071)

For NoSQL persistence backends (such as MongoDB), write throughput is often limited by Java Garbage Collection (GC) pauses. Frequently allocating and discarding temporary byte arrays during serialization causes high memory pressure.

Pull request #4071, contributed by @snazy, optimizes the NoSQL persistence layer by reusing serialization scratch buffers.

Instead of allocating a new byte array for every entity write, Polaris maintains a pool of reusable byte buffers. The serialization engine writes fields directly into these buffers, reducing allocations and garbage collection overhead.

6. Governance & Community Standards

As the Polaris community grows, maintaining code quality and compliance is essential. Polaris 1.5.0 introduces guidelines to govern contributions, including those generated by AI tools.

AI-Generated Contribution Guidelines (PR #3948, #4276)

The rise of AI coding assistants has led to an increase in automated pull requests. While these tools can improve developer productivity, they can also introduce low-quality code, security vulnerabilities, or license compliance issues.

Polaris 1.5.0 addresses this with pull requests #3948 and #4276. These updates introduce contribution guidelines for AI-generated code, documented in AGENTS.md.

These guidelines define requirements for contributions generated by AI agents. Code submitted by AI must meet checkstyle standards, include unit tests, and avoid introducing deprecated methods. This ensures the project remains secure and maintainable.

Technical Summary of Key Pull Requests

The following table summarizes the key pull requests in Apache Polaris 1.5.0:

Category	PR Number	Contributor	Core Change	Primary User Impact
Security	#3928	@sneethiraj	Integrates Apache Ranger Authorization Plugin	Unified enterprise security policy management.
Security	#4006	@sungwy	Decouples privileges from internal operation enums	Modular and extensible security codebase.
Security	#4059	@flyrain	Refactors grant validation operations	Clean grant lifecycle; no phantom access records.
Federation	#4050	@joyhaldar	Adds Google BigQuery Metastore federation	Query GCP and object storage tables together.
Federation	#4116	@flyrain	Renames External Catalog to Federated Catalog	Terminology alignment with modern query federation.
Access Vending	#4161	@yushesp	Passes region parameter to STS client builder	Regional STS routing; fixes multi-region latency.
Access Vending	#4173	@yushesp	Enforces single expiration timestamp per bundle	Simplified cache management; reduces write failures.
CLI	#4003	@MonkeyCanCode	Adds `summarize` subcommand	Easy programmatic catalog audits.
CLI	#4075	@MonkeyCanCode	Adds `tables/find` search command	Improved dataset discoverability across deep namespaces.
Performance	#4054	@singhpk234	Implements per-realm JDBC database locks	Eliminates tenant contention in JDBC multi-tenant setups.
Performance	#4071	@snazy	Reuses NoSQL serialization scratch buffers	Reduced GC overhead; lower query planning latency.

Apache Polaris and Dremio: The Open Catalog Integration

The performance and security updates in Polaris 1.5.0 directly benefit users of the Dremio Open Catalog.

Because Dremio is built on open standards, it uses Polaris to manage Iceberg table metadata. When Polaris optimizes its JDBC persistence layer or improves its credential vending logic, Dremio users experience faster query planning and more reliable token exchanges.

By combining Polaris with Dremio's query execution engine, you can build an Agentic Lakehouse that is fast, secure, and open. Dremio provides sub-second query execution on Iceberg tables via automated features like reflections and cloud caching, while Polaris manages the shared metadata layer.

This architecture ensures you retain control of your data. Your tables are stored in open Apache Iceberg format in your own cloud storage, cataloged by an open-source REST service, and accessible by Spark, Flink, and Dremio without vendor lock-in.

This open approach enables complete freedom of choice. You can run Spark jobs, Flink streaming pipelines, and Dremio BI dashboards in parallel, all querying the same physical storage bucket. The Polaris catalog coordinates the transactions, while Dremio accelerates queries, eliminating the serialization tax.

Technical Configuration Guides for Multi-Engine Setup

To fully realize the value of Polaris 1.5.0, you must configure your query engines to communicate with the catalog server. Below are the technical configuration templates for Spark, Flink, and Trino.

Apache Spark Session Configuration

For data pipeline jobs running on Spark, add these parameters to your Spark configuration payload:

from pyspark.sql import SparkSession

# Host, warehouse, realm, and credential values are placeholders.
# The Iceberg REST endpoint in Polaris is served under /api/catalog, the client
# property is "credential" (singular), and Polaris-Realm is the default realm header.
spark = SparkSession.builder \
    .appName("PolarisMultiEngineSpark") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.polaris.type", "rest") \
    .config("spark.sql.catalog.polaris.uri", "http://polaris-server:8181/api/catalog") \
    .config("spark.sql.catalog.polaris.warehouse", "main_warehouse") \
    .config("spark.sql.catalog.polaris.header.Polaris-Realm", "production_realm") \
    .config("spark.sql.catalog.polaris.credential", "client_id_finance_read:secret_key_abc123") \
    .config("spark.sql.catalog.polaris.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .getOrCreate()

Apache Flink SQL Catalog Configuration

For real-time stream ingestion using Flink, define the REST catalog using Flink SQL CLI or your pipeline initialization scripts:

-- Custom HTTP headers use the 'header.' prefix; Polaris-Realm is the default realm header
CREATE CATALOG polaris WITH (
  'type'='iceberg',
  'catalog-impl'='org.apache.iceberg.rest.RESTCatalog',
  'uri'='http://polaris-server:8181/api/catalog',
  'warehouse'='main_warehouse',
  'header.Polaris-Realm'='production_realm',
  'credential'='client_id_finance_write:secret_key_xyz789',
  'io-impl'='org.apache.iceberg.aws.s3.S3FileIO'
);

USE CATALOG polaris;

Trino Catalog Properties File

To allow interactive querying via Trino, create a catalog properties file (etc/catalog/polaris.properties):

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://polaris-server:8181/api/catalog
iceberg.rest-catalog.warehouse=main_warehouse
iceberg.rest-catalog.security=OAUTH2
# Trino takes the client ID and secret as a single id:secret value
iceberg.rest-catalog.oauth2.credential=client_id_trino_query:secret_key_trino_1234
# Custom-header support is an assumption here; confirm this property exists in your Trino release
iceberg.rest-catalog.headers=Polaris-Realm:production_realm

These configuration templates establish the metadata exchange path. This path enables different analytical tools to operate against the same unified, Ranger-secured catalog tables.

7. Open REST Catalogs vs. Proprietary Lock-In

Data platform strategies often diverge on catalog architecture. Organizations must decide whether to use an open REST catalog like Apache Polaris or a proprietary metadata system. The table below compares the architectural differences between Apache Polaris, Databricks Unity Catalog, and Snowflake Horizon.

Feature	Apache Polaris	Databricks Unity Catalog	Snowflake Horizon
Catalog Spec	Native Apache Iceberg REST API	Proprietary REST API	Proprietary REST API
Vendor Neutrality	High (open source ASF project)	Medium (managed by Databricks)	Low (tied to Snowflake platform)
Multi-Engine Support	Universal (Spark, Flink, Dremio, Trino)	Primary integration with Spark	Primary integration with Snowflake
Access Control	Ranger, OPA, internal RBAC	Unity SQL privileges, catalog ACLs	Snowflake SQL grants, row/column masking
Vended Credentials	Standardized token exchange (S3/GCS/ADLS)	Internal storage credential mounting	Snowflake-managed external tables
Deployment Model	Self-hosted, Kubernetes, or Dremio Open Catalog	Managed cloud service or hosted server	Managed Snowflake platform

Proprietary metadata catalogs partition your lakehouse. If you write tables through one vendor's catalog, another vendor's query engine must jump through translation hoops to read them. This introduces the serialization tax and limits query speeds.

By utilizing Apache Polaris 1.5.0, your catalog functions as a neutral server. You write access rules in Apache Ranger once, and those rules apply uniformly whether Dremio queries the table or Spark processes a batch job. This is the cornerstone of the Agentic Lakehouse: centralizing governance while preserving absolute freedom of engine choice.

Troubleshooting Polaris 1.5.0 Deployments

When upgrading to Polaris 1.5.0, you may encounter configuration errors in multi-region environments or multi-tenant database clusters. Below are three common troubleshooting strategies.

1. Resolving AWS STS Signature Region Errors

Symptom: Client engines report errors like SignatureDoesNotMatch or ExpiredToken during query planning.

Reason: In multi-region deployments, the internal STS client might route token requests to the default global endpoint (us-east-1), while the target S3 bucket resides in a region that requires regional STS endpoints (such as eu-central-1).

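Fix: Set the signing region in your Polaris catalog configuration. The statement below illustrates the idea for a deployment that keeps catalog properties in a relational table. The table, column, and key names are placeholders, so adapt them to your schema or set the property through your usual configuration path: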

UPDATE polaris_properties 
SET value = 'eu-central-1' 
WHERE key = 'aws.sts.signing-region' AND catalog_name = 'main_warehouse';

Verify that your client engines also set their S3 region parameters explicitly in their configuration properties.

2. Tuning JDBC Connection Pool for Per-Realm Locks

Symptom: Concurrency performance drops when running high numbers of parallel catalog transactions, accompanied by connection timeouts.

Reason: While per-realm JDBC locking removes cross-realm contention, it also lets more threads request database connections at the same time. If your connection pool is too small, threads wait for available connections, causing timeouts.

Fix: Increase the maximum size of the Agroal connection pool (the default JDBC pool in Quarkus) in your server properties file (application.properties):

# Tuning Quarkus JDBC pool sizes
quarkus.datasource.jdbc.max-size=64
quarkus.datasource.jdbc.min-size=8
quarkus.datasource.jdbc.idle-removal-interval=10M

Ensure your target PostgreSQL or MySQL server is configured to allow a maximum connection limit that accommodates the total connections across all Quarkus nodes.
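The sizing rule is simple arithmetic: the per-node pool size multiplied by the number of Polaris nodes must fit within the database connection limit, minus whatever you reserve for other clients. A small sketch, with the node count and reserved connections chosen as example values:

```python
def max_pool_size_per_node(db_max_connections: int, reserved: int, node_count: int) -> int:
    """Largest per-node pool size that keeps the cluster under the DB limit."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    usable = db_max_connections - reserved
    if usable <= 0:
        raise ValueError("no connections left after the reservation")
    return usable // node_count


# Example values: PostgreSQL's default max_connections is 100;
# 10 reserved connections and 3 nodes are assumptions for illustration.
per_node = max_pool_size_per_node(db_max_connections=100, reserved=10, node_count=3)
print(per_node)
```

In this example, a per-node max-size of 64 across three nodes could request up to 192 connections, so the database limit would need to be raised before applying that pool setting.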

3. Cleaning Up Phantom Grants Left by Deleted Principal Roles

Symptom: Metadata audits show grant records pointing to non-existent principals or roles.

Reason: If a principal role was deleted in previous versions without running proper revoke calls first, the access records remained in the relational database.

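Fix: Polaris 1.5.0 automatically filters these records, but you can also remove them from the database to keep lookups fast. The queries below are a sketch: the table and column names are illustrative, so map them to your actual persistence schema, and make sure the subquery covers every grantee type (principals, principal roles, and catalog roles) before deleting anything:

-- Audit orphaned grants (run this first and review the results)
SELECT * FROM polaris_grants
WHERE grantee_id NOT IN (SELECT id FROM polaris_principals);

-- Clean up orphaned records once the audit output looks right
DELETE FROM polaris_grants
WHERE grantee_id NOT IN (SELECT id FROM polaris_principals);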

Regular audits ensure that your metadata storage stays optimized and secure.

Evolving Your Data Architecture

Modern analytics architectures succeed when they are open. If your table catalog is tied to a single proprietary vendor, you will face high data movement costs and complex integration pipelines.

Adopting an open-source, standardized catalog service like Apache Polaris gives you complete architectural choice. You can run Spark jobs for ETL, Flink streaming applications for real-time monitoring, and Dremio BI dashboards in parallel, all querying the same physical storage bucket.

The security and performance updates in Polaris 1.5.0, such as Apache Ranger policy integration and per-realm JDBC locking, help you scale your data operations safely.

Try Dremio Cloud free for 30 days to deploy a managed open catalog directly on your cloud data lake. You can start small, accelerate your slow dashboards, and build a modular, high-performance data lakehouse without vendor lock-in.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.

