The Dremio Blog

Engineering Blog

Engineering Blog

The First User of Your CLI Won’t Be a Person

Why Dremio built a command-line tool designed to be introspected by machines. When GitHub launched gh in 2020, they framed the problem as context switching: developers losing flow by bouncing between terminal and browser. When Stripe shipped their CLI, the pain was webhook testing. When Fly.io built flyctl, the argument was philosophical: web apps aren't […]

Rahim Bhojani
Engineering Blog

Accelerating Joins in Dremio with Runtime Filters

Runtime filters in Dremio are an opportunistic, runtime‑only optimization: they do not replace good data modeling, partitioning, or reflections, but they stack on top of those fundamentals to remove work that is provably useless for a specific query run.

Chris Pride
Engineering Blog

“Random Engine” Design for Dremio Software

Current Architecture (Conceptual): The current Dremio Software architecture often uses a fixed pool of executor engines. While this provides stability for baseline workloads, it struggles to handle predictable spikes in demand, leading to performance bottlenecks during peak periods and overallocation of engines during quieter periods. Overallocation can increase cloud costs. This document’s […]

Michael Flower
Engineering Blog

Column Nullability Constraints in Dremio

Column nullability serves as a safeguard for reliable data systems. Apache Iceberg's capabilities in enforcing and evolving nullability rules are crucial for ensuring data quality. Understanding null handling, along with the specifics of engine support, is essential for constructing dependable data systems.

Laszlo Pinter
Engineering Blog

Query Results Caching on Iceberg Tables

Seamless result cache for Iceberg was enabled for all Dremio Cloud organizations in May 2025. Since then, our telemetry has shown that between 10% and 50% of a single project’s queries have been accelerated by result cache. That’s a huge cost saving on executors. Looking forward, Dremio is researching how to bring its reflection-matching query rewrite capabilities to the result cache. For example, once a user generates a result cache entry, it should be possible to trim, filter, sort and roll up from that entry. Limiting the search space and matching efficiently through hashes will be key to making matching on result cache possible. Stay tuned for more!

Benny Chow
Dremio Blog: Open Data Insights

Benchmarking Framework for the Apache Iceberg Catalog, Polaris

The Polaris benchmarking framework provides a robust mechanism to validate performance, scalability, and reliability of Polaris deployments. By simulating real-world workloads, it enables administrators to identify bottlenecks, verify configurations, and ensure compliance with service-level objectives (SLOs). The framework’s flexibility allows for the creation of arbitrarily complex datasets, making it an essential tool for both development and production environments.

Pierre Laporte
Engineering Blog

Too Many Roundtrips: Metadata Overhead in the Modern Lakehouse

The traditional approach of caching table metadata and refreshing it periodically has various drawbacks and limitations. With seamless metadata refresh, Dremio now provides users with an effortless experience to query the most up-to-date versions of their Iceberg tables without wrecking the performance of their queries. So now a user querying a shared table in Dremio Enterprise Catalog powered by Apache Polaris, for example, can see updates from an external Spark job immediately, and they never even have to think about it.

Jeremy Lapacik
Engineering Blog

Introducing Dremio Auth Manager for Apache Iceberg

Dremio Auth Manager is intended as an alternative to Iceberg’s built-in OAuth2 manager, offering greater functionality and flexibility while complying with the OAuth2 standards. Dremio Auth Manager streamlines authentication by handling token acquisition and renewal transparently, eliminating the need for users to deal with tokens directly, and avoiding failures due to token expiration.

Alex Dutra
Engineering Blog

Dremio’s Apache Iceberg Clustering: Technical Blog

Clustering is a data layout strategy that organizes rows based on the values of one or more columns, without physically splitting the dataset into separate partitions. Instead of creating distinct directory structures, like traditional partitioning does, clustering sorts and groups related rows together within the existing storage layout.
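To make the contrast above concrete, here is a minimal sketch in Python, not Dremio's implementation. The sample rows, the function names `partition_rows` and `cluster_rows`, and the `rows_per_file` parameter are all hypothetical, and a plain sort stands in for whatever ordering the engine actually applies.

```python
from collections import defaultdict

# Hypothetical sample rows; column names are illustrative only.
rows = [
    {"region": "eu", "order_id": 3},
    {"region": "us", "order_id": 1},
    {"region": "eu", "order_id": 2},
    {"region": "us", "order_id": 4},
]

def partition_rows(rows, column):
    """Traditional partitioning: one separate group (directory) per distinct value."""
    groups = defaultdict(list)
    for row in rows:
        groups[f"{column}={row[column]}"].append(row)
    return dict(groups)

def cluster_rows(rows, columns, rows_per_file):
    """Clustering sketch: sort by the chosen columns, then cut the sorted
    rows into fixed-size files that all live in the same storage layout."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in columns))
    return [ordered[i:i + rows_per_file]
            for i in range(0, len(ordered), rows_per_file)]

partitioned = partition_rows(rows, "region")
clustered = cluster_rows(rows, ["region"], rows_per_file=2)
```

In this sketch, `partitioned` is keyed by directory-style names such as `region=eu`, while `clustered` is a flat list of files in which rows with the same `region` sit next to each other.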

Gang Xiao
Engineering Blog

Pre-Computing Secure Materializations

Integrating row and column access control with materializations enables Dremio Reflections to deliver high-performance query execution without compromising on security or flexibility, making it an ideal solution for scalable, secure data access in the lakehouse architecture. Furthermore, by making pre-computed materializations reusable across users and roles, significant cost savings can be achieved through more efficient engine resource utilization.

James Starr
Engineering Blog

Autonomous Reflections: Technical Blog

At Dremio, we implemented Autonomous Reflections in our own internal Data Lakehouse. We are happy to report that Autonomous Reflections exceeded our expectations. In just days, we saw significant improvements

Yingyu Wang
Engineering Blog

Credential Vending with Iceberg REST Catalogs in Dremio

Credential vending support in Dremio opens up a more secure and convenient way to query external Iceberg catalogs. By obtaining temporary, table-scoped credentials on the fly, Dremio minimizes long-lived secrets and ensures access is tightly controlled by the catalog’s policies.

Adam Szita