8 minute read · July 7, 2025
Query Results Caching on Iceberg Tables

Principal Software Engineer, Dremio

A modern data lakehouse query engine can securely access enterprise data directly from an object store without intermediary copies or proprietary data marts. The query engine has to scan, filter, join and aggregate object store data in open formats while meeting strict latency requirements to power BI dashboard workloads and ad-hoc user queries. Fundamentally, these expensive operations can’t be done in real time on billions of rows for every user query. Instead, query engines employ a multitude of techniques to pre-compute and cache expensive operations, including planning and execution work. This technical blog discusses how Dremio’s lakehouse query engine takes advantage of Iceberg’s table snapshots to cache execution results while also ensuring data freshness and security.
The query results cache is a query acceleration feature that asynchronously spools query results, in Arrow buffer format, to a distributed store such as S3. Results are indexed by a cache key that hashes the query plan plus the data version of every table scanned in the plan. When a new query is planned, the query engine checks for an existing up-to-date cache entry. If one exists, the physical plan is replaced with a simple Arrow buffer scan, turning execution into a trivial job.
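Conceptually, the planner-side flow is a key lookup followed by a plan substitution. The sketch below illustrates that shape; all type and method names (ResultsCacheStore, ArrowBufferScan, CacheAwarePlanner) are hypothetical stand-ins for the engine’s internals, not Dremio’s actual API.

```java
import java.util.Optional;

// Hypothetical stand-ins for the engine's internal plan types.
interface PhysicalPlan {}

record ArrowBufferScan(String cacheFilePath) implements PhysicalPlan {}

// Hypothetical lookup into the KV store that tracks cached results.
interface ResultsCacheStore {
    Optional<String> lookup(String cacheKey); // path of spooled Arrow buffers, if cached
}

final class CacheAwarePlanner {
    private final ResultsCacheStore store;

    CacheAwarePlanner(ResultsCacheStore store) {
        this.store = store;
    }

    // On a hit, the physical plan collapses to a trivial scan of the spooled
    // Arrow buffers; on a miss, the original plan runs (and its results may be
    // spooled asynchronously for future queries).
    PhysicalPlan plan(String cacheKey, PhysicalPlan originalPlan) {
        return store.lookup(cacheKey)
                .<PhysicalPlan>map(ArrowBufferScan::new)
                .orElse(originalPlan);
    }
}
```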
Results cache works across all SQL clients, including Dremio’s SQL Runner UI, JDBC, ODBC, Flight and REST APIs. It is available on Dremio Cloud and Dremio Software Enterprise Edition. As new coordinator nodes join the cluster, they immediately benefit from previously cached results. There are no additional external dependencies because Dremio uses its own KV store to track query results metadata and manages Arrow file cleanup itself - i.e. no TTL policies are needed in the distributed storage or KV store.
Result Cache Key and Iceberg Snapshot IDs
How does the planner generate a unique cache key for a given user’s SQL query? The cache key needs to incorporate the following components (a sketch of combining them follows the list):
- Hash of the logical query plan - This covers the fully expanded query tree, including views and user defined SQL functions (UDFs). If the plan contains dynamic or non-deterministic functions, such as CURRENT_DATE or RAND, the query is not cacheable.
- Hash of the Iceberg snapshot ids and metadata file paths in the physical plan - The physical plan contains the actual scans needed to service the query after reflection matching. Reflections are Dremio’s query acceleration technology that transparently replaces query sub-trees with pre-computed materializations. If all the remaining scans are Iceberg, the query is cacheable. By hashing the snapshot ids, we effectively include each table’s data version in the cache key. Iceberg metadata.json file paths are also included because table schema and partition changes do not require a snapshot id change, as defined by the Iceberg table specification.
- Hash of the query user name - For security reasons, the query results cache cannot be shared across different users. If a query uses the IS_MEMBER function in row-level security or column-level masks, the query results become specific to a user and cannot be shared.
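As a rough illustration of how these components might combine, here is a minimal sketch that hashes the three inputs into one key. The SHA-256 choice and all parameter names are assumptions for illustration, not Dremio’s actual key derivation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

final class ResultCacheKey {

    // Combines the expanded logical plan, the snapshot id and metadata.json
    // path of every Iceberg scan left in the physical plan, and the query user.
    static String compute(String expandedLogicalPlan,
                          List<Long> snapshotIds,
                          List<String> metadataFilePaths,
                          String userName) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(expandedLogicalPlan.getBytes(StandardCharsets.UTF_8));
        for (long snapshotId : snapshotIds) {
            digest.update(Long.toString(snapshotId).getBytes(StandardCharsets.UTF_8));
        }
        for (String path : metadataFilePaths) {
            digest.update(path.getBytes(StandardCharsets.UTF_8));
        }
        digest.update(userName.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest.digest());
    }
}
```

Because the user name is part of the digest, two users running the identical SQL against identical snapshots still produce distinct keys, which is what prevents cross-user sharing of cached results.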
Complementary with Reflections
Data sources like MySQL, MongoDB or even a simple Excel file do not provide a data version for their tables or collections. If Dremio’s query engine cached a query on top of a MySQL table, the next time the same query ran, the planner would not know whether the data in MySQL had changed. As a result, the query results cache is not supported for queries that read directly from these non-versioned sources. However, if a reflection is built on such a source, the source becomes cacheable indirectly through the reflection. The underlying storage for Dremio Reflections is simply Iceberg tables registered in an internal Nessie catalog. Since reflections can provide an Iceberg snapshot id, their data versions can be incorporated into the result cache key.
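For instance, the pieces of Iceberg table state the cache key depends on can be read with the standard Iceberg Java API, roughly as below. The catalog wiring is assumed, and the cast to BaseTable presumes a catalog whose tables expose their TableOperations; treat this as a sketch rather than how Dremio reads these values internally.

```java
import org.apache.iceberg.BaseTable;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

final class IcebergTableVersion {

    // Reads the two pieces of table state the cache key depends on: the
    // current snapshot id (the data version) and the metadata.json location
    // (which also changes on schema or partition-spec evolution).
    static String describe(Catalog catalog, TableIdentifier id) {
        Table table = catalog.loadTable(id);
        Snapshot current = table.currentSnapshot(); // null for an empty table
        long snapshotId = current == null ? -1L : current.snapshotId();
        String metadataLocation =
                ((BaseTable) table).operations().current().metadataFileLocation();
        return snapshotId + "@" + metadataLocation;
    }
}
```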
What happens when the results cache becomes stale because data changed in MySQL? While the source data behind the reflection’s materialization can become stale, the reflection refresh and expiration policy is controlled by the refresh policies set on the tables that feed into the reflection. For example, you can set an hourly refresh on the MySQL table so that the reflection updates roughly every hour - usually incrementally. Regardless of whether the user runs a query on the MySQL table before, during or after the reflection refresh, it will always get a cache hit, or populate the cache, using the reflection’s current materialization. Once a new version of the materialization is available, the same query stops planning with the old cache entry and produces a new one from the updated materialization. It’s essentially a seamless cache on top of a relational cache that trades a defined window of staleness for sub-second query response times.
Query Execution Challenges
There were a few design challenges that needed to be solved on the query execution side.
During planning, a query may be eligible for caching because, for example, it contains only Iceberg scans and uses no dynamic functions, yet during execution it could return too many rows to cache. This was solved by buffering records up to a configurable limit (a 20MB Arrow buffer size by default) and asynchronously writing them to distributed storage only after all records have been streamed back to the client and the total size has stayed within the limit. In practice this means aggregated and filtered queries get cached, while select * style queries on large/wide tables do not. When a query exceeds the limit, the executor sends a message to the coordinator not to attempt caching the same query for 24 hours. Furthermore, if the same query is run simultaneously, only one instance attempts to buffer its results and the other executes normally.
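A simplified model of that bounded buffering might look like the following. The 20MB figure is the configurable default mentioned above; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class BoundedResultBuffer {
    static final long DEFAULT_LIMIT_BYTES = 20L * 1024 * 1024; // 20MB default limit

    private final long limitBytes;
    private final List<byte[]> arrowBatches = new ArrayList<>();
    private long bufferedBytes;
    private boolean overflowed;

    BoundedResultBuffer(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Called for each Arrow batch as it streams back to the client.
    void offer(byte[] arrowBatch) {
        if (overflowed) {
            return; // Already over the limit: keep streaming, stop buffering.
        }
        bufferedBytes += arrowBatch.length;
        if (bufferedBytes > limitBytes) {
            overflowed = true;    // e.g. a select * on a large/wide table
            arrowBatches.clear(); // too big to cache; release the memory
        } else {
            arrowBatches.add(arrowBatch);
        }
    }

    // Called once all records have streamed to the client. Returns the batches
    // to spool asynchronously to distributed storage, or empty if the query
    // overflowed (in which case the coordinator is told not to attempt caching
    // the same query again for 24 hours).
    Optional<List<byte[]>> finish() {
        return overflowed ? Optional.empty() : Optional.of(List.copyOf(arrowBatches));
    }
}
```

Note that overflow never fails the query: the client keeps receiving records either way, and only the side effect of populating the cache is abandoned.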
Another buffering issue arises with UI and REST based SQL queries, which require their results to be saved to result storage before being paged through on subsequent job results requests. With Flight, JDBC and ODBC, no results are first saved to disk; all records eventually flow through a final screen operator as long as the client keeps asking for more data. The last phase of this execution plan is always single threaded, so the executor knows exactly how many records the query produced. The problem with UI and REST based SQL queries is that the phase that saves out the job results may be parallelized (which is very important for write throughput). The solution was to broadcast the records into a new single threaded stage that buffers and writes out the cached results. This design enabled support for the result cache while still allowing parallel writes for job results. Furthermore, it unified the result cache across SQL clients, making it possible for REST and JDBC queries to share the same cache entries.
Looking Forward
Seamless result caching for Iceberg was enabled for all Dremio Cloud organizations in May 2025. Since then, our telemetry shows that between 10% and 50% of a single project’s queries have been accelerated by the result cache. That’s a huge cost saving on executors. Looking forward, Dremio is researching how to bring its reflection matching query re-write capabilities to the result cache. For example, once a user generates a result cache entry, it should be possible to trim, filter, sort and roll up from that entry. Limiting the search space and efficient matching through hashes will be key to making matching on the result cache possible. Stay tuned for more!