Dremio Blog

23 minute read · May 18, 2026

How Dremio Keeps Agentic Analytics Fast Without Manual Tuning

Alex Merced, Head of DevRel, Dremio

A BI analyst runs the same sales dashboard every Monday morning. A data engineer can look at that query, understand the access pattern, and build a materialized view to make it fast. That model works because the query patterns are predictable and stable.

An AI agent doesn't work that way.

When a business analyst asks an AI agent "What drove the spike in product returns last month?", the agent constructs a query. Maybe it joins the orders table to the returns table to the product catalog to the regional lookup. Maybe it filters on a time range, groups by a combination of dimensions, and aggregates a metric. That specific combination of joins, filters, and aggregations may have never been run before. The agent didn't announce its intentions. It just ran the query.

This is the fundamental challenge of agentic analytics performance: AI agents generate unpredictable SQL. Traditional performance tuning, which relies on known query patterns to decide which materializations to build, breaks down in this environment. You can't pre-optimize for queries you can't anticipate.

Dremio solves this with a layered autonomous performance architecture. Each layer adapts to actual query patterns without human intervention. The lakehouse doesn't require a performance tuning sprint every time query patterns shift. It observes, learns, and adjusts automatically.

This post walks through each layer, what it does autonomously, what it does when you choose to intervene, and why this architecture is specifically suited to supporting AI agent workloads.

Why AI Agents Break Traditional Performance Tuning

Human analysts are creatures of habit. They have favorite reports, preferred dimensions, and consistent working hours. The Monday morning sales report, the end-of-quarter pipeline summary, the weekly retention cohort: these recur reliably. Data engineers learn these patterns and optimize for them: they build and schedule the right materializations, index the right columns, partition tables by the dimensions that matter.

That process takes time. A data engineer notices a slow query, identifies the access pattern, designs a materialization, tests it, and deploys it. In a mature team, this cycle takes days. In a less mature one, it takes weeks. But the tradeoff is acceptable because query patterns change slowly. The Monday report runs on Monday for months.

AI agents shatter this assumption. A single AI agent connected to your data via the Model Context Protocol (MCP) can explore your data in completely novel ways during a single conversation. It might scan a table it has never touched. It might join three tables that have never been joined together in your history. It might aggregate on a dimension that no human analyst ever used. And it might do this at 2am when no data engineer is on call.

When query patterns shift this fast, the traditional "observe, identify, build, deploy" cycle is too slow. By the time you've built the materialization, the AI agent has already moved on to a different line of analysis.

What you need is a system that observes query patterns continuously, identifies opportunities for acceleration automatically, builds the right materializations in the background, and deploys them transparently, all without waiting for a human to notice there's a problem.

That's exactly what Dremio's autonomous performance architecture does.

Layer 1: Keeping the Storage Layer Clean

Query performance starts at the storage layer. Before any caching or query acceleration matters, the underlying data files need to be in good shape. And with Apache Iceberg on object storage, maintaining file health is a real operational concern.

The Small-File Problem

Every write to an Iceberg table creates one or more new data files. Frequent small writes, whether from streaming ingestion pipelines, AI agent-generated updates, or change data capture (CDC) feeds, produce large numbers of tiny Parquet files. A table with 50,000 two-kilobyte files is not the same as a table with 50 two-megabyte files, even if they contain identical data. The former requires 50,000 separate metadata lookups and I/O operations at query time. The latter requires 50. This difference compounds dramatically at scale.
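To make that arithmetic concrete, here is a small sketch. The file counts and sizes come from the paragraph above; the per-request latency is a made-up figure for illustration, not a measured Dremio or object-store number, and real engines issue reads in parallel rather than serially.

```python
# Illustrative arithmetic only. PER_REQUEST_MS is a hypothetical round-trip
# latency, not a benchmark result.
KB = 1_000
MB = 1_000 * KB
PER_REQUEST_MS = 20


def scan_cost(file_count, file_size_bytes, per_request_ms):
    """Return (total_bytes, requests, serial_latency_ms) for a full table scan."""
    total_bytes = file_count * file_size_bytes
    requests = file_count  # at least one open/read per data file
    return total_bytes, requests, requests * per_request_ms


small_files = scan_cost(50_000, 2 * KB, PER_REQUEST_MS)
large_files = scan_cost(50, 2 * MB, PER_REQUEST_MS)

print(small_files)  # (100000000, 50000, 1000000)
print(large_files)  # (100000000, 50, 1000)
```

Both layouts hold the same 100 MB of data, but the small-file layout needs a thousand times as many requests to read it.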

Iceberg's row-level update and delete semantics create a related problem. When an UPDATE or DELETE is executed, Iceberg doesn't rewrite the affected files immediately. Instead, it writes position delete files or deletion vectors that mark which rows are logically removed. As these accumulate, every read must apply them to reconstruct the correct state. A table with thousands of pending delete records on top of large data files is measurably slower than one where those deletes have been resolved.
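A minimal sketch of that read-time cost, assuming a toy model in which each data file is a list of rows and each position delete is a (file, row position) pair. The function names and structures are hypothetical, not Iceberg's actual file formats or APIs.

```python
def read_with_deletes(data_files, position_deletes):
    """Yield live rows, skipping every (file, position) marked as deleted."""
    for file_name, rows in data_files.items():
        deleted = position_deletes.get(file_name, set())
        for position, row in enumerate(rows):
            if position not in deleted:
                yield row


def resolve_deletes(data_files, position_deletes):
    """Rewrite each file without its deleted rows, so later reads skip the check."""
    return {
        file_name: [
            row
            for position, row in enumerate(rows)
            if position not in position_deletes.get(file_name, set())
        ]
        for file_name, rows in data_files.items()
    }


data_files = {"a.parquet": ["r0", "r1", "r2"], "b.parquet": ["r3", "r4"]}
position_deletes = {"a.parquet": {1}}  # row 1 of a.parquet is logically removed

merged_on_read = list(read_with_deletes(data_files, position_deletes))
rewritten = resolve_deletes(data_files, position_deletes)
clean_read = list(read_with_deletes(rewritten, {}))
```

Every read in the first form pays for the delete lookup; after the rewrite, the same rows come back with no delete records to apply.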

Automatic Table Optimization

For tables managed through Dremio's Open Catalog, automatic optimization handles all of this on a dedicated engine so the maintenance work doesn't compete with your query workloads.

Compaction is the core operation. It merges many small files into optimally-sized files and splits oversized files that have grown too large to query efficiently. Beyond compaction, automatic optimization handles delete file resolution: it rewrites data files to incorporate pending position deletes and deletion vectors so subsequent reads are clean. Clustering physically reorders data within files based on frequently-filtered columns, so queries that filter on those columns can skip large blocks without reading them. Manifest rewriting keeps Iceberg's metadata tree balanced: too many manifest files slow query planning; too few make each manifest too large to scan quickly.
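The merging half of compaction can be sketched as a simple greedy grouping. This is an illustration of the idea, not Dremio's planner: the target size is a hypothetical parameter, and the sketch does not split oversized files.

```python
def plan_compaction(file_sizes, target_size):
    """Group files in order; start a new group when the next file would overflow."""
    groups, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Six small files (sizes in MB) merged toward a hypothetical 100 MB target.
plan = plan_compaction([40, 30, 50, 20, 90, 10], target_size=100)
print(plan)  # [[40, 30], [50, 20], [90, 10]]
```

Each inner list becomes one rewritten file, so six files turn into three.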

Partition evolution alignment is another operation worth understanding. When a table's partitioning strategy changes (say, from partitioning by month to partitioning by day), historical files remain organized under the old strategy. Automatic optimization can rewrite those files to align with the current spec, eliminating the performance penalty of mixed partition layouts.

Automatic Vacuuming

Separately from compaction, Dremio's automatic vacuuming handles snapshot lifecycle management. Iceberg keeps every historical snapshot by default. This is what enables time travel queries. But unbounded snapshot accumulation has costs: query planning must scan the snapshot history to find the current state, and the metadata files for expired snapshots continue to occupy storage.

Automatic vacuuming expires snapshots beyond the configured retention window and removes orphaned data files that are no longer referenced by any active snapshot. This keeps the metadata footprint manageable and prevents query planning time from growing proportionally with table age.
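As a rough sketch of that logic, assume each snapshot records a timestamp and the set of data files it references. The field names, the integer timestamps, and the rule of always keeping the latest snapshot are assumptions made for this example, not Dremio's implementation.

```python
def expire_snapshots(snapshots, now, retention):
    """Return (kept_snapshots, removable_files) for a given retention window."""
    latest = max(snapshots, key=lambda s: s["ts"])
    kept = [s for s in snapshots if now - s["ts"] <= retention or s is latest]
    live_files = set().union(*(s["files"] for s in kept))
    all_files = set().union(*(s["files"] for s in snapshots))
    # Files referenced only by expired snapshots are safe to delete.
    return kept, all_files - live_files


snapshots = [
    {"id": 1, "ts": 0, "files": {"a", "b"}},
    {"id": 2, "ts": 50, "files": {"b", "c"}},
    {"id": 3, "ts": 90, "files": {"c", "d"}},
]
kept, removable = expire_snapshots(snapshots, now=100, retention=60)
print([s["id"] for s in kept], removable)  # [2, 3] {'a'}
```

File "b" survives because snapshot 2 still references it; only "a", which no retained snapshot needs, is removed.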

Both automatic optimization and automatic vacuuming are enabled through the Open Catalog source settings, with the option to configure behavior at the individual table level when specific tables need different treatment.

Manual Controls for Tables Outside Open Catalog

For Iceberg tables not managed through Dremio's Open Catalog, or for situations requiring on-demand maintenance, SQL commands provide manual control:

  • OPTIMIZE TABLE <table_name> triggers the same compaction and maintenance operations as the automatic system.
  • VACUUM TABLE <table_name> EXPIRE SNAPSHOTS ... manually expires snapshots and removes orphaned files.

These commands are useful for one-time cleanups after large batch operations, or for tables on custom maintenance schedules.

Layer 2: Autonomous Reflections: The Self-Managing Acceleration Layer

Clean storage is the foundation. Autonomous Reflections is what makes queries fast.

A Reflection is a pre-computed, physically optimized copy of a dataset, stored as Apache Iceberg tables in your data lake. When a query arrives, Dremio's query optimizer checks whether any existing Reflection can satisfy the query (or a portion of it) more efficiently than scanning the raw source data. If it can, the query plan is automatically rewritten to use the Reflection. The user's SQL doesn't change. The user doesn't know a Reflection was used. They just see faster results.

There are three types of Reflections:

Raw Reflections create reorganized, sorted, and partitioned copies of source data. They accelerate queries that filter, join, and aggregate on specific columns by reducing the scan footprint and improving data co-location.

Aggregation Reflections pre-compute metric summaries over defined dimensions. A query asking for total revenue by region by quarter, for example, can be answered from a pre-computed Aggregation Reflection in milliseconds rather than by scanning and aggregating millions of order records at query time.

External Reflections let you register pre-existing materializations created outside Dremio (for instance, a Spark-generated aggregate table) as Reflections that Dremio's optimizer can use in query planning.
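To see why an Aggregation Reflection can answer more than one query shape, consider this toy sketch. It pre-computes SUM(revenue) over three dimensions and then rolls that summary up to answer a coarser query without touching the raw rows. The function names and the matching rule (query dimensions must be a subset of the summary's dimensions) are simplifications for illustration, not Dremio's optimizer logic. Note that SUM re-aggregates cleanly; an average would need the sum and the count stored separately.

```python
from collections import defaultdict


def build_aggregation(rows, dimensions, measure):
    """Pre-compute SUM(measure) grouped by the given dimensions."""
    summary = defaultdict(float)
    for row in rows:
        summary[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(summary)


def answer_from_aggregation(summary, summary_dims, query_dims):
    """Roll the summary up to query_dims, or return None if it cannot cover them."""
    if not set(query_dims) <= set(summary_dims):
        return None  # caller would fall back to scanning raw data
    positions = [summary_dims.index(d) for d in query_dims]
    result = defaultdict(float)
    for key, total in summary.items():
        result[tuple(key[i] for i in positions)] += total
    return dict(result)


orders = [
    {"region": "EU", "quarter": "Q1", "product": "A", "revenue": 10.0},
    {"region": "EU", "quarter": "Q1", "product": "B", "revenue": 5.0},
    {"region": "US", "quarter": "Q1", "product": "A", "revenue": 7.0},
    {"region": "EU", "quarter": "Q2", "product": "A", "revenue": 3.0},
]
dims = ["region", "quarter", "product"]
summary = build_aggregation(orders, dims, "revenue")
by_region_quarter = answer_from_aggregation(summary, dims, ["region", "quarter"])
```

Here `by_region_quarter` holds total revenue by region and quarter, computed from four summary entries rather than from the order records themselves.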

How Autonomous Reflections Work

The critical feature for agentic workloads is that Dremio creates and manages Reflections automatically, based on observed query patterns.

Dremio continuously collects metadata from every query that runs: which datasets were accessed, which columns were selected, which filters were applied, which aggregations were computed, and how long each query took. An autonomous analysis algorithm runs daily (typically at midnight UTC) against the previous seven days of query history.

The algorithm identifies recurring, expensive patterns: queries that touch large datasets with consistent filter or aggregation patterns that a Reflection could accelerate. It determines whether a Raw or Aggregation Reflection is the right type, and automatically creates it in the background. When the Reflection is ready, the query optimizer begins routing matching queries to it.
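A heavily simplified sketch of that kind of analysis follows. The seven-day window comes from the description above; the history record fields, the `min_runs` and `min_total_seconds` thresholds, and the rule "measures present means Aggregation, otherwise Raw" are all hypothetical stand-ins, since the actual algorithm is not described here.

```python
from collections import defaultdict


def recommend_reflections(history, today, window_days=7,
                          min_runs=3, min_total_seconds=60.0):
    """Find recurring, expensive query patterns inside the lookback window."""
    stats = defaultdict(lambda: [0, 0.0])  # pattern -> [run count, total seconds]
    for query in history:
        if today - query["day"] >= window_days:
            continue  # outside the lookback window
        pattern = (query["dataset"],
                   frozenset(query["dimensions"]),
                   frozenset(query["measures"]))
        stats[pattern][0] += 1
        stats[pattern][1] += query["seconds"]

    picks = []
    for (dataset, dims, measures), (runs, total) in stats.items():
        if runs >= min_runs and total >= min_total_seconds:
            kind = "aggregation" if measures else "raw"
            picks.append((dataset, kind, sorted(dims), sorted(measures)))
    return picks


def run(dataset, day, seconds, dimensions, measures):
    return {"dataset": dataset, "day": day, "seconds": seconds,
            "dimensions": dimensions, "measures": measures}


history = [
    run("orders", 9, 30.0, ["region"], ["revenue"]),
    run("orders", 8, 30.0, ["region"], ["revenue"]),
    run("orders", 5, 30.0, ["region"], ["revenue"]),
    run("orders", 1, 30.0, ["region"], ["revenue"]),   # too old to count
    run("returns", 9, 200.0, ["product"], ["count"]),  # expensive but not recurring
]
picks = recommend_reflections(history, today=10)
print(picks)  # [('orders', 'aggregation', ['region'], ['revenue'])]
```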

Reflections that are created autonomously but stop receiving traffic, because query patterns shifted, are automatically dropped after a configurable grace period. The system doesn't accumulate stale materializations that waste storage and maintenance resources.

Live Reflections for Iceberg Tables

For Iceberg tables, Dremio supports incremental Reflection refresh. When the source Iceberg table changes, Dremio processes only the new, modified, or deleted records since the last Reflection update, rather than rebuilding the entire Reflection from scratch. This keeps Reflections current without the expense of full rebuilds, which matters for tables that change frequently.
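The difference between a full rebuild and an incremental refresh can be sketched with a tiny sum-by-region materialization. This is a conceptual illustration, not Dremio's refresh mechanism: it treats a modified record as a delete plus an insert, and it does not handle a group that is emptied entirely by deletes (that would need a row count per group).

```python
def full_refresh(rows):
    """Rebuild the materialization by scanning every source row."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def incremental_refresh(totals, inserted, deleted):
    """Apply only the changed rows to an existing materialization."""
    updated = dict(totals)
    for region, amount in inserted:
        updated[region] = updated.get(region, 0) + amount
    for region, amount in deleted:
        updated[region] = updated.get(region, 0) - amount
    return updated


source_v1 = [("EU", 10), ("US", 7)]
materialized = full_refresh(source_v1)

# Changes since the last refresh: two inserts and one delete.
inserted = [("EU", 5), ("APAC", 2)]
deleted = [("EU", 10)]
source_v2 = [("US", 7), ("EU", 5), ("APAC", 2)]

refreshed = incremental_refresh(materialized, inserted, deleted)
```

The incremental path touches three changed rows and arrives at the same totals a full scan of `source_v2` would produce.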

If a Reflection hasn't been refreshed to reflect the most recent data, Dremio automatically falls back to querying the raw source. This fallback is automatic and guaranteed: a stale Reflection will never return wrong results.

Isolating Refresh Work from Query Traffic

Reflection refresh jobs are resource-intensive background operations. Left unconfigured, they compete with interactive query traffic for compute resources, which can cause query latency spikes during refresh windows.

Dremio lets administrators configure a dedicated engine pool specifically for Reflection refresh work. This isolates background materialization from the engines that serve user queries and AI agent requests, ensuring consistent interactive performance regardless of how many Reflections are being refreshed in the background.

Why This Loop Matters for AI Agents

An AI agent running an analysis at 2am explores a novel access pattern on your orders table: filtering on a new dimension combination and aggregating a metric that no human analyst has used before. The autonomous analysis algorithm catches this pattern during its next daily run. By the following morning, a Reflection exists that covers that access pattern. The AI agent's next exploration along the same dimension returns in under a second.

No human was involved. No ticket was filed. No performance review was held. The system adapted.

Manual Reflections for Expert Control

Autonomous Reflections are the default operating mode, but they don't prevent experts from intervening. When you know exactly what access pattern to optimize for (a specific BI report, a known AI agent workflow, a regularly-scheduled batch job), you can define a manual Reflection with precise control over which columns are included, how data is sorted, and what dimensions are pre-aggregated.

Manual Reflections coexist with Autonomous Reflections. The optimizer considers all available Reflections, manual or autonomous, when planning each query.

Layer 3: Caching: The Fast Path for Repeated Queries

Reflections handle structural query acceleration. Caching handles the data-access layer.

Columnar Cloud Cache (C3)

Cloud object storage (S3, ADLS, GCS) is cheap and durable, but it is not fast. Every read is a network call with latency measured in tens to hundreds of milliseconds. For interactive analytics and AI agent workflows that issue many queries in quick succession, this latency accumulates.

The Columnar Cloud Cache (C3) solves this by maintaining a local NVMe/SSD cache on each of Dremio's executor nodes. The first time a query reads a Parquet file or Iceberg data file from cloud storage, Dremio fetches it over the network and simultaneously writes it to the local cache. Subsequent queries that access the same data serve it from local disk, reducing retrieval time from network latency to local-disk speed.

C3 operates completely automatically. You don't configure which files to cache or when to evict them. It tracks access patterns and manages the cache using a standard eviction policy that keeps frequently-accessed data hot.
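The read-through pattern behind a cache like this can be sketched in a few lines. The eviction policy C3 actually uses is not named above, so this example assumes least-recently-used (LRU) purely for illustration; the class name is hypothetical, and capacity is counted in files rather than bytes to keep the sketch short.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve reads locally when possible; fetch and store on a miss (LRU eviction)."""

    def __init__(self, fetch_remote, capacity):
        self.fetch_remote = fetch_remote
        self.capacity = capacity
        self.entries = OrderedDict()
        self.remote_reads = 0

    def read(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as most recently used
            return self.entries[path]
        data = self.fetch_remote(path)      # slow path: object storage
        self.remote_reads += 1
        self.entries[path] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return data


cache = ReadThroughCache(lambda path: f"bytes of {path}", capacity=2)
for path in ["a", "a", "b", "c", "a"]:
    cache.read(path)

print(cache.remote_reads)   # 4
print(list(cache.entries))  # ['c', 'a']
```

The second read of "a" is served locally. The final read of "a" goes remote again because "a" was evicted when "c" arrived.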

C3 and Reflections work at different levels of the stack, and they stack directly. A Reflection that Dremio creates for an AI agent's access pattern is stored in the data lake. As that Reflection gets queried, C3 caches the Reflection's underlying data files on local NVMe. Subsequent queries on the same pattern benefit from both the structural optimization of the Reflection and the I/O speed of C3.

There's also a cost dimension. Cloud object storage pricing includes API call charges and egress fees. Every read that C3 serves from local cache is an API call and network transfer that didn't happen. For teams running intensive AI agent workloads that generate high read volumes, C3 can produce meaningful reductions in cloud storage costs.

Results Cache

For queries that are run multiple times with identical SQL (exact same text, exact same parameters), Dremio caches the result set and returns it instantly on subsequent runs without executing any compute.

This matters for AI agents specifically because multi-step analyses often include repeated verification queries. An agent answering a complex business question might run the same lookup query five times during a chain of reasoning steps. The Results Cache ensures those repeated calls don't consume engine resources.
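A toy version of an exact-match results cache shows both the benefit and the limit of matching on identical SQL text. The class is hypothetical and omits what a real cache needs, such as invalidation when the underlying data changes.

```python
class ResultsCache:
    """Cache result sets keyed by the exact SQL string."""

    def __init__(self, execute):
        self.execute = execute
        self.results = {}
        self.executions = 0

    def query(self, sql):
        if sql not in self.results:
            self.executions += 1
            self.results[sql] = self.execute(sql)
        return self.results[sql]


cache = ResultsCache(execute=lambda sql: [("EU", 15)])

lookup = "SELECT region, SUM(amount) FROM orders GROUP BY region"
for _ in range(5):
    cache.query(lookup)  # an agent repeating the same verification query
repeat_executions = cache.executions

# Same meaning, different text: an exact-match cache treats it as a new query.
cache.query("select region, sum(amount) from orders group by region")
total_executions = cache.executions
```

Five identical calls cost one execution; the differently cased version of the same query costs another.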

Query Plan Cache

Parsing and planning SQL has overhead. For complex queries with many joins, subqueries, and window functions, the planning phase itself can take tens of milliseconds. Dremio caches compiled query execution plans. Queries that are run repeatedly bypass the planning phase entirely and proceed directly to execution.

Layer 4: Engine Architecture: Performance That's Always On

The layers above handle storage, acceleration, and caching. The query engine itself provides the remaining performance foundation through architectural choices that don't require configuration.

Apache Arrow and the Serialization Tax

Dremio co-created Apache Arrow, the open-source columnar in-memory data format. Arrow is not just an integration format in Dremio: it is the native in-memory format that every operation runs on.

This has a concrete performance consequence: data doesn't get converted between formats during query execution. Traditional query engines that use row-oriented internal representations must serialize and deserialize data as it moves through the processing pipeline. Each conversion cycle consumes CPU cycles. At scale, this "serialization tax" is measurable.

Because Dremio's entire processing pipeline works natively in Arrow's columnar format, that tax doesn't exist. Data moves from Parquet files (columnar on disk) into Arrow (columnar in memory) without a format conversion step. The CPU processes column batches using SIMD (Single Instruction Multiple Data) instructions, which apply the same operation to multiple values simultaneously. For aggregations, filters, and joins over large column vectors, this is substantially faster than scalar row-by-row processing.

Arrow's columnar memory layout also improves CPU cache utilization. When the engine scans a column, all values for that column are stored contiguously in memory. The CPU cache loads a contiguous block of values, and the column scan proceeds with high cache hit rates. Row-oriented layouts scatter column values across memory, causing frequent cache misses during column-focused analytical queries.
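The layout difference can be illustrated with Python's standard-library `array` module, which stores values of one type in a contiguous buffer. This is only an analogy for the memory layout; it does not use Apache Arrow and does not demonstrate SIMD.

```python
from array import array

# Row-oriented: each record carries every field, so reading one column
# means stepping through whole records.
rows = [(1, "EU", 10.0), (2, "US", 7.5), (3, "EU", 2.5)]

# Column-oriented: one contiguous, single-typed buffer per column.
order_id = array("q", (row[0] for row in rows))
amount = array("d", (row[2] for row in rows))


def sum_row_oriented(records):
    return sum(record[2] for record in records)


def sum_columnar(column):
    return sum(column)


row_total = sum_row_oriented(rows)
column_total = sum_columnar(amount)
column_bytes = len(amount.tobytes())  # 3 values x 8 bytes, stored back to back
```

Both sums return 20.0, but the columnar version reads a single 24-byte buffer that contains nothing except the values it needs.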

Automatic Predicate Pushdown

When your SQL includes filter predicates, Dremio's query planner automatically moves those filters as close to the data as possible.

For Iceberg tables, this happens at multiple levels. Iceberg metadata contains min/max statistics for each column in each data file. Before opening any data file, the query planner reads these statistics and identifies which files could possibly contain rows matching the filter. Files whose range doesn't overlap the filter value are skipped entirely, without being read. Partition metadata provides a coarser-grained first pass: entire partitions that can't match are skipped before file-level statistics are checked.
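File-level pruning with min/max statistics reduces to a range-overlap check. The sketch below assumes a simplified metadata shape (a dict of per-column `(min, max)` pairs per file) and hypothetical file names; it is not Iceberg's manifest format.

```python
def prune_files(files, column, low, high):
    """Keep only files whose [min, max] range for `column` overlaps [low, high]."""
    kept = []
    for data_file in files:
        file_min, file_max = data_file["stats"][column]
        if file_max >= low and file_min <= high:
            kept.append(data_file["path"])
    return kept


# ISO-8601 date strings compare correctly as plain strings.
files = [
    {"path": "f1.parquet", "stats": {"order_date": ("2026-01-01", "2026-03-31")}},
    {"path": "f2.parquet", "stats": {"order_date": ("2026-03-15", "2026-04-10")}},
    {"path": "f3.parquet", "stats": {"order_date": ("2026-04-11", "2026-05-02")}},
    {"path": "f4.parquet", "stats": {"order_date": ("2026-05-03", "2026-06-01")}},
]

# WHERE order_date BETWEEN '2026-04-01' AND '2026-04-30'
to_scan = prune_files(files, "order_date", "2026-04-01", "2026-04-30")
print(to_scan)  # ['f2.parquet', 'f3.parquet']
```

Half the files are eliminated from the scan using metadata alone, before any data file is opened.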

For external database sources, Dremio's Advanced Relational Pushdown (ARP) framework pushes not just simple filters but complex computations into the source database's SQL engine. Instead of pulling a full table from PostgreSQL and filtering in Dremio, ARP rewrites the query so the database returns only the rows that survive the filter, joins, and aggregations. Data volume pulled across the network drops dramatically.
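The effect of pushing work into the source can be shown with SQLite standing in for the external database. This is not the ARP framework itself, just a sketch of the two strategies: pull everything and compute locally, or send the filter and aggregation to the source and pull only the answer.

```python
import sqlite3

# In-memory SQLite stands in for an external source such as PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10), ("US", 7), ("EU", 5), ("APAC", 2)],
)


def without_pushdown(connection):
    """Pull the whole table, then filter and aggregate in the engine."""
    rows = connection.execute("SELECT region, amount FROM orders").fetchall()
    total = sum(amount for region, amount in rows if region == "EU")
    return total, len(rows)


def with_pushdown(connection):
    """Let the source filter and aggregate; pull only the result."""
    rows = connection.execute(
        "SELECT SUM(amount) FROM orders WHERE region = 'EU'"
    ).fetchall()
    return rows[0][0], len(rows)


print(without_pushdown(conn))  # (15, 4)
print(with_pushdown(conn))     # (15, 1)
```

Both return the same total; the second value in each pair is the number of rows that crossed the connection.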

None of this requires user configuration. The query planner applies these optimizations automatically for every query.

Massively Parallel Execution

Every query Dremio runs is automatically parallelized across its executor nodes. The query planner decomposes a SQL statement into execution fragments, assigns each fragment to the executor node that has the relevant data cached locally (maximizing C3 cache utilization), and coordinates the parallel execution. Results are merged and returned to the client.
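The fragment-and-merge shape of parallel execution can be sketched with a thread pool. Threads here stand in for executor nodes, and the sketch does not model cache-aware fragment placement; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_fragment(partition):
    """One fragment: partial SUM(amount) GROUP BY region over its own data."""
    partial = {}
    for region, amount in partition:
        partial[region] = partial.get(region, 0) + amount
    return partial


def merge(partials):
    """Final step: combine partial aggregates into one result."""
    final = {}
    for partial in partials:
        for region, amount in partial.items():
            final[region] = final.get(region, 0) + amount
    return final


def parallel_sum_by_region(partitions, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(run_fragment, partitions))


partitions = [
    [("EU", 10), ("US", 7)],
    [("EU", 5)],
    [("APAC", 2), ("US", 1)],
]
result = parallel_sum_by_region(partitions)
print(result)  # {'EU': 15, 'US': 8, 'APAC': 2}
```

The caller asks one question and gets one answer; the split into three fragments happens entirely inside `parallel_sum_by_region`.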

Users write standard SQL with no parallelism hints. The MPP execution is invisible and automatic.

Autonomous vs. User-Configured: A Quick Reference

Dremio's performance features split cleanly into two categories. The autonomous features form the always-on baseline. The user-configured features give experts precise control for known patterns.

| Feature | Autonomous or User-Configured |
| --- | --- |
| Automatic file compaction | Autonomous |
| Delete file resolution | Autonomous |
| Manifest rewriting | Autonomous |
| Automatic vacuuming | Autonomous |
| Columnar Cloud Cache (C3) | Autonomous |
| Results Cache | Autonomous |
| Query Plan Cache | Autonomous |
| Autonomous Reflections | Autonomous |
| Live Reflection incremental refresh | Autonomous |
| Predicate pushdown | Autonomous |
| Vectorized Arrow execution | Autonomous (always-on) |
| Massively parallel execution | Autonomous (always-on) |
| Manual Reflections (Raw, Aggregation, External) | User-configured |
| OPTIMIZE TABLE (on-demand) | User-configured |
| VACUUM TABLE (on-demand) | User-configured |
| Clustering key definition | User-configured |
| Partition specification | User-configured |
| Dedicated refresh engine assignment | User-configured |

The autonomous features don't require your team to know what query patterns are coming. The user-configured features let you add precision when you do know.

Conclusion

The challenge of performance in an agentic analytics environment isn't that you have too little control over your query engine. It's that you can't use control you don't have time to exercise. AI agents generate novel queries faster than any human performance review cycle can respond to.

Dremio's layered autonomous performance architecture is designed for exactly this reality. The storage layer stays clean automatically. Reflections form and adapt based on actual query patterns, not forecasts. Caching keeps hot data close to compute. The query engine's Arrow-native architecture and automatic pushdowns squeeze performance from every query without configuration.

The result is a lakehouse that gets faster as it's used: through compaction that keeps file layouts optimal, through Reflections that accumulate around the access patterns your agents actually exhibit, through C3 that warms up around your hottest data, all without a single performance ticket or tuning sprint.

If you're building agentic analytics on a data lakehouse and you want performance management that keeps pace with AI workloads, start with a free trial of Dremio Cloud at dremio.com/get-started.
