Dremio Blog

22 minute read · March 6, 2026

Reduce Databricks Compute Costs by 40–60% with Dremio’s Agentic Lakehouse

Alex Merced · Head of DevRel, Dremio

Key Takeaways

  • Databricks incurs dual billing, charging for both Databricks Units (DBUs) and cloud infrastructure, leading to high costs for BI queries.
  • Dremio reduces Databricks costs by offloading BI queries and using a query engine that eliminates 60-80% of SQL Warehouse DBU consumption.
  • Using Dremio's Autonomous Reflections, users can execute queries without incurring Databricks DBU charges, resulting in significant savings.
  • Dremio offers AI functions natively in SQL, avoiding additional costs associated with deploying separate ML models on Databricks.
  • Connect Databricks to Dremio in phases to maximize savings and reduce dependency on Databricks for analytics.

Databricks bills you twice for every workload: Databricks Units for compute, plus the cloud VM costs underneath. Interactive compute runs $0.40/DBU. SQL Warehouses run $0.22/DBU. And the cloud infrastructure bill on top can match or exceed the Databricks charges. Most teams don't realize how much of that spend is coming from a single category: analysts running ad-hoc queries and dashboards refreshing against SQL Warehouses, workloads that don't need Spark and don't need to cost $0.22-0.40 per DBU.

Dremio keeps Databricks in place for what it does well: ETL, ML training, and Spark-based processing. But it offloads BI queries, dashboard refreshes, and analyst exploration to a purpose-built query engine with Autonomous Reflections, eliminating 60-80% of SQL Warehouse DBU consumption and the associated cloud compute. Analysts move from Databricks notebooks to Dremio's AI Agent. Dashboards stop generating DBUs entirely.

This guide breaks down where Databricks costs come from, how Dremio reduces them, and the AI features that go beyond cost savings.

Try Dremio’s Interactive Demo

Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI

The Databricks Cost Problem

Dual Billing: You Pay Twice

Every Databricks workload creates two bills:

  1. Databricks bill: Measured in DBUs. Interactive (All-Purpose) compute costs $0.40/DBU. SQL Warehouses cost $0.22/DBU. Jobs compute costs $0.15/DBU.
  2. Cloud provider bill: The VMs, disks, and networking that Databricks runs on. These costs are separate and can be substantial.

A typical production analytics setup with a Medium SQL Warehouse running 10 hours/day consumes approximately 400-600 DBUs/day. At $0.22/DBU, that is $88-132/day in Databricks charges alone. Add the cloud VM costs (often $40-80/day for the underlying instances) and you are at $130-210/day or $2,800-4,500/month for a single SQL Warehouse.
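A quick back-of-envelope check of those figures, using a midpoint of 500 DBUs/day and $60/day in VM costs (illustrative assumptions, not quoted rates):

```sql
-- Back-of-envelope check on the Medium SQL Warehouse scenario above
-- (500 DBUs/day and $60/day VM cost are assumed midpoints)
SELECT
  500 * 0.22              AS databricks_cost_per_day,  -- $110/day in DBU charges
  500 * 0.22 + 60         AS total_cost_per_day,       -- $170/day with cloud VMs
  (500 * 0.22 + 60) * 22  AS total_cost_per_month      -- $3,740 over 22 working days
```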

Where the Money Actually Goes

| Cost Component | Rate | Impact |
| --- | --- | --- |
| All-Purpose (Interactive) compute | $0.40/DBU | Most expensive tier; dev and exploration work burns through budget fast |
| SQL Warehouses | $0.22/DBU | Every dashboard refresh and every ad-hoc query consumes DBUs |
| Cloud VM costs | Varies by instance | Equal to or greater than DBU costs for compute-intensive workloads |
| Idle clusters | Full DBU rate until auto-terminate | Default 120-minute auto-terminate means paying for up to 2 hours of idle time |
| Photon acceleration | 2x DBU rate | Doubles DBU consumption for Photon-enabled clusters |
| Delta Live Tables | Higher DBU rate | Continuous processing pipelines consume DBUs around the clock |
| Unity Catalog | DBU charges for governance operations | Metadata operations add to the bill |
| Storage (cloud provider) | $0.023/GB-month (S3) | Separate from Databricks, billed by the cloud provider |

The Interactive Query Tax

The highest cost per DBU is interactive compute ($0.40/DBU). Development, exploration, and notebook-based analysis all hit this tier. When analysts use Databricks notebooks or SQL editor for ad-hoc analysis, they are consuming the most expensive compute available.

SQL Warehouses are cheaper per DBU ($0.22), but they still run on cloud VMs that bill separately. And like Redshift Serverless, there is startup latency and minimum billing when a SQL Warehouse spins up from idle.

How Dremio Cuts the Databricks Bill

Dremio connects to your Databricks data through two approaches:

Approach 1: Unity Catalog

Dremio supports Unity Catalog as a lakehouse catalog source. Through Unity Catalog, Dremio can read any Iceberg tables or UniForm-enabled Delta tables in your Databricks environment. No Databricks SQL Warehouse compute is consumed because Dremio reads the data directly from your object storage through the catalog metadata. This is the recommended approach because Iceberg and UniForm tables qualify for Autonomous Reflections.

Approach 2: Direct Object Storage Connection

Since your data lives on S3, Azure Blob, or GCS, Dremio can also read it directly using its object storage connectors. This bypasses Databricks compute entirely and works with Iceberg tables, Parquet files, and other open formats on your storage.

Reflections: The Core Cost Reduction Mechanism

Once connected, Dremio creates Reflections, optimized Apache Iceberg copies of your data stored on your cloud object storage. These Reflections are built once and then automatically refreshed when source data changes.

When a user or BI tool queries through Dremio, the optimizer checks if a Reflection can satisfy the query. If yes, Dremio serves it locally. Zero Databricks DBUs consumed. Zero cloud VM costs for Databricks.

Cost Reduction Scenario

Before Dremio:

  • Medium SQL Warehouse running 10 hours/day, 22 days/month
  • Databricks DBU cost: ~$2,200/month
  • Cloud VM cost: ~$1,400/month
  • All-Purpose clusters for ad-hoc analysis: ~$1,500/month
  • Total: ~$5,100/month

After Dremio with Autonomous Reflections:

  • 75% of BI/dashboard queries served from Dremio Reflections
  • SQL Warehouse reduced to 3 hours/day (ETL + ML only)
  • Databricks DBU cost: ~$660/month
  • Cloud VM cost: ~$420/month
  • All-Purpose clusters eliminated (Dremio used instead): $0
  • Dremio Cloud cost (DCUs + cloud infrastructure): ~$1,000-1,500/month
  • Total: ~$2,080-2,580/month (50-60% savings)

Note: Dremio Cloud pricing includes both Dremio Compute Units (DCUs) and the underlying cloud infrastructure costs. Even accounting for both, the total is significantly less than the Databricks spend it replaces because Dremio's engine is optimized for analytical query serving rather than general-purpose data processing.

The largest savings come from eliminating interactive compute ($0.40/DBU) entirely. Analysts use Dremio's AI Agent instead of Databricks notebooks for exploration and ad-hoc queries.

Step-by-Step: Connect Databricks to Dremio

Step 1: Start Your Free Dremio Cloud Trial

Sign up at Dremio Cloud for a 30-day trial with full access to Autonomous Reflections, AI Agent, and all AI features.

Step 2: Connect to Your Data

Option A: Unity Catalog (Recommended)

Add a Unity Catalog source in Dremio. This connects directly to your Databricks Unity Catalog via Unity’s Iceberg REST Catalog interface. No Databricks SQL Warehouse compute is consumed because Dremio reads from your object storage through the catalog metadata.

This is the recommended approach because Iceberg and UniForm tables qualify for Autonomous Reflections.
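Once the source is added, Unity Catalog schemas appear as queryable paths in Dremio. A quick smoke test (the source name `unity_catalog` and the table path are placeholders for whatever you configure):

```sql
-- Verify the Unity Catalog source is readable,
-- without spinning up a Databricks SQL Warehouse
SELECT *
FROM unity_catalog.analytics_db.orders
LIMIT 10
```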

Option B: Direct Object Storage Connection

If your Iceberg or Parquet data lives on S3, Azure Blob, or GCS, add an object storage source in Dremio and point it at the bucket where your data lives. Dremio reads the files directly, bypassing Databricks compute entirely. This works with Apache Iceberg tables, Parquet files, and Delta Lake tables written with V2 writers.

Step 3: Build Virtual Datasets

Create semantic layer views that combine business logic:

```sql
CREATE VDS analytics.customer_metrics AS
SELECT
  c.customer_segment,
  DATE_TRUNC('quarter', o.order_date) AS quarter,
  COUNT(DISTINCT o.order_id) AS total_orders,
  SUM(o.order_total) AS revenue,
  AVG(o.order_total) AS avg_order_value,
  COUNT(DISTINCT CASE WHEN o.is_repeat = true THEN o.customer_id END) AS repeat_buyers
FROM unity_catalog.analytics_db.orders o
JOIN unity_catalog.analytics_db.customers c ON o.customer_id = c.id
GROUP BY 1, 2
```

Step 4: Enable Reflections

Navigate to the virtual dataset → Reflections tab:

  1. Enable Raw Reflections for the full dataset with optimized partitioning. Best for queries that scan or filter rows.
  2. Enable Aggregate Reflections for pre-computed metrics. Best for dashboard queries and summary reports.

Which type to create depends on how the data is being used. Match the Reflection type to your dominant query patterns, or create both if usage is mixed.

Dremio builds Iceberg-based Reflections on your storage. Subsequent queries hit the Reflection, not Databricks.
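The UI steps above also have a SQL equivalent. A sketch using Dremio's reflection DDL (exact syntax can vary by Dremio version, and the reflection names here are arbitrary):

```sql
-- Raw Reflection: full rows, partitioned for scan/filter queries
ALTER DATASET analytics.customer_metrics
CREATE RAW REFLECTION customer_metrics_raw
USING DISPLAY (customer_segment, quarter, total_orders,
               revenue, avg_order_value, repeat_buyers)
PARTITION BY (quarter);

-- Aggregate Reflection: pre-computed rollups for dashboards
ALTER DATASET analytics.customer_metrics
CREATE AGGREGATE REFLECTION customer_metrics_agg
USING DIMENSIONS (customer_segment, quarter)
MEASURES (revenue (SUM), total_orders (SUM));
```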

Step 5: Activate Autonomous Reflections

Go to Project Settings → Reflections → Enable Autonomous Reflections.

Dremio monitors query patterns over 7 days and automatically creates Reflections for your most common queries. It also drops Reflections that are no longer used, keeping storage costs minimal.

Key detail: Autonomous Reflections only consider datasets natively stored as Iceberg (or UniForm tables, Parquet datasets, and views built on them). While all Reflections are internally stored as Iceberg tables, those internal tables are not exposed from Dremio's catalog. Only your natively-stored Iceberg data is eligible for autonomous optimization. This gives you a strong reason to convert Delta tables to Iceberg (or use Databricks UniForm to expose Iceberg-compatible metadata): it unlocks Dremio's autonomous performance management.

Dremio is an Iceberg-native lakehouse platform with a built-in Open Catalog based on Apache Polaris. Because the Open Catalog uses the standard Iceberg REST protocol, your Databricks Spark jobs and other engines can still read and write to the same Iceberg tables that Dremio is autonomously accelerating. Dremio manages query performance without blocking other engines or creating additional lock-in.

The maintenance engine auto-suspends after 30 seconds of idle time.

The AI Advantage: What Dremio Does That Databricks SQL Cannot

AI SQL Functions

Dremio embeds LLM capabilities directly in SQL with three functions:

```sql
-- Classify support tickets by urgency using AI
SELECT
  ticket_id,
  subject,
  AI_CLASSIFY(
    CONCAT(subject, ': ', description),
    ARRAY['critical', 'high', 'medium', 'low']
  ) AS urgency_level
FROM analytics.support_tickets
```

| Function | Purpose | Databricks Equivalent |
| --- | --- | --- |
| AI_CLASSIFY() | Categorize text with LLM | Requires MLflow model + UDF registration |
| AI_GENERATE() | Extract structured data from unstructured sources | Requires custom Spark job or Foundation Model API |
| AI_COMPLETE() | Summarize data into narrative insights | Requires Databricks AI Functions (limited to Foundation Models) |

The key difference: Dremio's AI functions are native SQL. No separate ML pipeline, no model deployment, no additional DBU charges for ML compute.
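The same pattern extends to narrative summaries. A sketch (the exact AI_COMPLETE signature may differ in your Dremio version; the prompt and table are illustrative):

```sql
-- Turn a metrics row into a one-sentence narrative for stakeholders
SELECT
  customer_segment,
  quarter,
  AI_COMPLETE(
    CONCAT('Summarize this quarter in one sentence: revenue=', revenue,
           ', orders=', total_orders, ', segment=', customer_segment)
  ) AS quarterly_summary
FROM analytics.customer_metrics
```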

The AI Agent vs. Databricks Assistant

Both platforms offer AI assistants, but they work differently:

Databricks Assistant:

  • Limited to notebook/SQL editor context
  • Generates SQL and explains code
  • Runs on Databricks compute (incurs DBU charges)

Dremio AI Agent:

  • Full analytical workflow: discover, analyze, visualize, optimize
  • Uses the semantic layer (views, wikis, labels) for business context
  • Runs against Reflections, not the source data
  • Generates charts and suggests follow-up questions
  • Zero Databricks compute consumed

For analytical workflows, Dremio's Agent replaces the need for analysts to use Databricks notebooks for ad-hoc analysis. Since the Agent runs on Dremio's engine using Reflections, the queries never touch Databricks.

MCP Server and Self-Documenting Catalog

Dremio's MCP Server connects external AI tools (ChatGPT, Claude, Cursor, custom agents) directly to Dremio. Every AI-generated query runs against Reflections, not Databricks. Your AI tools get data access without generating Databricks costs.

Dremio also auto-generates documentation for your datasets using AI. It samples schema and data to create wikis (descriptions) and suggest labels (tags), building a self-documenting catalog that would take weeks to create manually.

Architecture: How Dremio and Databricks Complement Each Other

┌─────────────────────────────────────────────────────┐
│       BI Tools / AI Agents / Analysts               │
│    (Power BI, Tableau, Looker, ChatGPT, Claude)     │
└──────────────────────┬──────────────────────────────┘
                       │ All analytics + BI queries
                       ▼
┌─────────────────────────────────────────────────────┐
│                    Dremio Cloud                     │
│                                                     │
│  ┌────────────┐  ┌───────────────┐  ┌────────────┐  │
│  │ Semantic   │  │ Autonomous    │  │ AI Agent + │  │
│  │ Layer      │  │ Reflections   │  │ AI SQL     │  │
│  │ (replaces  │  │ (replaces SQL │  │ Functions  │  │
│  │ notebook   │  │ Warehouse     │  │ (replaces  │  │
│  │ ad-hoc)    │  │ for BI)       │  │ ML         │  │
│  │            │  │               │  │ pipelines) │  │
│  └────────────┘  └───────┬───────┘  └────────────┘  │
│                          │                          │
│  ┌───────────────────────▼───────────────────────┐  │
│  │ Apache Iceberg on Your Object Storage         │  │
│  │ (Reflections + managed Iceberg tables)        │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

         Databricks handles ONLY:

┌─────────────────────────────────────────────────────┐
│                Databricks Workspace                 │
│                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │ ETL / ELT    │  │ ML Training  │  │ Streaming  │ │
│  │ (Jobs        │  │ (MLflow,     │  │ (Structured│ │
│  │ Compute at   │  │ GPU          │  │ Streaming, │ │
│  │ $0.15/DBU)   │  │ clusters)    │  │ Delta Live │ │
│  │              │  │              │  │ Tables)    │ │
│  └──────────────┘  └──────────────┘  └────────────┘ │
└─────────────────────────────────────────────────────┘

The principle: Run analytics on Dremio (cheap), run processing on Databricks (powerful). Stop paying $0.22-0.40/DBU for queries that Dremio can serve at a fraction of the cost.

The Vendor Lock-In Advantage

Databricks has several proprietary features that create platform dependency:

  • Delta Live Tables: Proprietary pipeline orchestration
  • Unity Catalog: Tightly integrated governance layer
  • Autoloader: Proprietary ingestion framework
  • Magic commands and dbutils: Platform-specific APIs

Dremio is built entirely on open standards:

  • Apache Iceberg for data storage (no proprietary format)
  • Apache Arrow for in-memory processing (zero serialization tax)
  • Apache Polaris for catalog management (open Iceberg REST catalog)
  • Model Context Protocol (MCP) for AI integration (open standard)

Dremio's built-in Open Catalog uses the standard Iceberg REST protocol. This means your Databricks Spark jobs, Flink pipelines, and Trino queries can all read and write to the same Iceberg tables that Dremio is autonomously managing and accelerating. Autonomous Reflections only consider datasets natively stored as Iceberg, which is why Dremio's Iceberg-native architecture is the key enabler: it gives you autonomous performance management while preserving full multi-engine interoperability.

If you decide to move away from any platform in the future, your data stays accessible in an open format.

Migration Path: Incremental, Not All-or-Nothing

| Phase | Action | Savings |
| --- | --- | --- |
| Phase 1 | Connect Databricks data + enable Reflections | 40-60% BI/analytics cost reduction |
| Phase 2 | Move analysts from Databricks notebooks to Dremio AI Agent | Eliminate interactive compute ($0.40/DBU) |
| Phase 3 | Convert Delta tables to Iceberg for multi-engine access | Reduce Databricks lock-in, optional engine flexibility |
| Phase 4 | Downsize SQL Warehouses to handle only ETL residuals | Further 20-30% reduction on remaining Databricks spend |

You keep Databricks for what it does well (Spark workloads, ML, streaming) and move analytics queries to Dremio for what it does better (BI acceleration, AI-native analytics, federated queries).

Side-by-Side Comparison

| Capability | Databricks | Dremio |
| --- | --- | --- |
| Query acceleration | Manual materialized views or Delta caching | Autonomous Reflections (self-tuning, auto-managed) |
| AI in SQL | Foundation Model APIs (separate compute) | Native AI_CLASSIFY, AI_GENERATE, AI_COMPLETE |
| AI agent | Databricks Assistant (notebook/SQL context) | Full analytical co-pilot with visualization and follow-ups |
| Federation | JDBC/federated queries (limited sources) | Native federation across 20+ source types |
| Data format | Delta Lake (Databricks-maintained) | Apache Iceberg (multi-vendor standard) |
| Pricing model | DBU + cloud VM (dual billing) | DCU + cloud infrastructure (consumption-based) |
| Catalog | Unity Catalog (proprietary) | Open Catalog built on Apache Polaris (open standard) |
| Self-documenting | Manual tagging | AI-generated wikis and labels |

Get Started

  1. Start your free Dremio Cloud trial: https://www.dremio.com/get-started
  2. Connect via Unity Catalog to read Iceberg and UniForm-enabled Delta tables without Databricks compute
  3. Enable Reflections on your busiest dashboard datasets
  4. Monitor your Databricks DBU consumption and cloud provider compute costs

Most teams see measurable savings within the first week. The analyst productivity gains from Dremio's AI Agent compound the financial savings by reducing the need for expensive interactive compute sessions.

For deep-dive documentation, visit Dremio Docs or take free courses at Dremio University.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.