Databricks incurs dual billing, charging for both Databricks Units (DBUs) and cloud infrastructure, leading to high costs for BI queries.
Dremio reduces Databricks costs by offloading BI queries to a purpose-built query engine, eliminating 60-80% of SQL Warehouse DBU consumption.
Using Dremio's Autonomous Reflections, users can execute queries without incurring Databricks DBU charges, resulting in significant savings.
Dremio offers AI functions natively in SQL, avoiding additional costs associated with deploying separate ML models on Databricks.
Connect Databricks to Dremio in phases to maximize savings and reduce dependency on Databricks for analytics.
Databricks bills you twice for every workload: Databricks Units for compute, plus the cloud VM costs underneath. Interactive compute runs $0.40/DBU. SQL Warehouses run $0.22/DBU. And the cloud infrastructure bill on top can match or exceed the Databricks charges. Most teams don't realize how much of that spend is coming from a single category: analysts running ad-hoc queries and dashboards refreshing against SQL Warehouses, workloads that don't need Spark and don't need to cost $0.22-0.40 per DBU.
Dremio keeps Databricks in place for what it does well: ETL, ML training, and Spark-based processing. But it offloads BI queries, dashboard refreshes, and analyst exploration to a purpose-built query engine with Autonomous Reflections, eliminating 60-80% of SQL Warehouse DBU consumption and the associated cloud compute. Analysts move from Databricks notebooks to Dremio's AI Agent. Dashboards stop generating DBUs entirely.
This guide breaks down where Databricks costs come from, how Dremio reduces them, and the AI features that go beyond cost savings.
The Databricks Cost Problem
Dual Billing: You Pay Twice
Every Databricks workload creates two bills:
Databricks bill: Measured in DBUs. Interactive (All-Purpose) compute costs $0.40/DBU. SQL Warehouses cost $0.22/DBU. Jobs compute costs $0.15/DBU.
Cloud provider bill: The VMs, disks, and networking that Databricks runs on. These costs are separate and can be substantial.
A typical production analytics setup with a Medium SQL Warehouse running 10 hours/day consumes approximately 400-600 DBUs/day. At $0.22/DBU, that is $88-132/day in Databricks charges alone. Add the cloud VM costs (often $40-80/day for the underlying instances) and you are at $130-210/day or $2,800-4,500/month for a single SQL Warehouse.
Where the Money Actually Goes
| Cost Component | Rate | Impact |
|---|---|---|
| All-Purpose (Interactive) Compute | $0.40/DBU | Most expensive tier; dev and exploration work burns through budget fast |
| SQL Warehouses | $0.22/DBU | Every dashboard refresh and every ad-hoc query consumes DBUs |
| Cloud VM costs | Varies by instance | Equal to or greater than DBU costs for compute-intensive workloads |
| Idle clusters | Full DBU rate until auto-terminate | Default 120-minute auto-terminate means paying for up to 2 hours of idle time |
| Photon acceleration | 2x DBU rate | Doubles DBU consumption for Photon-enabled clusters |
| Delta Live Tables | Higher DBU rate | Continuous processing pipelines consume DBUs around the clock |
| Unity Catalog | DBU charges for governance operations | Metadata operations add to the bill |
| Storage (cloud provider) | $0.023/GB-month (S3) | Separate from Databricks, billed by the cloud provider |
The Interactive Query Tax
The highest cost per DBU is interactive compute ($0.40/DBU). Development, exploration, and notebook-based analysis all hit this tier. When analysts use Databricks notebooks or SQL editor for ad-hoc analysis, they are consuming the most expensive compute available.
SQL Warehouses are cheaper per DBU ($0.22), but they still run on cloud VMs that bill separately. And like Redshift Serverless, there is startup latency and minimum billing when a SQL Warehouse spins up from idle.
How Dremio Cuts the Databricks Bill
Dremio connects to your Databricks data through two approaches:
Approach 1: Unity Catalog (Recommended for Iceberg/UniForm Data)
Dremio supports Unity Catalog as a lakehouse catalog source. Through Unity Catalog, Dremio can read any Iceberg tables or UniForm-enabled Delta tables in your Databricks environment. No Databricks SQL Warehouse compute is consumed because Dremio reads the data directly from your object storage through the catalog metadata. This is the recommended approach because Iceberg and UniForm tables qualify for Autonomous Reflections.
Approach 2: Direct Object Storage Connection
Since your data lives on S3, Azure Blob, or GCS, Dremio can also read it directly using its object storage connectors. This bypasses Databricks compute entirely and works with Iceberg tables, Parquet files, and other open formats on your storage.
Reflections: The Core Cost Reduction Mechanism
Once connected, Dremio creates Reflections, optimized Apache Iceberg copies of your data stored on your cloud object storage. These Reflections are built once and then automatically refreshed when source data changes.
When a user or BI tool queries through Dremio, the optimizer checks if a Reflection can satisfy the query. If yes, Dremio serves it locally. Zero Databricks DBUs consumed. Zero cloud VM costs for Databricks.
Cost Reduction Scenario
Before Dremio:
Medium SQL Warehouse running 10 hours/day, 22 days/month
Databricks DBU cost: ~$2,200/month
Cloud VM cost: ~$1,400/month
All-Purpose clusters for ad-hoc analysis: ~$1,500/month
Total: ~$5,100/month
After Dremio with Autonomous Reflections:
75% of BI/dashboard queries served from Dremio Reflections
SQL Warehouse reduced to 3 hours/day (ETL + ML only)
Databricks DBU cost: ~$660/month
Cloud VM cost: ~$420/month
All-Purpose clusters eliminated (Dremio used instead): $0
Remaining Databricks spend: ~$1,080/month, roughly a 79% reduction in the Databricks bill
Note: Dremio Cloud pricing includes both Dremio Compute Units (DCUs) and the underlying cloud infrastructure costs. Even accounting for both, the total is significantly less than the Databricks spend it replaces because Dremio's engine is optimized for analytical query serving rather than general-purpose data processing.
The largest savings come from eliminating interactive compute ($0.40/DBU) entirely. Analysts use Dremio's AI Agent instead of Databricks notebooks for exploration and ad-hoc queries.
Step-by-Step: Connect Databricks to Dremio
Step 1: Start Your Free Dremio Cloud Trial
Sign up at Dremio Cloud for a 30-day trial with full access to Autonomous Reflections, AI Agent, and all AI features.
Step 2: Connect to Your Data
Option A: Unity Catalog (Recommended)
Add a Unity Catalog source in Dremio. This connects directly to your Databricks Unity Catalog via Unity’s Iceberg REST Catalog interface. No Databricks SQL Warehouse compute is consumed because Dremio reads from your object storage through the catalog metadata.
This is the recommended approach because Iceberg and UniForm tables qualify for Autonomous Reflections.
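Once the source is added, tables under it are queried with ordinary SQL. A minimal sketch, assuming the source was named unity_catalog and reusing the analytics_db.orders table referenced later in this guide:
-- Assumes a Unity Catalog source named "unity_catalog" and an Iceberg or
-- UniForm-enabled table analytics_db.orders with an order_date column.
-- Dremio reads the table files directly from object storage; no SQL Warehouse runs.
SELECT order_id, customer_id, order_total
FROM unity_catalog.analytics_db.orders
WHERE order_date >= DATE '2024-01-01'
LIMIT 100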
Option B: Direct Object Storage Connection
If your Iceberg or Parquet data lives on S3, Azure Blob, or GCS, add an object storage source in Dremio and point it to the bucket where your data lives. Dremio reads the files directly, bypassing Databricks compute entirely. This approach supports Apache Iceberg tables, Parquet files, and Delta Lake tables written with v2 writers.
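As a rough sketch, once the source is added and a folder or table is promoted to a dataset, you query it by path. The source name (s3_lake) and the bucket and folder names below are placeholders, not part of this guide's setup:
-- Hypothetical names: "s3_lake" is the object storage source added in Dremio,
-- "analytics-bucket"."orders" is a folder promoted to a dataset (or an Iceberg table).
SELECT *
FROM s3_lake."analytics-bucket"."orders"
LIMIT 100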
Step 3: Build Virtual Datasets
Create semantic layer views that encode business logic:
CREATE VDS analytics.customer_metrics AS
SELECT
c.customer_segment,
DATE_TRUNC('quarter', o.order_date) AS quarter,
COUNT(DISTINCT o.order_id) AS total_orders,
SUM(o.order_total) AS revenue,
AVG(o.order_total) AS avg_order_value,
COUNT(DISTINCT CASE WHEN o.is_repeat = true THEN o.customer_id END) AS repeat_buyers
FROM unity_catalog.analytics_db.orders o
JOIN unity_catalog.analytics_db.customers c ON o.customer_id = c.id
GROUP BY 1, 2
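BI tools and analysts then query the view like any table. For example, a dashboard query against it might look like the following; once a matching Reflection exists (Step 4), Dremio serves it from the Reflection rather than rescanning the source tables:
-- Example dashboard query against the semantic layer view defined above.
SELECT customer_segment, quarter, total_orders, revenue, avg_order_value
FROM analytics.customer_metrics
ORDER BY quarter DESC, revenue DESC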
Step 4: Enable Reflections
Navigate to the virtual dataset → Reflections tab:
Enable Raw Reflections for the full dataset with optimized partitioning. Best for queries that scan or filter rows.
Enable Aggregate Reflections for pre-computed metrics. Best for dashboard queries and summary reports.
Which type to create depends on how the data is being used. Match the Reflection type to your dominant query patterns, or create both if usage is mixed.
Dremio builds Iceberg-based Reflections on your storage. Subsequent queries hit the Reflection, not Databricks.
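The Reflections tab is the primary workflow, but Reflections can also be managed through SQL. A hedged sketch of an aggregate Reflection on the view from Step 3 follows; the exact reflection DDL varies by Dremio version, so treat this as illustrative rather than definitive:
-- Illustrative only; verify the reflection DDL syntax for your Dremio version.
ALTER DATASET analytics.customer_metrics
CREATE AGGREGATE REFLECTION agg_customer_metrics
USING DIMENSIONS (customer_segment, quarter)
MEASURES (total_orders (SUM), revenue (SUM))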
Step 5: Activate Autonomous Reflections
Go to Project Settings → Reflections → Enable Autonomous Reflections.
Dremio monitors query patterns over 7 days and automatically creates Reflections for your most common queries. It also drops Reflections that are no longer used, keeping storage costs minimal.
Key detail: Autonomous Reflections only consider datasets natively stored as Iceberg, plus UniForm tables, Parquet datasets, and views built on them. All Reflections are stored internally as Iceberg tables, but those internal tables are not exposed in Dremio's catalog; only data natively stored in these open formats is eligible for autonomous optimization. This is a strong reason to convert Delta tables to Iceberg (or use Databricks UniForm to expose Iceberg-compatible metadata): it unlocks Dremio's autonomous performance management.
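If you take the UniForm route, enabling it on an existing Delta table happens on the Databricks side. A sketch based on the documented UniForm table properties, using the guide's example table name; check the Databricks docs for prerequisites such as column mapping and minimum runtime versions:
-- Run in Databricks: exposes Iceberg-compatible metadata for an existing Delta table.
ALTER TABLE analytics_db.orders SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);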
Dremio is an Iceberg-native lakehouse platform with a built-in Open Catalog based on Apache Polaris. Because the Open Catalog uses the standard Iceberg REST protocol, your Databricks Spark jobs and other engines can still read and write to the same Iceberg tables that Dremio is autonomously accelerating. Dremio manages query performance without blocking other engines or creating additional lock-in.
The Reflection maintenance engine auto-suspends after 30 seconds of idle time.
The AI Advantage: What Dremio Does That Databricks SQL Cannot
AI SQL Functions
Dremio embeds LLM capabilities directly in SQL with three functions:
-- Classify support tickets by urgency using AI
SELECT
ticket_id,
subject,
AI_CLASSIFY(
CONCAT(subject, ': ', description),
ARRAY['critical', 'high', 'medium', 'low']
) AS urgency_level
FROM analytics.support_tickets
| Function | Purpose | Databricks Equivalent |
|---|---|---|
| AI_CLASSIFY() | Categorize text with an LLM | Requires MLflow model + UDF registration |
| AI_GENERATE() | Extract structured data from unstructured sources | Requires a custom Spark job or the Foundation Model API |
| AI_COMPLETE() | Summarize data into narrative insights | Requires Databricks AI Functions (limited to Foundation Models) |
The key difference: Dremio's AI functions are native SQL. No separate ML pipeline, no model deployment, no additional DBU charges for ML compute.
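For completeness, here is a sketch of how the other two functions might be called. The argument shapes shown are assumptions for illustration, not taken from Dremio's function reference, so confirm the exact signatures in the documentation before use:
-- Illustrative only: argument shapes for AI_GENERATE and AI_COMPLETE are assumed here.
SELECT
  ticket_id,
  AI_GENERATE(CONCAT('Extract the product name mentioned in this ticket: ', description)) AS product_name,
  AI_COMPLETE(CONCAT('Summarize this support ticket in one sentence: ', description)) AS summary
FROM analytics.support_tickets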
The AI Agent vs. Databricks Assistant
Both platforms offer AI assistants, but they work differently:
Databricks Assistant:
Limited to notebook/SQL editor context
Generates SQL and explains code
Runs on Databricks compute (incurs DBU charges)
Dremio AI Agent:
Full analytical workflow: discover, analyze, visualize, optimize
Uses the semantic layer (views, wikis, labels) for business context
Runs against Reflections, not the source data
Generates charts and suggests follow-up questions
Zero Databricks compute consumed
For analytical workflows, Dremio's Agent replaces the need for analysts to use Databricks notebooks for ad-hoc analysis. Since the Agent runs on Dremio's engine using Reflections, the queries never touch Databricks.
MCP Server and Self-Documenting Catalog
Dremio's MCP Server connects external AI tools (ChatGPT, Claude, Cursor, custom agents) directly to Dremio. Every AI-generated query runs against Reflections, not Databricks. Your AI tools get data access without generating Databricks costs.
Dremio also auto-generates documentation for your datasets using AI. It samples schema and data to create wikis (descriptions) and suggest labels (tags), building a self-documenting catalog that would take weeks to create manually.
Architecture: How Dremio and Databricks Complement Each Other
The principle: Run analytics on Dremio (cheap), run processing on Databricks (powerful). Stop paying $0.22-0.40/DBU for queries that Dremio can serve at a fraction of the cost.
The Vendor Lock-In Advantage
Databricks has several proprietary features that create platform dependency:
Delta Live Tables: Proprietary pipeline orchestration
Magic commands and dbutils: Platform-specific APIs
Dremio is built entirely on open standards:
Apache Iceberg for data storage (no proprietary format)
Apache Arrow for in-memory processing (zero serialization tax)
Apache Polaris for catalog management (open Iceberg REST catalog)
Model Context Protocol (MCP) for AI integration (open standard)
Dremio's built-in Open Catalog uses the standard Iceberg REST protocol. This means your Databricks Spark jobs, Flink pipelines, and Trino queries can all read and write to the same Iceberg tables that Dremio is autonomously managing and accelerating. Autonomous Reflections only consider datasets natively stored as Iceberg, which is why Dremio's Iceberg-native architecture is the key enabler: it gives you autonomous performance management while preserving full multi-engine interoperability.
If you decide to move away from any platform in the future, your data stays accessible in an open format.
Migration Path: Incremental, Not All-or-Nothing
| Phase | Action | Savings |
|---|---|---|
| Phase 1 | Connect Databricks data + enable Reflections | 40-60% BI/analytics cost reduction |
| Phase 2 | Move analysts from Databricks notebooks to Dremio AI Agent | Eliminates interactive compute ($0.40/DBU) |
| Phase 3 | Convert Delta tables to Iceberg for multi-engine access; downsize SQL Warehouses to handle only ETL residuals | Further 20-30% reduction on remaining Databricks spend |
You keep Databricks for what it does well (Spark workloads, ML, streaming) and move analytics queries to Dremio for what it does better (BI acceleration, AI-native analytics, federated queries).
To get started:
Connect via Unity Catalog to read Iceberg and UniForm-enabled Delta tables without Databricks compute
Enable Reflections on your busiest dashboard datasets
Monitor your Databricks DBU consumption and cloud provider compute costs
Most teams see measurable savings within the first week. The analyst productivity gains from Dremio's AI Agent compound the financial savings by reducing the need for expensive interactive compute sessions.