Dremio Blog

18 minute read · March 6, 2026

How to Reduce Amazon Redshift Costs by 40-60% with Dremio’s Agentic Lakehouse

Alex Merced, Head of DevRel, Dremio

Amazon Redshift pricing looks straightforward on paper: provisioned clusters with reserved instances or Serverless with per-second RPU billing. In practice, the bill gets complicated fast. Provisioned clusters charge even when idle if you forget to pause them. Serverless bills a 60-second minimum per query, which inflates costs for the short dashboard queries that make up most BI workloads. Concurrency Scaling adds on-demand charges when parallel queries exceed your cluster's capacity. And Redshift Spectrum charges $5 per TB scanned when querying S3 data.

Dremio provides an alternative: keep Redshift for the workloads that need it, but offload the repetitive, expensive dashboard and reporting queries to Dremio's engine. Dremio's Autonomous Reflections serve those queries from Apache Iceberg tables on your own S3 storage, bypassing Redshift compute entirely. The result is a 40-60% reduction in Redshift compute costs in the first month, without migrating a single table.

This guide walks through the cost reduction strategy, the modern AI features that make Dremio more than a query accelerator, and a step-by-step setup.

Where Redshift Money Goes

Redshift compute costs break into two models, and both have structural inefficiencies.


Provisioned Clusters: Paying for Capacity You Don't Use

Provisioned Redshift clusters bill by the hour based on node type and count. A typical production cluster runs 3 ra3.xlplus nodes at $1.086/hour each, totaling $3.26/hour or approximately $2,350/month running 24/7.
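The provisioned figure is just node count × hourly rate × hours in a month. A quick sanity check, using the ra3.xlplus on-demand rate quoted above (us-east-1 list price; verify the rate for your region):

```python
# Back-of-envelope provisioned Redshift cost for a 3-node ra3.xlplus cluster
# running 24/7. Rates are the list prices quoted in the text above.
NODE_RATE = 1.086          # $/hour per ra3.xlplus node
NODES = 3
HOURS_PER_MONTH = 24 * 30  # ~720 hours

hourly = NODES * NODE_RATE           # $3.26/hour
monthly = hourly * HOURS_PER_MONTH   # ~$2,346/month, matching the ~$2,350 above

print(f"${hourly:.2f}/hour, ~${monthly:,.0f}/month")
```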

The problem: most analytics workloads are bursty. Dashboards fire queries during business hours. ETL runs overnight. Between those peaks, the cluster sits idle, but you still pay. Pausing the cluster saves money but introduces latency when users need it resumed. Resizing a provisioned cluster to handle peak load adds capacity you pay for during off-peak hours.

Serverless: The 60-Second Tax

Redshift Serverless charges per RPU-hour (approximately $0.375/RPU-hour in us-east-1) with a minimum of 4 RPUs. The catch is the 60-second minimum billing per query. A dashboard query that takes 3 seconds still bills for 60 seconds of RPU compute.

For BI workloads, this 60-second minimum is expensive. If a user refreshes a dashboard with 8 charts, each chart fires a query. Even if each query runs in 5 seconds, you are billed for 8 full minutes of RPU compute (8 queries × 60 seconds). Multiply that across 50 users opening dashboards throughout the day and the serverless bill inflates rapidly.
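The math behind that inflation is straightforward: compare billed seconds to actual runtime for the 8-chart refresh above. This is a simplified calculation under the per-query minimum described here, so treat it as illustrative rather than a reproduction of AWS's exact billing logic:

```python
# One dashboard refresh: 8 charts, each firing a ~5-second query.
# Under a 60-second-per-query minimum, billed time dwarfs actual runtime.
QUERIES_PER_REFRESH = 8
ACTUAL_SECONDS = 5     # real runtime per query
BILLED_MINIMUM = 60    # minimum billed seconds per query

actual = QUERIES_PER_REFRESH * ACTUAL_SECONDS   # 40 seconds of real work
billed = QUERIES_PER_REFRESH * BILLED_MINIMUM   # 480 seconds billed

print(f"billed {billed}s for {actual}s of work: {billed // actual}x inflation")
```

A 12x markup for 5-second queries; even shorter queries push the multiple toward the 10-20x range.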

The Full Cost Picture

| Cost Component | Provisioned | Serverless | Impact |
| --- | --- | --- | --- |
| Idle compute | Cluster runs 24/7, idle nights and weekends | No idle charges (advantage) | Provisioned wastes 40-60% of capacity for bursty workloads |
| Short query tax | Minimal (cluster already running) | 60-second minimum per query | Serverless inflates BI dashboard costs 10-20x |
| Concurrency Scaling | 1 free hour/day, then on-demand rates | Included in RPU-hour | Provisioned sees unpredictable peaks |
| Redshift Spectrum | $5/TB scanned on S3 | Included in RPU-hour | Provisioned adds S3 query costs |
| Reserved Instance discount | Up to 75% with 3-year commitment | Up to 45% with 3-year reservation | Both require long-term commitment to save |
| Storage (RMS) | $0.024/GB-month | $0.024/GB-month | Same for both |

How Dremio Stops the Redshift Meter

Dremio connects to Redshift as a federated data source. When you enable Reflections, Dremio queries Redshift once to build an optimized Apache Iceberg copy of the data on your S3 storage. Every subsequent matching query is served by Dremio's engine. Redshift never sees the query. No RPU hours consumed. No cluster cycles used.

Cost Reduction Scenario: 50 Active Dashboard Users

Before Dremio (Provisioned Cluster):

  • 3× ra3.xlplus nodes running 24/7
  • Monthly compute: $2,350
  • Concurrency Scaling during peak hours: $400/month
  • Redshift Spectrum for S3 queries: $200/month
  • Total: $2,950/month

Before Dremio (Serverless):

  • 50 users × 10 dashboard refreshes/day × 8 queries/refresh
  • 4,000 queries/day × 60-second minimum × 8 RPU base
  • Approximately $2,700/month in RPU-hours
  • Total: $2,700/month

After Dremio with Autonomous Reflections:

  • 80% of dashboard queries served from Reflections (no Redshift compute)
  • Redshift handles remaining 20% of queries
  • Provisioned: Downsize to 2 nodes or switch to Serverless for remaining load = $600-900/month
  • Dremio Cloud cost (DCUs + cloud infrastructure): $800-1,200/month
  • Total: $1,400-2,100/month (30-50% savings)

Note: Dremio Cloud pricing includes both Dremio Compute Units (DCUs) and the underlying cloud infrastructure. Even accounting for both, the total is significantly less than the Redshift spend it replaces.
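Putting the scenario numbers together (all figures are the illustrative estimates above, not quotes):

```python
# Savings range implied by the provisioned-cluster scenario above.
before = 2950                         # $/month: cluster + Concurrency Scaling + Spectrum
after_low, after_high = 1400, 2100    # $/month: reduced Redshift + Dremio Cloud

best = 1 - after_low / before         # ~53% savings at the low end of "after"
worst = 1 - after_high / before       # ~29% savings at the high end of "after"

print(f"total-bill savings: {worst:.0%} to {best:.0%}")
```

That range rounds to the roughly 30-50% total-bill savings cited above; the 40-60% headline figure refers to the Redshift compute line specifically, which drops further because most of the "after" spend shifts to Dremio.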

Step-by-Step: Connect Redshift to Dremio

Step 1: Create a Free Dremio Cloud Account

Sign up at Dremio Cloud. The 30-day free trial includes all features, including Autonomous Reflections and the AI capabilities covered later in this guide.

Step 2: Add Redshift as a Source

In the Dremio console, go to Sources → Add Source → Amazon Redshift.

Configure the connection:

  • Host: Your Redshift cluster endpoint or Serverless workgroup endpoint
  • Port: 5439 (default)
  • Database: The Redshift database containing your analytics tables
  • Username/Password: A Redshift user with SELECT privileges on the target tables

Dremio connects and catalogs your Redshift tables immediately.

Step 3: Build the Semantic Layer

Create virtual datasets (views) that encode your business logic:

CREATE VDS analytics.monthly_revenue AS
SELECT
  DATE_TRUNC('month', order_date) AS order_month,
  sales_region,
  product_line,
  SUM(line_total) AS revenue,
  COUNT(DISTINCT order_id) AS order_count,
  COUNT(DISTINCT customer_id) AS active_customers
FROM redshift_source.public.order_lines ol
JOIN redshift_source.public.customers c ON ol.customer_id = c.id
GROUP BY 1, 2, 3

This view becomes the governed, documented data asset that BI tools and AI agents query. Dremio's AI-generated metadata feature can auto-document these views with descriptions and labels, saving hours of manual cataloging.

Step 4: Enable Reflections

Open the virtual dataset and navigate to the Reflections tab:

  1. Enable Raw Reflections to cache the full result set with optimized sort and partition orders. Best for queries that filter or scan rows.
  2. Enable Aggregate Reflections to pre-compute SUM, COUNT, and AVG aggregations. Best for dashboard queries and summary reports.

Which type to create depends on how the data is used: create the type that matches your dominant query patterns, or enable both if usage is mixed.
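That rule of thumb can be sketched as a tiny selector over workload statistics. The threshold values and the function below are illustrative assumptions, not Dremio guidance:

```python
def suggest_reflection(agg_query_share: float) -> list[str]:
    """Toy heuristic: choose Reflection types from the share of
    aggregation-style queries (GROUP BY / SUM / COUNT / AVG) in the
    workload hitting a dataset. Thresholds are illustrative only."""
    if agg_query_share >= 0.7:
        return ["aggregate"]   # dashboards and summary reports dominate
    if agg_query_share <= 0.3:
        return ["raw"]         # row-level filters and scans dominate
    return ["raw", "aggregate"]  # mixed usage: create both

# A dataset where 90% of queries are dashboard-style aggregations:
print(suggest_reflection(0.9))
```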

Dremio reads from Redshift once, builds the Reflection as an Apache Iceberg table on S3, and then serves all matching queries locally.

Step 5: Turn On Autonomous Reflections

Navigate to Project Settings → Reflections → Enable Autonomous Reflections.

Dremio now monitors your query patterns over 7 days and automatically creates, manages, and drops Reflections based on what your users actually run.

Important: Autonomous Reflections only consider datasets natively stored as Iceberg (or UniForm tables, Parquet datasets, and views built on them). While all Reflections are internally stored as Iceberg tables, those internal tables are not exposed from Dremio's catalog. Only your natively-stored Iceberg datasets are eligible for autonomous optimization. This gives you another reason to migrate data from Redshift to Iceberg over time: it unlocks autonomous performance management.

Dremio is an Iceberg-native lakehouse platform with a built-in Open Catalog based on Apache Polaris. Your Iceberg data is stored on your own S3 and accessible to any Iceberg-compatible engine (Spark, Flink, Trino). Dremio autonomously manages query performance without creating engine lock-in.

The system provisions a dedicated small engine for Reflection maintenance that auto-suspends after 30 seconds of idle time.

Dremio's AI Features: Beyond Query Offloading

AI SQL Functions

Run classification, generation, and summarization directly in SQL without exporting data from your analytics environment:

-- Detect PII in customer support tickets stored in Redshift
SELECT
  ticket_id,
  ticket_text,
  AI_CLASSIFY(ticket_text,
    ARRAY['contains_pii', 'no_pii']) AS pii_flag
FROM analytics.support_tickets
WHERE created_date > CURRENT_DATE - INTERVAL '30' DAY

With Redshift alone, this requires building an external ML pipeline: export data → process in SageMaker or a Lambda function → load results back. With Dremio, it is one SQL query.

The AI Agent

Dremio's built-in AI Agent handles the full analytical workflow:

  • Discover: Browse your Redshift tables and Dremio views using natural language. The Agent uses the semantic layer (virtual datasets, wikis, and labels) to understand your business terminology.
  • Analyze: Ask business questions in plain English and get SQL + results. The Agent writes the query, executes it, and presents the output without the user touching SQL.
  • Visualize: Generate charts directly in the Dremio console. The Agent can create bar charts, line charts, and tables from query results in a single conversational turn.
  • Optimize: The Agent can review slow queries, identify bottlenecks, and suggest performance improvements. It can also analyze past job history to surface recurring inefficiencies.

This reduces the number of ad-hoc Redshift queries because analysts self-serve through the Agent, which runs against Reflections rather than Redshift. Every question answered by the Agent is a query that did not consume RPU-hours or cluster compute.

Results Cache

Beyond Reflections, Dremio also maintains a Results Cache that automatically stores and reuses query results. If the same query is run again and the underlying data has not changed, Dremio returns the cached result instantly. This eliminates redundant computation for repetitive dashboard queries and further reduces the load on both Dremio's engine and any federated sources. Combined with Autonomous Reflections, the Results Cache creates a multi-layered acceleration strategy that maximizes cache hit rates across your analytical workload.
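Conceptually, a results cache keys each entry on both the query text and a version of the underlying data, so a cached answer is reused only while the data is unchanged. A minimal sketch of that idea (not Dremio's actual implementation):

```python
class ResultsCache:
    """Toy results cache: reuse a stored result only when the same
    query runs again AND the source data version has not changed."""

    def __init__(self):
        self._cache = {}  # (query, data_version) -> result

    def get_or_compute(self, query, data_version, compute):
        key = (query, data_version)
        if key not in self._cache:
            self._cache[key] = compute()  # cache miss: execute the query
        return self._cache[key]

cache = ResultsCache()
calls = []
run = lambda: calls.append(1) or 42  # stand-in for real query execution

cache.get_or_compute("SELECT 1", "v1", run)  # miss: executes
cache.get_or_compute("SELECT 1", "v1", run)  # hit: served from cache
cache.get_or_compute("SELECT 1", "v2", run)  # data changed: executes again
```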

AI-Generated Metadata

Dremio uses generative AI to auto-document your data catalog. By sampling table schemas and data, Dremio generates descriptions (wikis) and suggests tags (labels) for every dataset. This is particularly valuable for Redshift migrations because it builds the documentation that governance teams need without the weeks of manual effort that would normally be required.

MCP Server and External AI Integration

Dremio's MCP Server lets external AI tools (ChatGPT, Claude, custom agents) connect to your Dremio environment. Those AI tools query Dremio's Reflections, not Redshift. Every AI-generated query that hits Dremio instead of Redshift is a query you did not pay Redshift to process.

Architecture: How Dremio and Redshift Work Together

┌─────────────────────────────────────────────────────┐
│          BI Tools / AI Agents / Data Apps           │
│     (QuickSight, Tableau, Power BI, ChatGPT)        │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                   Dremio Cloud                      │
│                                                     │
│  ┌─────────────┐  ┌───────────────┐  ┌────────────┐ │
│  │  Semantic   │  │  Autonomous   │  │ AI Agent + │ │
│  │  Layer      │  │  Reflections  │  │ AI SQL     │ │
│  │  (Views +   │  │  (7-day       │  │ Functions  │ │
│  │  AI-gen     │  │  pattern      │  │            │ │
│  │  metadata)  │  │  learning)    │  │            │ │
│  └─────────────┘  └──────┬────────┘  └────────────┘ │
│                          │                          │
│  ┌───────────────────────▼───────────────────────┐  │
│  │    Apache Iceberg on S3 (Your Account)        │  │
│  │    Reflections + migrated data                │  │
│  └───────────────────────────────────────────────┘  │
└──────────────────────┬──────────────────────────────┘
                       │ Only cache-miss
                       │ queries
                       ▼
┌─────────────────────────────────────────────────────┐
│            Amazon Redshift                          │
│    (Provisioned or Serverless, reduced load)        │
└─────────────────────────────────────────────────────┘

The Long-Term Path: From Cost Reduction to Full Modernization

Phase 1 is query offloading and Reflections. Phase 2 is migrating data from Redshift to Apache Iceberg tables on S3.

Dremio's Open Catalog, built on Apache Polaris, manages your Iceberg tables with automatic maintenance: compaction, vacuuming, manifest rewriting, and sort ordering run in the background. Migrating to Iceberg is also what enables Autonomous Reflections, which only consider datasets natively stored as Iceberg. Once your data lives in Iceberg, you can query it with Dremio, Spark, Flink, or any Iceberg-compatible engine. Redshift can also query Iceberg tables via Spectrum, so the same data is accessible from both platforms. The Open Catalog uses the standard Iceberg REST protocol, so other engines can write to the same tables that Dremio is autonomously optimizing. No vendor lock-in.

| Phase | Action | Redshift Impact |
| --- | --- | --- |
| Phase 1 | Connect Redshift + enable Reflections | 40-60% compute reduction |
| Phase 2 | Migrate cold/warm tables to Iceberg on S3 | Eliminate Redshift storage costs |
| Phase 3 | Downsize or decommission Redshift cluster | Eliminate Redshift compute costs |

The migration is incremental. You do not need to move everything at once. Start with the most expensive tables (highest query volume) and work outward.
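Prioritizing "most expensive first" is just a sort over estimated per-table cost. A hedged sketch with entirely hypothetical table names and numbers:

```python
# Rank candidate tables for Iceberg migration by estimated monthly
# Redshift cost (query volume x estimated cost per query).
# All names and figures here are hypothetical, for illustration only.
tables = [
    {"name": "order_lines", "queries_per_month": 120_000, "cost_per_query": 0.02},
    {"name": "customers",   "queries_per_month": 40_000,  "cost_per_query": 0.01},
    {"name": "audit_log",   "queries_per_month": 500,     "cost_per_query": 0.05},
]
for t in tables:
    t["monthly_cost"] = t["queries_per_month"] * t["cost_per_query"]

# Migrate in descending cost order: the busiest tables move first.
migration_order = sorted(tables, key=lambda t: t["monthly_cost"], reverse=True)
print([t["name"] for t in migration_order])
```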

What Redshift Does Not Provide

| Capability | Redshift | Dremio |
| --- | --- | --- |
| Autonomous query acceleration | No built-in equivalent | Autonomous Reflections learn from 7-day query patterns |
| AI SQL functions | Limited ML functions via Redshift ML | AI_CLASSIFY, AI_GENERATE, AI_COMPLETE (LLM-native) |
| Built-in AI agent | None | Full analytical co-pilot with charts and recommendations |
| Federated queries | Limited to Redshift Spectrum (S3) and DataShares | Query Redshift + S3 + PostgreSQL + MongoDB + Snowflake + more |
| Open data format | Proprietary internal format | Native Apache Iceberg with full multi-engine interoperability |
| Self-documenting catalog | Manual Redshift Data Sharing/Glue descriptions | AI-generated wikis and labels on every dataset |
| Per-second billing (no minimum) | 60-second minimum (Serverless) | Consumption-based with no artificial minimums |

Get Started

  1. Sign up for Dremio Cloud: https://www.dremio.com/get-started
  2. Connect Redshift and enable Reflections on your busiest dashboard tables
  3. Monitor your AWS Cost Explorer for Redshift spend over the first week

The Autonomous Reflections feature means savings compound automatically. As Dremio learns your query patterns, it creates optimized Reflections without manual tuning. Most teams measure savings within 7 days.

For documentation and tutorials, visit Dremio Docs or take free courses at Dremio University.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.