37 minute read · April 4, 2025

From SQL Server to Lakehouse: A Better Journey to an Apache Iceberg Lakehouse

Alex Merced · Head of DevRel, Dremio

The story probably sounds familiar if you’ve been building data systems for a while. Your company starts with an operational database—SQL Server, PostgreSQL, MySQL, maybe even MongoDB. It serves its purpose well. Transactions are fast, applications are happy, and reports run—well, they used to run—just fine.

But as the business grows, so do the data demands.

Suddenly, dashboards take forever to load. Analysts complain about query timeouts. You’re afraid to run a heavy report during business hours because it might slow down the entire production system. And you find yourself duct-taping solutions—read replicas, exports to Excel, overnight ETL jobs—just to keep the lights on.

So, what’s the next step? For many, it’s a cloud data warehouse—Snowflake, Synapse, Redshift—a platform built for analytics at scale. It seems like a natural progression. But here’s the twist: that move often introduces new challenges. Now you’re juggling complex pipelines, ballooning warehouse costs, and a sense of déjà vu as your shiny new analytics system starts showing signs of strain.

Today, a new architectural shift is underway—Lakehouses, built on open table formats like Apache Iceberg, are emerging as the future of scalable, flexible analytics.

And if you’re already feeling the pain of scaling SQL Server (or any OLTP database) for analytics, here’s the good news:

You don’t have to go through the data warehouse detour.

This blog explores the journey from SQL Server to a modern lakehouse, and how a platform like Dremio can guide you step-by-step—from querying your operational databases directly to progressively migrating to Iceberg tables—without hitting a dead end along the way.

Life with SQL Server: When OLTP Becomes a Bottleneck

Let’s zoom in on a familiar stage in the data journey: when SQL Server (or any OLTP system) starts showing its limits.

When you first roll out your operational database, everything feels in control. It's great for fast inserts, updates, and deletes, and your application teams are happy. Maybe you're even running a few lightweight reports directly on production without any real issues.

But it doesn’t take long before things start to shift.

The Analytics Load Creeps In

It usually starts innocently:

  • A product manager asks for a monthly sales dashboard.
  • The finance team wants a real-time revenue report.
  • Marketing needs customer segmentation data.

Now you’ve got complex JOINs, GROUP BYs, and aggregations running on a system that was never meant for that load. These queries start competing with your operational workload, and you begin to see the cracks:

  • Query latency spikes.
  • BI dashboards time out or display stale data.
  • Developers are told, “Don’t run that query until after hours.”
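
To make that concrete, here is the kind of reporting query that ends up running against production. The table and column names below are made up, but the shape will look familiar:

    -- Hypothetical monthly revenue rollup run straight against the OLTP schema.
    -- Table and column names are illustrative, not from any real system.
    SELECT
        c.region,
        DATEPART(month, o.order_date) AS order_month,
        SUM(o.total_amount)           AS monthly_revenue,
        COUNT(DISTINCT o.customer_id) AS active_customers
    FROM dbo.orders o
    JOIN dbo.customers c ON c.customer_id = o.customer_id
    WHERE o.order_date >= DATEADD(year, -1, GETDATE())
    GROUP BY c.region, DATEPART(month, o.order_date)
    ORDER BY c.region, order_month;

Every scan, sort, and aggregation in a query like this competes for the same CPU, memory, and I/O as your application's inserts and updates.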

Sound familiar?

Workarounds, Not Solutions

To deal with the pressure, many teams implement stopgaps:

  • Read replicas to offload queries—but this just shifts the load without solving data modeling limitations.
  • Extracts to Excel or Power BI—which quickly become unmanageable and error-prone.
  • Custom ETL pipelines—that dump data nightly into flat files or cloud storage for batch processing.

Each solution adds complexity, introduces latency, and often leads to inconsistent, stale, or duplicated data across teams.

Meanwhile, you’re investing more engineering time to keep your analytics duct-taped together.

The Tipping Point

Eventually, it becomes clear: OLTP just isn’t built for large-scale analytics. It’s a transactional workhorse, not an analytical engine. And now, the organization is looking for a real solution—something scalable, flexible, and purpose-built for BI and analytics.

That’s when most teams consider making the leap to a data warehouse.

But as we’ll see in the next section, that leap isn’t always the long-term fix it promises to be.

The Traditional Move: Enter the Data Warehouse

Once SQL Server—or any OLTP database—starts buckling under the weight of analytics, the natural response is to look for something built for analytical workloads. Enter the cloud data warehouse.

Platforms like Snowflake, Amazon Redshift, and Azure Synapse Analytics are often the go-to choice. And it’s easy to see why. They offer:

  • Massive parallelism for faster queries
  • Separation of storage and compute for (supposedly) cost-effective scaling
  • Columnar storage for efficient analytics
  • Integration with BI tools and ETL platforms

So, you stand up a warehouse, fire up your favorite ETL/ELT tool, and start copying data from SQL Server. At first, it feels like a win: dashboards are faster, analysts are happy, and you finally have a centralized source of truth.

But then… new challenges start to creep in.

New Platform, New Problems

As your data volume and user base grow, some of those initial benefits unravel.

Cost Surprises

Cloud data warehouses charge based on compute time—and if you’re not careful, those costs can spiral fast. Long-running queries, multiple users, and auto-scaling compute clusters can lead to unexpected bills at the end of the month.

Modeling Rigidities

Most warehouses still push a traditional star or snowflake schema mindset. While that’s fine for reporting, it can become limiting when you need flexible exploration, data science, or ad hoc queries across semi-structured data.

Multiple Silos for Different Use Cases

Want to do machine learning? You’re exporting data to a notebook. Need streaming ingestion? Time to set up a separate real-time pipeline. The warehouse becomes just one of many data stores in your ecosystem, and data fragmentation rears its head again.

Vendor Lock-In

Proprietary features, UDFs, and SQL dialects tie your logic and workflows tightly to a single vendor. Migrating or integrating with other systems becomes difficult—sometimes impossible.

The Warehouse Is a Step Forward—But Not the Finish Line

To be clear, cloud data warehouses represent a significant step up from trying to do analytics on a transactional database. But they come with new architectural tradeoffs. And more and more teams are realizing that while a warehouse improves some things, it still doesn’t provide the flexibility, openness, and multi-modal access that modern data teams need.

That’s why the industry is now turning its attention to the lakehouse. This new approach combines the performance of a warehouse with the openness and scalability of a data lake.

The Next Evolution: Apache Iceberg and the Rise of the Lakehouse

If the data warehouse was a necessary stepping stone, the lakehouse is the destination many teams have been waiting for.

At its core, lakehouse architecture combines the best of both worlds: the scalability and flexibility of a data lake with the performance, governance, and structure of a data warehouse.

The real magic behind this evolution lies in open table formats, most notably Apache Iceberg.

What Is a Lakehouse?

Think of a lakehouse as a unified data platform where:

  • You store all your data—structured, semi-structured, even unstructured—in open formats on low-cost object storage (like S3, ADLS, or GCS)
  • You run fast, SQL-based analytics directly on that data, without needing to copy it into a warehouse
  • You support BI, machine learning, streaming, and data science all from the same foundation

No more bouncing between different systems, each holding a possibly inconsistent copy of the same data for different workloads. No more creating brittle pipelines just to make data “fit” somewhere.

It’s a single source of truth approach—and Apache Iceberg is what makes it possible.

Why Apache Iceberg Changes the Game

Apache Iceberg is a high-performance, open table format designed specifically for the lakehouse era. It brings warehouse-like capabilities to your data lake.

Here’s what makes it special:

ACID Transactions

Iceberg guarantees consistency, even across complex multi-file updates, inserts, and deletes. No more brittle “eventually consistent” hacks in your data lake.
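
For example, applying a batch of upstream changes can be a single atomic MERGE: if it fails midway, readers never see a partial result. The sketch below uses illustrative table names, and MERGE syntax varies slightly by engine.

    -- Illustrative: apply a batch of changed orders to an Iceberg table in one
    -- atomic commit. Table names are placeholders; check your engine's MERGE syntax.
    MERGE INTO lakehouse.sales.orders t
    USING staging.orders_changes s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN
      UPDATE SET status = s.status, total_amount = s.total_amount
    WHEN NOT MATCHED THEN
      INSERT (order_id, customer_id, status, total_amount)
      VALUES (s.order_id, s.customer_id, s.status, s.total_amount);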

Schema Evolution & Time Travel

Evolve your tables over time—add or remove columns, change types, and even query past versions of data with zero hassle.
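
In SQL, that looks roughly like the sketch below. The table name is illustrative, and exact time-travel syntax differs by engine (Dremio uses AT TIMESTAMP and AT SNAPSHOT).

    -- Add a column without rewriting any data files (illustrative table name).
    ALTER TABLE lakehouse.sales.orders ADD COLUMNS (discount_pct DOUBLE);

    -- Query the table as it existed at an earlier point in time.
    SELECT COUNT(*)
    FROM lakehouse.sales.orders AT TIMESTAMP '2025-03-01 00:00:00';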

Partitioning Without the Pain

Iceberg handles intelligent partitioning under the hood. No need to manually engineer partitions or worry about managing directory structures like in traditional Hive tables.
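
Instead of baking a partition column into every query, you declare a transform once when you create the table. The sketch below uses Dremio-style syntax with illustrative names; other engines spell the transform slightly differently.

    -- Declare a partition transform once; Iceberg tracks partition values in
    -- metadata, so filters on order_date prune files automatically.
    -- Names are illustrative.
    CREATE TABLE lakehouse.sales.orders (
        order_id     BIGINT,
        customer_id  BIGINT,
        order_date   TIMESTAMP,
        total_amount DECIMAL(12, 2)
    )
    PARTITION BY (MONTH(order_date));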

Engine Agnostic

Iceberg works with many query engines: Dremio, Spark, Flink, Trino, Snowflake, and more. It frees you from vendor lock-in.

Metadata-Driven Performance

Iceberg tracks rich metadata about your data files, which enables powerful query pruning and fast analytics—even on petabyte-scale datasets.

From Warehouse to Lakehouse: A Better Fit for Modern Analytics

While cloud data warehouses helped solve some early scaling problems, they still operate in closed, siloed ecosystems. In contrast, a lakehouse built on Apache Iceberg gives you:

  • Open file formats (like Parquet or ORC)
  • Decoupled compute and storage
  • Freedom to use multiple engines for different workloads
  • Support for BI, ML, and streaming use cases—all in one place

This isn’t just a new tool—it’s a fundamental shift in how we think about building analytics platforms.

But here’s the key insight: you don’t need to migrate to a warehouse first to get here. You can leapfrog straight from your OLTP systems—like SQL Server—directly to a lakehouse.

And with Dremio, that leap becomes a smooth, incremental journey rather than a risky all-or-nothing migration.

Why You Can Skip the Warehouse Step

It’s easy to fall into the trap of thinking that a data warehouse is a mandatory stop on the road to modern analytics. But the truth is, you don’t have to make that pit stop at all.

If you’re already starting to feel the limitations of your OLTP systems like SQL Server, you’re in the perfect position to do what more and more forward-thinking organizations are doing:

Skip the warehouse. Go straight to the lakehouse.

Here’s why that’s not just possible—it’s smart.

Replatforming Twice Is a Costly Detour

Standing up a data warehouse is a significant investment in time, engineering effort, and money. You’ll build pipelines, rewrite queries, and redesign models—all to solve analytics problems in a system that, eventually, you’ll likely outgrow again.

Why spend months (or years) migrating to a warehouse, only to migrate again later to a lakehouse when your needs evolve?

Warehouses Still Create Silos

Despite their power, data warehouses still force a divide between systems:

  • Your operational data lives in SQL Server or Postgres.
  • Your analytical data lives in the warehouse.
  • Your machine learning features live in files or notebooks.
  • Your streaming data lives somewhere else entirely.

Lakehouses unify all of this—on open storage, with open formats, and open compute engines.

Flexibility and Openness from Day One

With Iceberg-based lakehouses, your data lives in open formats (like Parquet) on object storage. That means:

  • You’re not locked into a single vendor.
  • You can use the best engine for the job—whether that’s Dremio, Spark, Flink, or something else.
  • You can build once and reuse across BI, ML, and real-time.

Compare that to a warehouse, where SQL dialects, storage layers, and compute models often force you into proprietary ecosystems with limited escape hatches.

The Takeaway

If you're already seeing the strain on your OLTP systems, there's no need to “graduate” to a data warehouse just to eventually evolve to a lakehouse.

That warehouse step is becoming the new legacy—a transitional phase that can now be skipped entirely.

The modern approach is to leapfrog directly to a lakehouse—and with Dremio, you don’t even need to do it all at once.

In the next section, we’ll look at how Dremio helps you begin the lakehouse journey while still connected to your SQL Server (or other OLTP databases)—and guides you, step by step, into the Iceberg-powered future.

Dremio: Your Companion on the SQL Server to Lakehouse Journey

Here’s where the story gets exciting.

Most teams hesitate to jump straight to a lakehouse because they assume it requires a big bang migration—a full data lift-and-shift, new tooling, and a complete rewrite of analytics workflows.

But that’s no longer the case.

Dremio makes it possible to start your lakehouse journey from where you are today—even if that’s still deeply rooted in SQL Server, Postgres, MySQL, or MongoDB—and progress at your own pace toward a modern Apache Iceberg lakehouse.

Let’s break down how it works.

Start with What You Have: Direct Queries on OLTP Systems

Dremio connects natively to your existing OLTP databases—like:

  • Microsoft SQL Server
  • PostgreSQL
  • MySQL
  • MongoDB

Right out of the gate, you can use Dremio to:

  • Query these sources directly, using familiar SQL
  • Join across them without first extracting or transforming
  • Expose a semantic layer for analysts and BI tools

That means no more dumping data into files or shoving it prematurely into a warehouse just to get insights. You can empower analytics teams without moving a single byte.
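
As a rough sketch, a federated query in Dremio can look like the following, where the source and table names are placeholders for however your sources are registered:

    -- Illustrative federated query: join a SQL Server table with a MongoDB
    -- collection directly, no extract step. Source names are placeholders.
    SELECT
        o.order_id,
        o.total_amount,
        p.category
    FROM sqlserver_prod.dbo.orders o
    JOIN mongo_prod.catalog.products p
      ON o.product_id = p.product_id
    WHERE o.order_date >= DATE '2025-01-01';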

Define a Semantic Layer: Business-Friendly, Reusable Views

Dremio lets you build a semantic layer—a virtual data model on top of your raw sources. This means:

  • Creating logical datasets that represent clean, curated views of the data
  • Abstracting away complexity so that analysts don’t need to know table names, join conditions, or transformations
  • Enabling self-service analytics through tools like Tableau, Power BI, or even SQL editors

The semantic layer bridges your raw systems and the business-facing reports, bringing clarity and consistency to every dashboard and query.
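
Under the hood, those curated views are just SQL. A minimal sketch, with made-up names, might look like this:

    -- Semantic-layer view: analysts query Sales.MonthlyRevenue without needing
    -- to know the source tables or join logic. Names are illustrative.
    CREATE VIEW Sales.MonthlyRevenue AS
    SELECT
        c.region,
        DATE_TRUNC('MONTH', o.order_date) AS revenue_month,
        SUM(o.total_amount)               AS revenue
    FROM sqlserver_prod.dbo.orders o
    JOIN sqlserver_prod.dbo.customers c
      ON c.customer_id = o.customer_id
    GROUP BY c.region, DATE_TRUNC('MONTH', o.order_date);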

Accelerate Performance Instantly with Reflections

Are you worried about performance when querying live databases? Dremio’s Reflections are here to help.

Reflections are smart, automatically managed Iceberg-based materializations of your virtual datasets. Think of them as transparent accelerators: they dramatically improve query performance without changing your workflows.

  • Dremio chooses when to use them
  • You don’t have to rewrite queries
  • They’re automatically refreshed and optimized

This means blazing-fast dashboards and reports, even when pulling from multiple operational databases or complex transformations.
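
Reflections are most often enabled with a few clicks in the Dremio UI (or left to Dremio's autonomous management), but a rough sketch of the SQL form is shown below. The reflection name, dataset path, and fields are illustrative; confirm the exact syntax against your Dremio version.

    -- Sketch only: define an aggregate reflection on the semantic-layer view.
    -- Reflection name, dataset path, and fields are illustrative.
    ALTER DATASET Sales.MonthlyRevenue
    CREATE AGGREGATE REFLECTION monthly_revenue_agg
    USING
      DIMENSIONS (region, revenue_month)
      MEASURES (revenue (SUM));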

Migrate to Iceberg Tables—When You’re Ready

Here’s where Dremio sets itself apart.

Once you’ve stabilized your semantic layer and understand what data drives value, you can start migrating data to Apache Iceberg tables—gradually, and at your own pace.

  • No need for a disruptive replatforming
  • Start with high-value or high-volume datasets
  • Continue querying across hybrid sources (OLTP + Iceberg) seamlessly

And because Dremio supports data lake storage on systems like S3, ADLS, and GCS, your data now lives in open, scalable, cost-effective infrastructure.
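
In practice, the first move is often a one-time CREATE TABLE AS from the source into Iceberg. The names below are placeholders, and keeping the copy in sync afterward (CDC, scheduled loads, and so on) is a separate design choice.

    -- Illustrative one-time copy of a high-value SQL Server table into an
    -- Iceberg table on object storage. Names are placeholders.
    CREATE TABLE lakehouse.raw.orders AS
    SELECT * FROM sqlserver_prod.dbo.orders;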

Managed Iceberg Services: Operational Simplicity

Dremio doesn’t just support Apache Iceberg—it makes it easy.

Behind the scenes, Dremio provides automated services to manage and optimize your Iceberg tables:

  • Compaction and file optimization to improve performance
  • Integrated Apache Iceberg Catalog to keep queries fast

You get the benefits of a finely-tuned lakehouse, without having to build the tuning yourself.

Performance and Cost Optimization, End to End

Once your data lives in Iceberg tables on a lakehouse architecture, Dremio takes over the heavy lifting:

  • Automatically optimized queries using Reflections
  • Intelligent caching and metadata pruning
  • Efficient resource usage with decoupled compute

Your data stack becomes faster, cheaper, and more flexible—and it’s fully open, with no vendor lock-in.

With Dremio, you don’t need to leap blindly into the unknown. You can start small, grow gradually, and evolve into a lakehouse-powered organization with full confidence.

A Progressive Path: Step-by-Step Journey with Dremio

Making the leap from SQL Server to a modern lakehouse doesn’t have to be disruptive or intimidating. With Dremio, it’s not an all-or-nothing migration—it’s a progressive transformation, designed to deliver value at every stage.

Let’s walk through what that journey typically looks like:

Step 1: Connect to Your Existing OLTP Systems

You start by connecting Dremio to your current data sources:

  • SQL Server, PostgreSQL, MySQL, or MongoDB
  • Existing Data Warehouses & Data Lakes

There’s no ETL required upfront—you can begin querying live data across these systems immediately.

Step 2: Define Your Semantic Layer

Next, you build a semantic layer using Dremio’s intuitive interface or SQL editor:

  • Create virtual datasets (VDS) that join, filter, and transform data from your raw sources
  • Model your business entities—customers, orders, inventory—as logical views
  • Organize your semantic layer in folders and projects for easy collaboration and governance

This becomes the foundation for self-service analytics—one source of truth, no data movement required.

Step 3: Accelerate with Reflections

With the semantic layer in place, you apply Reflections to the most commonly queried datasets:

  • Think of Reflections as automated materializations behind the scenes
  • They drastically reduce query latency and offload pressure from your source systems
  • Users don’t need to know they exist—they just experience fast results

This is a game-changer: your existing data sources become lightning-fast without needing to move data to a warehouse.

Step 4: Begin Migrating Key Datasets to Apache Iceberg

Once you understand which datasets drive the most value—or incur the most performance cost—you can start progressively migrating them to Apache Iceberg on your data lake:

  • Choose a cloud object store (e.g., S3, ADLS, GCS)
  • Use Dremio or your preferred tool to write the original raw dataset to an Iceberg table on your data lake
  • Continue querying both source data and Iceberg tables side-by-side
  • Then, when everything is in sync, update your raw-layer view SQL to point to the Iceberg dataset (see the sketch after this list)
  • All views built on top of those raw views automatically use the new data location and benefit from Dremio's Iceberg value-adds like autonomous performance management and results caching.
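
The cut-over itself can be as small as redefining the raw-layer view, again with illustrative names:

    -- Repoint the raw-layer view at the Iceberg copy. Downstream views and
    -- dashboards keep working unchanged; they now read from Iceberg.
    CREATE OR REPLACE VIEW Raw.orders AS
    SELECT * FROM lakehouse.raw.orders;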

You maintain business continuity while gaining the benefits of open, performant data storage.

Step 5: Let Dremio Manage Optimization for You

As your Iceberg footprint grows, Dremio takes on more of the optimization behind the scenes:

  • Automatic compaction and metadata optimization
  • Automatic cleanup of your lakehouse based on your data retention policies
  • Autonomous Performance Management that determines where you need Reflections and where you don't

What was once a complex engineering task becomes a fully managed, transparent experience.

Step 6: Scale Up Analytics Across the Business

Now you’ve got a lakehouse: open, fast, and fully self-service.

At this stage, you can:

  • Onboard BI tools like Tableau, Power BI, or Looker
  • Enable data scientists to run exploratory analysis and train models directly on Iceberg data
  • Ingest real-time and batch data pipelines into your data lake, with one consistent format

And the best part? You did it without disruptive re-platforming or costly intermediate steps.

This step-by-step journey transforms your analytics stack from a patchwork of siloed systems into a modern, open, and scalable lakehouse architecture—all while delivering value at every stage.

Real-World Results: What You Gain by Going Direct to Lakehouse

Taking the direct path from SQL Server to Lakehouse isn’t just a cleaner architecture—it delivers real, measurable results for teams and organizations across industries. By skipping the warehouse detour and embracing an open data strategy with Dremio and Apache Iceberg, companies are seeing improvements in speed, flexibility, cost, and agility.

Let’s break down the gains.

Dramatically Faster Analytics

By leveraging Dremio’s Reflections and metadata-aware query engine, teams reduce query times from minutes to seconds, even while querying operational systems.

And as data is progressively moved into Iceberg tables, performance gets even better:

  • Automatic Iceberg Table Optimization
  • Automatic Iceberg Table Clean-Up
  • Automatic Acceleration of Datasets based on query patterns
  • Automatic and Incremental Updates of Reflections on Iceberg Tables

All of this adds up to instantaneous dashboards, snappier ad hoc queries, and more responsive exploration for analysts.

Lower Cost of Ownership

Traditional data warehouses often charge by compute usage, and costs can spiral as more users query more data.

With a lakehouse powered by Apache Iceberg and Dremio:

  • You store data once, in open formats, on low-cost object storage
  • You eliminate the need to copy data across systems (no more “staging to Snowflake” or building redundant ETL pipelines)
  • You reduce license and infrastructure overhead by consolidating tools

And because Dremio decouples storage from compute, you can scale query engines independently and optimize for cost without sacrificing performance.

Unified Data for All Use Cases

In a typical warehouse-driven architecture, BI teams work in one system, data scientists in another, and streaming engineers in yet another. That leads to data duplication, inconsistencies, and operational friction.

With a lakehouse:

  • BI analysts, ML engineers, and data scientists all work from the same tables, using the same formats
  • No data movement is required to support new use cases
  • You support batch, real-time, and interactive workloads side-by-side

Everyone speaks the same data language—and innovation moves faster.

Freedom from Lock-In

Organizations free themselves from proprietary vendor ecosystems by embracing open formats like Apache Iceberg and storing data on cloud-native object storage.

That means:

  • No more SQL dialect surprises when switching engines
  • Portability of data and logic across tools like Spark, Flink, Trino, and of course, Dremio
  • Future-proofing your stack so you're not locked into yesterday’s tools

Your data becomes a strategic asset, not a liability tied to a single vendor.

Faster Time to Insight, Lower Risk

Most importantly, organizations that go direct to lakehouse with Dremio are able to deliver value faster—without high-risk replatforming.

They can:

  • Begin delivering insights within days, not months
  • Move incrementally, with minimal disruption to existing systems
  • Align their data architecture with where the industry is heading, not where it’s been

It’s a transformation that fits real-world constraints—and delivers enterprise-ready capabilities along the way.

Conclusion: The Smart Leap Forward

You're not alone if you're currently stretching SQL Server—or any OLTP database—beyond its intended purpose to keep up with analytics demand. This pain point is shared by countless organizations as data volumes grow, dashboards become more complex, and business expectations rise.

Traditionally, the next step seemed obvious: move to a cloud data warehouse. But as we’ve seen, that path often introduces new complexities, costs, and constraints—and ultimately leads to another migration down the road.

Today, there’s a smarter, more modern alternative:
Go straight to a lakehouse.

With Apache Iceberg as the foundation and Dremio as your guide, you don’t need to rip, replace, or commit to a high-risk replatforming effort. Instead, you can:

  • Connect to your existing systems (like SQL Server) instantly
  • Build a semantic layer for consistent, self-service analytics
  • Accelerate performance without moving data
  • Gradually migrate to open, Iceberg-based tables at your own pace
  • Unify all your data use cases—BI, ML, and streaming—on a single, open platform

The result?
A faster, more flexible, and cost-efficient analytics stack that scales with your business—and puts you in control of your data strategy.

So if you're feeling the limitations of your current systems, don’t settle for a temporary fix.
Skip the warehouse. Embrace the lakehouse. Let Dremio show you the way.

Want to learn how you can benefit from this strategy? Schedule a free architectural workshop today!
