Dremio Blog

9 minute read · June 26, 2026

Dremio ELT: Load, Transform, and Govern Data Without Leaving the Lakehouse

Mark Shainman Principal Product Marketing Manager

The Problem With Traditional Data Pipeline Approaches

How Dremio Handles Ingestion

How Dremio Handles Transformation

What Changes When You Add Claude Code to the Workflow

Governance Built Into the Pipeline

A Lakehouse That Works Across the Entire Data Estate

Data pipelines used to require a lot of infrastructure to keep running: separate compute for transformation, staging layers between systems, and a growing stack of tools to manage it all. Dremio changes the equation. With native ingestion, flexible transformation, and AI-assisted pipeline development, teams can build and operate end-to-end ELT workflows directly in the lakehouse, on open standards, without stitching together point solutions.

The Problem With Traditional Data Pipeline Approaches

For most data teams, "getting data ready to use" involves a long chain of handoffs. Data lands in object storage, gets picked up by an extraction tool, flows through a transformation service, and eventually lands in a destination system where analysts can finally query it. Each step in that chain is a place where things break, costs accumulate, and governance becomes someone else's problem.

The shift to ELT (extract, load, transform) addresses this by collapsing the pipeline. Instead of transforming data in transit, teams load it first and transform it in place, using the same query engine that serves analysts. The result is fewer moving parts, faster iteration, and a cleaner lineage story. But ELT only delivers on that promise if the platform can handle both ends of the workflow natively, with the flexibility to meet different teams where they are.

How Dremio Handles Ingestion

Dremio connects natively to more than 32 Dremio-supported data sources and 29 community-supported data sources, spanning databases, warehouses, SaaS applications, and cloud storage, and it manages the metadata and access controls for all of them from a single catalog. Teams do not have to choose between what lives in the lakehouse and what remains in source systems. Dremio governs all of it.

For files landing in object storage, Dremio provides two SQL-native ingestion commands that write directly into Apache Iceberg tables. COPY INTO loads files from a specified path in a single operation. CREATE PIPE sets up a continuous ingestion pipeline that picks up new files automatically as they arrive, without requiring a separate orchestration layer. Both commands land data into Iceberg without standing up additional ETL tooling.
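The difference between a one-time load and a continuous pipe can be sketched with a toy model in plain Python. This is not Dremio's implementation or its SQL syntax: `ToyTable`, `copy_once`, `ToyPipe`, and the file names are hypothetical, and polling a dictionary stands in for whatever mechanism the platform uses to detect new files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Stand-in for a destination table: just a list of loaded rows."""
    rows: list = field(default_factory=list)


def copy_once(table: ToyTable, storage: dict) -> int:
    """One-shot load: read every file currently under the path, once."""
    loaded = 0
    for name in sorted(storage):
        table.rows.extend(storage[name])
        loaded += len(storage[name])
    return loaded


@dataclass
class ToyPipe:
    """Continuous load: remembers which files it has already ingested."""
    table: ToyTable
    seen: set = field(default_factory=set)

    def poll(self, storage: dict) -> int:
        """Load only files that arrived since the last poll."""
        loaded = 0
        for name in sorted(storage):
            if name in self.seen:
                continue  # already ingested, skip to avoid duplicates
            self.table.rows.extend(storage[name])
            self.seen.add(name)
            loaded += len(storage[name])
        return loaded


# Hypothetical object-storage listing: file name -> rows in that file.
storage = {"orders_001.csv": [("o1", 10.0), ("o2", 25.5)]}

batch_table = ToyTable()
batch_loaded = copy_once(batch_table, storage)

pipe_table = ToyTable()
pipe = ToyPipe(pipe_table)
first_poll = pipe.poll(storage)

# A new file lands; the pipe picks up only that file on the next poll.
storage["orders_002.csv"] = [("o3", 7.25)]
second_poll = pipe.poll(storage)
third_poll = pipe.poll(storage)  # nothing new, so nothing is loaded
```

The point of the sketch is the bookkeeping: a one-time load has no memory of what it read, while a pipe has to track ingested files so that repeated runs do not duplicate rows.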

For data already accessible through Dremio's catalog (connected databases, warehouses, or SaaS sources), CTAS and INSERT INTO let teams move and shape data in a single query. The Kafka integration is worth calling out specifically: Dremio's Kafka connector gives direct access to live Kafka topics, and a CTAS query is all it takes to write streaming data into an Iceberg table. There is no intermediate storage step, no separate streaming job, and no landing zone to manage. For teams already running ingestion pipelines with Fivetran, Airbyte, Confluent, or Spark, Dremio fits cleanly into existing workflows.

How Dremio Handles Transformation

Data transformation is not a one-size-fits-all problem. Engineering teams want version-controlled, code-first workflows. Analysts want visual tools. Some teams need Python. Others need no SQL at all. Dremio is built to accommodate all of these without forcing a single approach.

For engineering teams, Dremio integrates natively with dbt Core, one of the most widely adopted transformation frameworks in the industry. Dremio acts as the execution engine for dbt models, running SQL against Iceberg and managing the resulting tables. Teams that prefer to stay in SQL without the dbt layer can write and execute transformations directly in the Dremio console. Spark and Python workloads are supported via Arrow Flight, Dremio's high-performance data transport protocol, so existing pipelines can connect to Dremio-managed Iceberg tables without significant rework.

For analysts and business users, Dremio's low-code editor provides a visual interface for building transformations: filtering rows, joining tables, renaming columns, and applying functions through a point-and-click interface. Transformation Studio, a free and open-source tool created by Dremio, extends this further, providing a collaborative environment for building and sharing transformation logic before promoting it to production. Native AI Transforms bring AI-assisted data preparation directly into the pipeline, handling tasks like cleaning inconsistent strings or classifying records without requiring custom code.

Transform Studio

What Changes When You Add Claude Code to the Workflow

One of the most practical shifts in data engineering right now is how well AI coding tools fit into ELT workflows. Claude Code, Anthropic's AI coding tool for the terminal, can generate, review, and iterate on Dremio SQL pipelines at speed.

For example, in a live Dremio ELT session, Claude Code worked through a multi-step pipeline against retail database tables. It first validated the source tables by connecting through Dremio to retail data in PostgreSQL and confirming schema and row counts across customers, transactions, products, and stores. It then generated a CTAS query that created the table customer_purchase_profile, stored as Iceberg, using CTEs to join and aggregate across the validated sources.
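A minimal sketch of that validate-then-CTAS pattern follows. It uses Python's built-in `sqlite3` as a stand-in engine so it runs anywhere; it is not the SQL from the session described above, and the column names, sample rows, and CTE names are assumptions made for illustration. Only the table names come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source tables with a few sample rows.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT, price REAL);
CREATE TABLE transactions (
    txn_id INTEGER PRIMARY KEY, customer_id INTEGER,
    product_id INTEGER, store_id INTEGER, quantity INTEGER
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO stores VALUES (10, 'Austin'), (11, 'Boston');
INSERT INTO products VALUES (100, 'books', 12.0), (101, 'games', 30.0);
INSERT INTO transactions VALUES
    (1000, 1, 100, 10, 2),
    (1001, 1, 101, 11, 1),
    (1002, 2, 100, 10, 3);
""")

# Step 1: validate the sources by confirming row counts.
source_tables = ["customers", "transactions", "products", "stores"]
row_counts = {
    t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    for t in source_tables
}

# Step 2: CTAS with CTEs that join and aggregate the validated sources.
conn.execute("""
CREATE TABLE customer_purchase_profile AS
WITH line_items AS (
    SELECT t.customer_id, s.city, t.quantity * p.price AS amount
    FROM transactions AS t
    JOIN products AS p ON p.product_id = t.product_id
    JOIN stores AS s ON s.store_id = t.store_id
),
per_customer AS (
    SELECT customer_id,
           COUNT(*) AS purchase_count,
           SUM(amount) AS total_spend,
           COUNT(DISTINCT city) AS cities_visited
    FROM line_items
    GROUP BY customer_id
)
SELECT c.customer_id, c.name,
       pc.purchase_count, pc.total_spend, pc.cities_visited
FROM customers AS c
JOIN per_customer AS pc ON pc.customer_id = c.customer_id
""")

profile = conn.execute(
    "SELECT customer_id, name, purchase_count, total_spend, cities_visited "
    "FROM customer_purchase_profile ORDER BY customer_id"
).fetchall()
```

Checking row counts before the CTAS is cheap insurance: an empty or truncated source shows up as a number before it shows up as a misleading aggregate.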

This kind of work used to take hours of trial, error, and documentation reading. With Claude Code connected to a Dremio environment, a data engineer can describe the transformation they need in plain language and get production-ready SQL back in seconds, then iterate on it conversationally until it is right. For teams building medallion architectures (Bronze, Silver, Gold), Claude Code is particularly effective: it understands the layering pattern and can generate the full chain of transformations from raw to curated while incorporating business logic described in natural language.
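The Bronze, Silver, Gold layering can be sketched as a chain of CTAS statements, each reading only from the layer below it. As a runnable stand-in this again uses Python's `sqlite3` rather than Dremio; the table names, columns, and cleaning rules are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bronze: raw data loaded as-is, including messy values.
conn.executescript("""
CREATE TABLE bronze_orders (order_id TEXT, region TEXT, amount TEXT);
INSERT INTO bronze_orders VALUES
    ('A1', ' east ', '10.50'),
    ('A2', 'EAST',   '4.50'),
    ('A3', 'West',   '20.00'),
    ('A4', 'west',   NULL);
""")

# Silver: cleaned and typed. Normalize region, cast amount to a number,
# and drop rows that have no amount.
conn.execute("""
CREATE TABLE silver_orders AS
SELECT order_id,
       LOWER(TRIM(region)) AS region,
       CAST(amount AS REAL) AS amount
FROM bronze_orders
WHERE amount IS NOT NULL
""")

# Gold: business-level aggregate built only from the Silver layer.
conn.execute("""
CREATE TABLE gold_revenue_by_region AS
SELECT region, COUNT(*) AS order_count, SUM(amount) AS revenue
FROM silver_orders
GROUP BY region
""")

silver_count = conn.execute("SELECT COUNT(*) FROM silver_orders").fetchone()[0]
gold = conn.execute(
    "SELECT region, order_count, revenue "
    "FROM gold_revenue_by_region ORDER BY region"
).fetchall()
```

Keeping each layer as its own table means a bad cleaning rule can be fixed in Silver and Gold rebuilt from it, without reloading the raw data.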

Governance Built Into the Pipeline

ELT is only as useful as the trust teams can place in the data it produces. Dremio addresses this at the platform level rather than leaving it to individual pipeline authors.

Column-level lineage traces how data moves and transforms across the entire lakehouse, from source table to Gold layer. When a downstream metric breaks, teams can follow the lineage back to the root cause. Apache Iceberg's native time travel and snapshot history give point-in-time queryability and rollback capability out of the box: if a transformation runs incorrectly and corrupts a table, querying the previous snapshot and restoring from it is a standard operation. Automatic table statistics keep query plans optimized and surface data health signals before downstream consumers notice a problem. Versioning and CI/CD support lets engineering teams apply software development practices to data pipelines, with branching, pull requests, and automated deployment for SQL transformations and catalog changes. For teams running observability platforms like Monte Carlo or Great Expectations, Dremio integrates with rather than replaces those tools.
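The snapshot-and-rollback idea can be illustrated with a toy model. This is a conceptual sketch only, not Iceberg's metadata format or Dremio's API: `SnapshotTable` and its methods are hypothetical, and it stores full copies of rows where a real table format tracks metadata and data files.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Toy table that keeps every committed version as an immutable snapshot."""
    snapshots: list = field(default_factory=list)  # one tuple of rows per commit
    current: int = -1  # index of the snapshot that readers see by default

    def commit(self, rows) -> int:
        """Write a new snapshot, make it current, and return its id."""
        self.snapshots.append(tuple(rows))
        self.current = len(self.snapshots) - 1
        return self.current

    def read(self, snapshot_id=None):
        """Read the current snapshot, or an older one for a point-in-time query."""
        idx = self.current if snapshot_id is None else snapshot_id
        return list(self.snapshots[idx])

    def rollback_to(self, snapshot_id: int) -> None:
        """Move the current pointer back; later snapshots stay in history."""
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current = snapshot_id


table = SnapshotTable()
good = table.commit([("east", 15.0), ("west", 20.0)])
bad = table.commit([("east", None), ("west", None)])  # a faulty transformation

before_fix = table.read(good)   # point-in-time query against the good snapshot
table.rollback_to(good)         # restore by moving the pointer, not rewriting data
after_rollback = table.read()
```

Because a rollback only moves a pointer, the faulty snapshot remains available for diagnosis after the table has been restored.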

A Lakehouse That Works Across the Entire Data Estate

What makes Dremio's approach to ELT different is the scope of what it governs. Most platforms manage data that lives within their own environment. Dremio federates across the entire data estate, connecting to 60+ sources and treating all of them as first-class citizens in a single catalog, regardless of where the data physically lives. Teams do not have to migrate everything into the lakehouse to get the benefits of unified governance, query optimization, and lineage. They can govern what is in Iceberg and what remains in source systems from the same place, with the same access controls and the same lineage graph.

This is the core of the Dremio vision: enabling the business to move from data to action, with a faster path to trusted insights, business outcomes, and decisions. Dremio is the only data platform built for agents, managed by agents, running on open standards like Apache Iceberg and Apache Arrow so that nothing is locked to a single vendor.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.
