Data pipelines used to require a lot of infrastructure to keep running: separate compute for transformation, staging layers between systems, and a growing stack of tools to manage it all. Dremio changes the equation. With native ingestion, flexible transformation, and AI-assisted pipeline development, teams can build and operate end-to-end ELT workflows directly in the lakehouse, on open standards, without stitching together point solutions.
The Problem With Traditional Data Pipeline Approaches
For most data teams, "getting data ready to use" involves a long chain of handoffs. Data lands in object storage, gets picked up by an extraction tool, flows through a transformation service, and eventually lands in a destination system where analysts can finally query it. Each step in that chain is a place where things break, costs accumulate, and governance becomes someone else's problem.
The shift to ELT (extract, load, transform) addresses this by collapsing the pipeline. Instead of transforming data in transit, teams load it first and transform it in place, using the same query engine that serves analysts. The result is fewer moving parts, faster iteration, and a cleaner lineage story. But ELT only delivers on that promise if the platform can handle both ends of the workflow natively, with the flexibility to meet different teams where they are.
How Dremio Handles Ingestion
Dremio connects natively to more than 32 Dremio-supported data sources and 29 community-supported data sources, spanning databases, warehouses, SaaS applications, and cloud storage, and manages the metadata and access controls for all of them from a single catalog. Teams do not have to choose between what lives in the lakehouse and what remains in source systems. Dremio governs all of it.
For files landing in object storage, Dremio provides two SQL-native ingestion commands that write directly into Apache Iceberg tables. COPY INTO loads files from a specified path in a single operation. CREATE PIPE sets up a continuous ingestion pipeline that picks up new files automatically as they arrive, without requiring a separate orchestration layer. Both commands land data into Iceberg without standing up additional ETL tooling.
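As a rough illustration of the single-operation load described above, the Python sketch below assembles a COPY INTO statement as text. It does not connect to Dremio. The statement shape (a target table, a quoted '@source/path' location, and a FILE_FORMAT clause) is an assumption based on Dremio's documented COPY INTO syntax and should be checked against the current SQL reference. The helper build_copy_into, the table lake.bronze.orders, and the source landing_s3 are hypothetical names.

```python
# Sketch only: builds a COPY INTO statement as text; it does not connect to Dremio.
# All names below (table, source, path) are hypothetical placeholders.

ALLOWED_FORMATS = {"csv", "json", "parquet"}


def build_copy_into(table: str, source: str, path: str, file_format: str) -> str:
    """Return a COPY INTO statement that loads files under `path` into `table`."""
    fmt = file_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    # Assumed shape: the location is written as '@<source>/<path>' in quotes.
    location = f"@{source}/{path.strip('/')}"
    return f"COPY INTO {table} FROM '{location}' FILE_FORMAT '{fmt}'"


statement = build_copy_into("lake.bronze.orders", "landing_s3", "/orders/2024/", "CSV")
print(statement)
```

Generating the statement from a small helper keeps the file format check in one place, which can be useful when the same load is issued for many paths.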
For data already accessible through Dremio's catalog (connected databases, warehouses, or SaaS sources), CTAS and INSERT INTO let teams move and shape data in a single query. The Kafka integration is worth calling out specifically: Dremio's Kafka connector gives direct access to live Kafka topics, and a CTAS query is all it takes to write streaming data into an Iceberg table. There is no intermediate storage step, no separate streaming job, and no landing zone to manage. For teams already running ingestion pipelines with Fivetran, Airbyte, Confluent, or Spark, Dremio fits cleanly into existing workflows.
How Dremio Handles Transformation
Data transformation is not a one-size-fits-all problem. Engineering teams want version-controlled, code-first workflows. Analysts want visual tools. Some teams need Python. Others need no SQL at all. Dremio is built to accommodate all of these without forcing a single approach.
For engineering teams, Dremio integrates natively with dbt Core, one of the most widely adopted transformation frameworks in the industry. Dremio acts as the execution engine for dbt models, running SQL against Iceberg and managing the resulting tables. Teams that prefer to stay in SQL without the dbt layer can write and execute transformations directly in the Dremio console. Spark and Python workloads are supported via Arrow Flight, Dremio's high-performance data transport protocol, so existing pipelines can connect to Dremio-managed Iceberg tables without significant rework.
For analysts and business users, Dremio's low-code editor provides a visual interface for building transformations: filtering rows, joining tables, renaming columns, and applying functions through a point-and-click interface. Transformation Studio, a free and open-source tool created by Dremio, extends this further, providing a collaborative environment for building and sharing transformation logic before promoting it to production. Native AI Transforms bring AI-assisted data preparation directly into the pipeline, handling tasks like cleaning inconsistent strings or classifying records without requiring custom code.
[Figure: Transform Studio]
What Changes When You Add Claude Code to the Workflow
One of the most practical shifts in data engineering right now is how well AI coding tools fit into ELT workflows. Claude Code, Anthropic's AI coding tool for the terminal, can generate, review, and iterate on Dremio SQL pipelines at speed.
As an example, in a live Dremio ELT session, Claude Code worked through a multi-step pipeline over retail database tables. It first validated the source tables by connecting through Dremio to retail data in PostgreSQL and confirming schema and row counts across customers, transactions, products, and stores. It then generated a CTAS query that creates the retail database table customer_purchase_profile, stored as Iceberg, using CTEs to join and aggregate across the validated sources.
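The two steps in that session (validate the sources, then build the profile table with a CTAS that uses CTEs) can be sketched in a self-contained way. The example below uses Python's built-in sqlite3 as a stand-in for Dremio and PostgreSQL so it runs anywhere; it is not the SQL from the session. Only two of the four source tables are included for brevity, and every column name and sample row is a hypothetical placeholder.

```python
import sqlite3

# Stand-in demo: SQLite replaces Dremio/PostgreSQL so the example runs anywhere.
# Column names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE transactions (customer_id INTEGER, store_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO transactions VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 10, 5.0);
""")

# Step 1: validate the sources by confirming row counts.
row_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("customers", "transactions")
}

# Step 2: CTAS with a CTE that aggregates per customer, then joins to customers.
conn.execute("""
    CREATE TABLE customer_purchase_profile AS
    WITH spend AS (
        SELECT customer_id,
               COUNT(*) AS purchase_count,
               SUM(amount) AS total_spend
        FROM transactions
        GROUP BY customer_id
    )
    SELECT c.customer_id, c.name, s.purchase_count, s.total_spend
    FROM customers AS c
    JOIN spend AS s ON s.customer_id = c.customer_id
""")

profile = conn.execute(
    "SELECT customer_id, name, purchase_count, total_spend "
    "FROM customer_purchase_profile ORDER BY customer_id"
).fetchall()
print(row_counts)
print(profile)
```

In a Dremio environment the same pattern would target an Iceberg table and read from the connected PostgreSQL source; the exact table paths depend on how the catalog is organized.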
This kind of work used to take hours of trial, error, and documentation reading. With Claude Code connected to a Dremio environment, a data engineer can describe the transformation they need in plain language and get production-ready SQL back in seconds, then iterate on it conversationally until it is right. For teams building medallion architectures (Bronze, Silver, Gold), Claude Code is particularly effective: it understands the layering pattern and can generate the full chain of transformations from raw to curated while incorporating business logic described in natural language.
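To make the Bronze, Silver, Gold layering concrete, here is a minimal sketch of a three-layer chain, again using Python's sqlite3 as a stand-in engine rather than Dremio. The table names, columns, and sample rows are hypothetical, and the cleaning rules (trim and lowercase store names, drop rows with no amount) are illustrative assumptions, not rules from any real pipeline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Bronze: raw rows exactly as they landed (hypothetical sample data).
    CREATE TABLE bronze_sales (store TEXT, amount TEXT);
    INSERT INTO bronze_sales VALUES
        (' Austin ', '10.5'), ('austin', '4.5'), ('Boston', NULL), ('BOSTON', '7');

    -- Silver: cleaned and typed; rows without an amount are dropped.
    CREATE TABLE silver_sales AS
    SELECT LOWER(TRIM(store)) AS store, CAST(amount AS REAL) AS amount
    FROM bronze_sales
    WHERE amount IS NOT NULL;

    -- Gold: business-level aggregate built only from the silver layer.
    CREATE TABLE gold_store_revenue AS
    SELECT store, SUM(amount) AS revenue
    FROM silver_sales
    GROUP BY store;
""")

gold = conn.execute(
    "SELECT store, revenue FROM gold_store_revenue ORDER BY store"
).fetchall()
print(gold)
```

Each layer reads only from the one before it, which is what keeps the raw data replayable and the curated tables traceable back to their inputs.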
Governance Built Into the Pipeline
ELT is only as useful as the trust teams can place in the data it produces. Dremio addresses this at the platform level rather than leaving it to individual pipeline authors.
Column-level lineage traces how data moves and transforms across the entire lakehouse, from source table to Gold layer. When a downstream metric breaks, teams can follow the lineage back to the root cause. Apache Iceberg's native time travel and snapshot history give point-in-time queryability and rollback capability out of the box: if a transformation runs incorrectly and corrupts a table, querying the previous snapshot and restoring from it is a standard operation. Automatic table statistics keep query plans optimized and surface data health signals before downstream consumers notice a problem. Versioning and CI/CD support lets engineering teams apply software development practices to data pipelines, with branching, pull requests, and automated deployment for SQL transformations and catalog changes. For teams running observability platforms like Monte Carlo or Great Expectations, Dremio integrates with rather than replaces those tools.
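The snapshot-based recovery described above usually comes down to two statements: read the table as of a known-good snapshot, then roll back to it. The Python sketch below only formats those statements as text. The AT SNAPSHOT and ROLLBACK TABLE forms are assumed to match Dremio's SQL reference and should be verified there; the table name, the snapshot ID, and both helper functions are hypothetical.

```python
# Sketch only: formats Iceberg time-travel statements as text.
# The snapshot ID and table name are hypothetical; AT SNAPSHOT and
# ROLLBACK TABLE are assumed to match Dremio's SQL reference.


def query_at_snapshot(table: str, snapshot_id: int) -> str:
    """Read the table as it existed at a given snapshot."""
    return f"SELECT * FROM {table} AT SNAPSHOT '{snapshot_id}'"


def rollback_to_snapshot(table: str, snapshot_id: int) -> str:
    """Make an earlier snapshot the table's current state."""
    return f"ROLLBACK TABLE {table} TO SNAPSHOT '{snapshot_id}'"


table = "lake.gold.customer_purchase_profile"
last_good_snapshot = 5393090506354317772  # hypothetical ID from snapshot history

recovery_steps = [
    query_at_snapshot(table, last_good_snapshot),     # inspect before restoring
    rollback_to_snapshot(table, last_good_snapshot),  # then restore
]
for step in recovery_steps:
    print(step)
```

Inspecting the earlier snapshot before rolling back is a deliberate ordering: it confirms the snapshot really holds the expected data before the table's current state is changed.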
A Lakehouse That Works Across the Entire Data Estate
What makes Dremio's approach to ELT different is the scope of what it governs. Most platforms manage data that lives within their own environment. Dremio federates across the entire data estate, connecting to 60+ sources and treating all of them as first-class citizens in a single catalog, regardless of where the data physically lives. Teams do not have to migrate everything into the lakehouse to get the benefits of unified governance, query optimization, and lineage. They can govern what is in Iceberg and what remains in source systems from the same place, with the same access controls and the same lineage graph.
This is the core of the Dremio vision: enabling the business to move from data to action, with a faster path to trusted insights, business outcomes, and decisions. Dremio is the only data platform built for agents, managed by agents, running on open standards like Apache Iceberg and Apache Arrow so that nothing is locked to a single vendor.