Dremio Blog

14 minute read · May 12, 2026

Using Claude Code to Build an Iceberg Lakehouse

Mark Shainman · Principal Product Marketing Manager

For years, building a production-grade data lakehouse required a specialized team: data engineers to design pipelines and to tune queries, and platform architects to manage table maintenance. Apache Iceberg changed the storage and table format equation, giving teams an open, vendor-neutral foundation for any scale of data. What remained hard was everything around it: the tooling, the setup, the SQL, and the operational discipline to keep it running. Dremio with Claude Code changes that. In under an hour, with nothing more than a few plain-English prompts, any data practitioner can build an Iceberg-native lakehouse on Dremio Cloud, complete with a medallion architecture, governed views, business-ready analytics, and query acceleration.

Try Dremio’s Interactive Demo

Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI

The Gaps That AI-Driven Data Engineering Closes

Building a lakehouse has never been the hard part conceptually. The friction is everywhere else.

Data engineers routinely spend hours writing ingestion code, only to discover that the schema assumptions they made about a CSV don't match what arrived. Every new dataset requires manual inspection, type inference, and a round of trial-and-error DDL before a single business question can be asked. For teams without dedicated engineering resources, this gap is often fatal to a data initiative before it starts.

SQL expertise is a gatekeeping problem at every layer of the stack. Creating Silver views that join multiple sources and calculate derived metrics, or Gold views that aggregate risk scores across geopolitical dimensions, requires fluency in join logic, window functions, and aggregation semantics. Analysts who understand the business question can't always translate it into the query. Engineers who can write the query don't always understand the business context. The round-trips between them eat days.

Claude Code, paired with Dremio's Iceberg-native platform, addresses all these problems directly. In this walkthrough, we'll build a working Iceberg lakehouse from scratch using supply chain data for rare earth minerals, then ask it real business questions and get answers back in plain English. 

Before You Start

Download and install Claude Desktop.

You’ll need Claude Desktop to run Claude Code locally. Grab it at claude.com/download and follow the installer for your operating system.

Make sure Python 3.11 or higher is installed.

Claude Code’s Dremio CLI requires Python 3.11+. Check your version by running this in your terminal:

Mac / Linux:

python3 --version

Windows:

python --version

If the version returned is below 3.11, or Python isn’t installed yet, download the latest 3.11+ installer at python.org/downloads and run it. On Windows, make sure to check “Add Python to PATH” during installation.
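If you prefer, you can run the same check from inside Python itself; this is just a convenience sketch, not part of the Dremio CLI:

```python
import sys

# Claude Code's Dremio CLI requires Python 3.11 or newer.
ok = sys.version_info >= (3, 11)
print(f"Detected Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if ok else 'please upgrade to 3.11+'}")
```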

Building the Lakehouse: Step by Step

Spin up a Dremio Cloud environment in under three minutes.

Head to dremio.com/get-started and sign up for a free trial. No credit card required, $400 in credits included, and you’ll have a live Dremio Cloud environment ready to query in under three minutes.

Connecting Claude Code to Dremio takes under two minutes.

Start by installing the Dremio CLI:

Mac:

pip3 install dremio-cli

Windows:

pip install dremio-cli

Open Claude Code and ask: "Help me connect to my Dremio Cloud account." Claude will walk you through the authentication setup. You'll need your Project ID and a Personal Access Token (PAT) from Dremio Cloud.

You can find the Project ID in the URL bar: it is the long identifier after “project”.

To generate a Personal Access Token, click your avatar/initials (shown with your name) in the lower-left sidebar of the Dremio Cloud console, open “Account Settings”, then “Personal Access Tokens”, and generate a token.

When you ask Claude to help you connect to Dremio, it will generate the connection command you'll run in your terminal, so credentials are transferred securely. Your PAT grants full access to your Dremio environment. Paste it only into your terminal, never into a chat window where it could be logged or inadvertently shared.

dremio profile delete default && dremio profile create default \
  --type cloud \
  --base-url https://api.dremio.cloud \
  --project-id YOUR_PROJECT_ID \
  --auth-type pat \
  --token YOUR_PAT_HERE

After you have provided your Dremio credentials, ask Claude to confirm the connection is live: "Run a test query against Dremio to confirm my connection is working."

Building a Medallion Architecture from a Single Prompt.

With a connection established, creating a structured Bronze/Silver/Gold lakehouse requires no SQL. Just describe what you want in Claude Code or Cowork. 

"Create a space called StrategicMaterialsDB with Bronze, Silver, Gold folders."

Then load your source data. Download the three sample CSV files and place them in your local workshop folder.

https://github.com/markshainman-max/Claude-Workshop-Assets/tree/main

Then tell Claude:

"I have three CSV files in my workshop folder: mineral_sources.csv, geopolitical_risk_index.csv, and supplier_audit_findings.csv. Create Iceberg tables for each in the Bronze folder and load all the data."

Claude hands the work to Dremio, which infers the schemas, generates the DDL, creates the Iceberg tables, and loads the data. No manual type mapping. No trial-and-error DDL. The tables are immediately queryable by any engine compatible with the Iceberg REST catalog specification.
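To make the "no manual type mapping" point concrete, here is a minimal sketch of the kind of CSV type inference Dremio performs for you. The column names here are illustrative, not the actual schema of the sample files:

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest SQL type that fits every non-empty value in a column."""
    for caster, name in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            for v in values:
                if v != "":
                    caster(v)
            return name
        except ValueError:
            continue
    return "VARCHAR"

# A tiny stand-in for one of the source CSVs.
sample = io.StringIO(
    "mineral,origin_country,tonnes\n"
    "Neodymium,Australia,120.5\n"
    "Dysprosium,Myanmar,40\n"
)
rows = list(csv.DictReader(sample))
schema = {col: infer_type([r[col] for r in rows]) for col in rows[0]}
print(schema)  # {'mineral': 'VARCHAR', 'origin_country': 'VARCHAR', 'tonnes': 'DOUBLE'}
```

The inferred schema then maps directly onto a CREATE TABLE statement; Dremio's real inference handles far more cases (dates, nulls, quoting), but the shape of the work is the same.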

Silver and Gold views translate business logic into governed assets.

Write the following in Claude Code, and it will create the business layer for both knowledge workers and AI agents.

"Create a Silver view called ShipmentRisk that joins mineral sources to the risk index and calculates weighted risk exposure for each shipment."

"Create a Gold view called VulnerabilityDashboard that shows each mineral type's total volume, average risk score, high-risk volume, and percentage dependence on high-risk countries."

The resulting views are registered in Dremio's catalog, versioned, and immediately available to downstream tools (BI platforms, notebooks, or AI agents) without any additional configuration.
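As a rough sketch of the business logic those two prompts describe, here is the join-then-aggregate shape in plain Python. The column names, risk threshold, and weighting formula are assumptions for illustration, not the schema of the sample files:

```python
# Hypothetical rows standing in for the Bronze tables.
shipments = [
    {"shipment_id": 1, "mineral": "Neodymium",  "country": "China",     "tonnes": 900.0},
    {"shipment_id": 2, "mineral": "Neodymium",  "country": "Australia", "tonnes": 100.0},
    {"shipment_id": 3, "mineral": "Dysprosium", "country": "Myanmar",   "tonnes": 50.0},
]
risk_index = {"China": 8.5, "Australia": 2.0, "Myanmar": 9.0}
HIGH_RISK = 7.0  # assumed cutoff for "high-risk"

# Silver (ShipmentRisk): join shipments to the risk index, weight exposure by volume.
shipment_risk = [
    {**s,
     "risk_score": risk_index[s["country"]],
     "weighted_exposure": s["tonnes"] * risk_index[s["country"]]}
    for s in shipments
]

# Gold (VulnerabilityDashboard): per-mineral totals, average risk, high-risk dependence.
dashboard = {}
for r in shipment_risk:
    d = dashboard.setdefault(r["mineral"], {"total": 0.0, "high_risk": 0.0, "exposure": 0.0})
    d["total"] += r["tonnes"]
    d["exposure"] += r["weighted_exposure"]
    if r["risk_score"] >= HIGH_RISK:
        d["high_risk"] += r["tonnes"]

for mineral, d in dashboard.items():
    pct = 100 * d["high_risk"] / d["total"]
    print(f"{mineral}: {d['total']:.0f} t, avg weighted risk {d['exposure'] / d['total']:.2f}, "
          f"{pct:.0f}% from high-risk countries")
```

In the real lakehouse this logic lives in SQL views generated by Claude, so it is governed, versioned, and reusable rather than buried in a script.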

Business questions get immediate, plain-English answers.

With the lakehouse built, the conversation shifts from engineering to analysis:

"Which minerals are most exposed to geopolitical risk? Give me a business summary."
"Which suppliers should we be most worried about and why?"
"If China restricted exports tomorrow, what percentage of our Neodymium supply would be at risk?"

Claude translates each question into a query, runs it against Dremio, and returns a narrative summary alongside the results. No SQL required on the analyst's side.

Lineage is a prompt away.

"Show me the lineage for the Gold VulnerabilityDashboard view. What are all its upstream dependencies?"

Dremio's native lineage tracks every dependency: from the Gold view back through the Silver transformations to the raw Bronze Iceberg tables. Audit, compliance, and impact analysis are built in from day one.
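Conceptually, answering that lineage question is a walk over a dependency graph. A toy sketch follows; the view names mirror this walkthrough, but the graph structure and function are illustrative, not Dremio's API:

```python
# Toy dependency graph mirroring the medallion layers built above.
upstream = {
    "Gold.VulnerabilityDashboard": ["Silver.ShipmentRisk"],
    "Silver.ShipmentRisk": ["Bronze.mineral_sources", "Bronze.geopolitical_risk_index"],
    "Bronze.mineral_sources": [],
    "Bronze.geopolitical_risk_index": [],
}

def all_upstream(node, graph):
    """Depth-first collection of every transitive upstream dependency."""
    seen = []
    for dep in graph.get(node, []):
        if dep not in seen:
            seen.append(dep)
            seen.extend(d for d in all_upstream(dep, graph) if d not in seen)
    return seen

print(all_upstream("Gold.VulnerabilityDashboard", upstream))
# ['Silver.ShipmentRisk', 'Bronze.mineral_sources', 'Bronze.geopolitical_risk_index']
```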

Autonomous Query Acceleration

Reflections eliminate the gap between data volume and query speed.

Dremio's Reflections are pre-materialized aggregations that the query engine automatically routes matching queries to. With Claude Code, creating one is as simple as describing the workload:

"Create an aggregation reflection on the Customer360 orders table to pre-materialize it."

The Customer360 dataset (177 million orders across 4.8 million customers) is a real test of query performance. A membership-tier aggregation run without a reflection takes measurable time at that scale. Once the reflection builds and you run the same query again, the query completes in a fraction of the time. Dremio routed it to the pre-materialized reflection, applying columnar optimizations and partition pruning at the reflection layer rather than scanning the full table.
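The intuition behind the speedup is that the query is answered from a much smaller pre-aggregated dataset instead of a full-table scan. A minimal sketch of that idea, with made-up tiers and numbers (real Reflections also handle refresh, columnar layout, and partition pruning):

```python
from collections import defaultdict

# Stand-in for a large fact table: (membership_tier, order_total) pairs.
orders = [("gold", 120.0), ("silver", 35.0), ("gold", 80.0), ("bronze", 12.0)] * 1000

# Without a reflection: every query scans all rows.
def query_without_reflection(tier):
    return sum(total for t, total in orders if t == tier)

# Build the "reflection" once: a pre-materialized per-tier aggregate.
reflection = defaultdict(float)
for tier, total in orders:
    reflection[tier] += total

# With the reflection: the same question is a single lookup.
def query_with_reflection(tier):
    return reflection[tier]

assert query_without_reflection("gold") == query_with_reflection("gold")
print(query_with_reflection("gold"))  # 200000.0
```

At 4,000 rows the difference is invisible; at 177 million orders, replacing a scan with a lookup against a few pre-computed rows is what turns a long-running query into a sub-second one.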

Dremio's autonomous reflection management goes further: it monitors query patterns and recommends acceleration strategies without engineering intervention. With Auto Reflections enabled, it creates them automatically based on query patterns and workloads. No jobs to design, no maintenance windows to schedule.

The Full Stack of Dremio's Agentic Lakehouse Advantage

Open Catalog keeps every asset interoperable and durable.

Every table, view, and reflection created through Claude Code is registered in Dremio's Open Catalog, built on Apache Polaris. Any engine compatible with the Iceberg REST catalog specification (Spark, Flink, Trino, or tools that don't yet exist) can read the data without migration or conversion. The catalog investment is as durable as the format itself.

Autonomous table management keeps Iceberg tables fast at any scale.

Dremio handles compaction, clustering, and vacuuming on a continuous schedule. As the StrategicMaterialsDB Bronze tables grow with new shipment data, they stay organized and query-ready without manual intervention. Teams focus on analysis, not maintenance.

The Intelligent Query Engine is built to exploit Iceberg's architecture.

Partition pruning, file-level skipping, and predicate pushdown aren't add-ons. They're native to how Dremio plans and executes queries against Iceberg tables. When Claude Code routes a business question to Dremio, it lands on a query engine purpose-built to answer it fast.

AI agents are becoming primary consumers of enterprise data, and the demands they place on the data layer are different from traditional workloads: higher frequency, lower tolerance for latency, and no human in the loop to catch a stale result. The Agentic Lakehouse is purpose-built for that shift: a platform where data is always fresh, always governed, and always fast, without the operational overhead that has historically made that combination impossible. Claude Code makes the Agentic Lakehouse accessible to every team, not just those with dedicated platform engineers.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.