12 minute read · April 14, 2025

AI Agents for Dremio Utilizing MCP

Rahim Bhojani · CTO

Aniket Kulkarni · Software Architect @ Dremio

Why SQL Must Evolve in the Era of Agentic Apps and Data-Aware AI

SQL has long been the universal language of data. But with the rise of Generative AI and agentic applications, a major shift is underway. We're entering an era where natural language is the interface, and agents are the clients.

There are two major trends converging here—both fueled by GenAI and both reliant on data:

  1. Agents need data to do their work.
    Autonomous agents are being deployed to perform tasks like generating personalized marketing campaigns, running financial simulations, or triaging support tickets. To do these jobs effectively, they need access to company data—and many of them are now fluent in SQL. SQL is emerging as the preferred language for agents to retrieve and interact with structured data.
  2. Humans still prefer natural language over SQL.
    Despite years of SQL training and the proliferation of BI tools, many users—from analysts to marketers—struggle to write precise queries. They want to express what they need in plain English. Agents can help here too—acting as translators that convert natural language into executable SQL queries.

In both cases, agents need to interact directly with data systems like Dremio. But without a common protocol, every integration becomes a custom effort. Just as REST standardized how services communicate, we now need a standard for agent-data interaction.

That’s where MCP (Model Context Protocol) comes in.

MCP, developed by Anthropic and backed by a growing ecosystem including OpenAI, Microsoft, and now Dremio, is designed to standardize how agents interact with tools, systems, and data.

In simple terms, MCP lets agents:

  • Discover what capabilities are available (e.g., “query a dataset” or “get schema metadata”)
  • Understand how to use them (parameters, expected results, etc.)
  • Invoke them dynamically, in real time, as part of a reasoning process

This makes MCP the OpenAPI of the agentic world—except broader, semantically richer, and designed for intelligent systems.
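The discover/understand/invoke cycle above can be sketched in a few lines. This is a stdlib-only illustration, not the actual MCP wire format: the tool names, catalog shape, and stubbed results are all invented for the example.

```python
# Minimal sketch of the discover -> understand -> invoke cycle.
# The catalog shape and tool names are illustrative; a real MCP client
# would fetch this catalog from an MCP server over the protocol.

TOOL_CATALOG = [
    {
        "name": "run_sql",
        "description": "Run a SQL query against the lakehouse",
        "parameters": {"query": "string"},
    },
    {
        "name": "get_schema",
        "description": "Get column names and types for a table",
        "parameters": {"table": "string"},
    },
]

def discover_tools():
    """Step 1: the agent asks what capabilities exist."""
    return [t["name"] for t in TOOL_CATALOG]

def describe_tool(name):
    """Step 2: the agent learns parameters and expected results."""
    return next(t for t in TOOL_CATALOG if t["name"] == name)

def invoke_tool(name, **kwargs):
    """Step 3: the agent calls a capability dynamically (stubbed here)."""
    if name == "get_schema":
        return {"table": kwargs["table"], "columns": ["state", "signup_date"]}
    if name == "run_sql":
        return {"rows": [[42]]}
    raise ValueError(f"unknown tool: {name}")

print(discover_tools())
print(describe_tool("get_schema")["parameters"])
print(invoke_tool("get_schema", table="customers"))
```

The point is that nothing here is hardcoded into the agent: it learns the catalog at runtime and decides, mid-reasoning, which capability to call.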

How Do LLMs Work?

To understand why MCP matters, it helps to briefly understand how LLMs reason and interact with tools.

An LLM receives a “context”—a sequence of instructions, background knowledge, history, and available tools—and then determines the next best token to produce. In many agent frameworks, this context includes a list of tools that the model can invoke. These are represented as function signatures, like this:

{
  "name": "run_sql",
  "description": "Run a SQL query",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string" }
    },
    "required": ["query"]
  }
}

When an LLM sees a user query like:

“How many customers signed up in California last month?”

…it doesn’t just generate a SQL query directly. It breaks the task into steps:

  1. Search for relevant tables or metadata
  2. Figure out which columns represent state and signup date
  3. Construct a valid SQL query
  4. Invoke the SQL tool with that query
  5. Interpret the results and respond to the user

This entire process is possible because the model understands the tools available and how to call them. And it’s exactly this context—this interface between tools and models—that MCP standardizes.
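The five steps above can be sketched as a short program with a mocked model and mocked tools. Everything specific in it is an assumption for illustration: the `sales.customers` table, the column names, and the stubbed row count do not come from any real system.

```python
# Hedged sketch of the five-step flow with mocked tools; table names,
# columns, and results are invented for illustration only.

def search_metadata(keyword):
    # Step 1: search for relevant tables (stubbed metadata search).
    return ["sales.customers"] if keyword == "customers" else []

def get_schema(table):
    # Step 2: inspect columns to find state and signup date.
    return {"sales.customers": ["id", "state", "signup_date"]}[table]

def run_sql(query):
    # Step 4: execute the query (stubbed result set).
    return [(1187,)]

def answer(question):
    table = search_metadata("customers")[0]
    columns = get_schema(table)
    assert "state" in columns and "signup_date" in columns
    # Step 3: construct a valid SQL query from the discovered schema.
    query = (
        f"SELECT COUNT(*) FROM {table} "
        "WHERE state = 'CA' AND signup_date >= DATE '2025-03-01'"
    )
    count = run_sql(query)[0][0]
    # Step 5: interpret the result and respond in natural language.
    return f"{count} customers signed up in California last month."

print(answer("How many customers signed up in California last month?"))
```

In a real agent, each of these stubs would be a tool call routed through MCP, and the model itself would decide the order in which to make them.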

Without MCP: Why This Was Painful Before

Before MCP, building agentic experiences that interact with real systems was painful and repetitive. Here’s what it used to involve:

  • Every tool needed a bespoke interface.
    You had to manually define each tool’s parameters and hardcode function specs for your specific model/runtime.
  • No shared vocabulary.
    Each model had its own format for tool calls, making it hard to build once and reuse across providers.
  • Limited discoverability.
    Agents couldn’t explore available capabilities—they needed everything preloaded.
  • Hard to scale or compose tools.
    Chaining tasks (like querying Dremio, exporting to Sheets, and summarizing results) was error-prone and manual.

With MCP, this all changes.

Why Does This Matter for Dremio?

This is why we’re so excited about MCP. It standardizes the interface between LLMs and tools—removing the need for one-off integrations and enabling dynamic discovery and composition of capabilities.

Dremio is built around openness. We believe in an open lakehouse architecture where your data isn’t locked behind proprietary APIs—it’s accessible, queryable, and now, agent-ready.

LLMs have an information retrieval problem: they cannot natively access, retrieve, or accurately interpret real-time, private, or structured data without external systems augmenting their capabilities.

MCP, combined with Dremio, addresses this challenge head-on. With Dremio’s rich metadata and semantic layer, and MCP’s standardization, agents gain native access to discover datasets, generate SQL queries, and return insights—securely and at scale.

Enter the Dremio MCP Server

We’re introducing the Dremio MCP Server—an open-source project that allows any AI agent using MCP to communicate directly with Dremio.

With this server, agents can:

  • Discover datasets, views, and metadata
  • Translate natural language into SQL queries
  • Explore your lakehouse with rich, contextual understanding

This isn’t just for “data analysts.” An agent might:

  • Help a marketer pull client segmentation data for campaign personalization
  • Assist a finance bot in compiling quarterly reporting numbers
  • Translate an executive’s question into a SQL query for sales performance

And all of this happens seamlessly—without needing the user to know SQL.

What’s Under the Hood?

Here’s how the Dremio MCP integration works:

  1. Tooling – Tools like RunSqlQuery, GetSchemaOfTable, and RunSemanticSearch are defined and registered with the MCP Server.
  2. Auto-Discovery – Agents use MCP metadata to understand available functions, parameters, and expected outputs.
  3. Invocation – Agents invoke tools directly and use the results to proceed with the next step in reasoning.

Because of MCP, agents can reason over which capability to use—no hardcoding required.
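The register/auto-discover/invoke pattern above can be mimicked with a plain decorator registry. To be clear, this is not the Dremio MCP Server's actual implementation, which registers tools through the MCP SDK; it is a stdlib-only sketch of the pattern, with stubbed tool bodies.

```python
# Stdlib-only mock of tool registration. The real Dremio MCP Server
# registers tools via the MCP SDK; this only illustrates the pattern.

import inspect

REGISTRY = {}

def tool(fn):
    """Register a function plus the metadata agents auto-discover."""
    REGISTRY[fn.__name__] = {
        "fn": fn,
        "description": (fn.__doc__ or "").strip(),
        "parameters": list(inspect.signature(fn).parameters),
    }
    return fn

@tool
def RunSqlQuery(query: str):
    """Execute SQL directly on the cluster (stubbed)."""
    return {"rows": [], "query": query}

@tool
def GetSchemaOfTable(table: str):
    """Retrieve schema, descriptions, and tags (stubbed)."""
    return {"table": table, "columns": []}

# Auto-discovery: an agent lists names, parameters, and descriptions...
for name, meta in REGISTRY.items():
    print(name, meta["parameters"], "-", meta["description"])

# ...then invokes a tool by name with the advertised parameters.
print(REGISTRY["RunSqlQuery"]["fn"](query="SELECT 1"))
```

Because the descriptions and parameter lists travel with the tools, nothing about `RunSqlQuery` or `GetSchemaOfTable` needs to be hardcoded into the agent ahead of time.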

Try It, Build With It, Shape the Future

We’re open-sourcing the Dremio MCP Server to kickstart adoption and innovation:
👉 https://github.com/dremio/dremio-mcp

With this server, you can:

  • Let agents discover and query your data
  • Build natural-language interfaces to your lakehouse
  • Enable automated workflows powered by agents

Initial tools include:

  • RunSqlQuery – Execute SQL directly on your cluster
  • GetSchemaOfTable – Retrieve schema, descriptions, and tags
  • RunSemanticSearch – Let agents explore your metadata with LLM-powered search

This is just the beginning.

Developers Wanted

This is an open ecosystem—and your contributions matter. We welcome:

  • New capabilities (semantic layers, visualizations, data transformations)
  • Better dev tooling, monitoring, and logs
  • Real-world feedback on use cases and performance

Let’s shape the future of data-native AI together.

Final Thoughts

In the near future, natural language will be the API—and agents will be the clients. But to make that future real, we need open, expressive, and secure ways for those agents to interact with systems.

MCP offers that promise. And with the Dremio MCP Server, we’re helping make it real.

👉 Get started on GitHub
