12 minute read · April 14, 2025

AI Agents for Dremio Utilizing MCP

Rahim Bhojani · CTO

Aniket Kulkarni · Software Architect @ Dremio

Why SQL Must Evolve in the Era of Agentic Apps and Data-Aware AI

SQL has long been the universal language of data. But with the rise of Generative AI and agentic applications, a major shift is underway. We're entering an era where natural language is the interface, and agents are the clients.

There are two major trends converging here—both fueled by GenAI and both reliant on data:

  1. Agents need data to do their work.
    Autonomous agents are being deployed to perform tasks like generating personalized marketing campaigns, running financial simulations, or triaging support tickets. To do these jobs effectively, they need access to company data—and many of them are now fluent in SQL. SQL is emerging as the preferred language for agents to retrieve and interact with structured data.
  2. Humans still prefer natural language over SQL.
    Despite years of SQL training and the proliferation of BI tools, many users—from analysts to marketers—struggle to write precise queries. They want to express what they need in plain English. Agents can help here too—acting as translators that convert natural language into executable SQL queries.

In both cases, agents need to interact directly with data systems like Dremio. But without a common protocol, every integration becomes a custom effort. Just as REST standardized how services communicate, we now need a standard for agent-data interaction.

That’s where MCP (Model Context Protocol) comes in.

MCP, developed by Anthropic and backed by a growing ecosystem including OpenAI, Microsoft, and now Dremio, is designed to standardize how agents interact with tools, systems, and data.

In simple terms, MCP lets agents:

  • Discover what capabilities are available (e.g., “query a dataset” or “get schema metadata”)
  • Understand how to use them (parameters, expected results, etc.)
  • Invoke them dynamically, in real time, as part of a reasoning process

This makes MCP the OpenAPI of the agentic world—except broader, semantically richer, and designed for intelligent systems.
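The discover/understand/invoke cycle above can be sketched in a few lines. This is a stdlib-only illustration, not the actual MCP wire format: the tool names, catalog shape, and stubbed results are all invented for the example.

```python
# Minimal sketch of the discover -> understand -> invoke cycle.
# The catalog shape and tool names are illustrative; a real MCP client
# would fetch this catalog from an MCP server over the protocol.

TOOL_CATALOG = [
    {
        "name": "run_sql",
        "description": "Run a SQL query against the lakehouse",
        "parameters": {"query": "string"},
    },
    {
        "name": "get_schema",
        "description": "Get column names and types for a table",
        "parameters": {"table": "string"},
    },
]

def discover_tools():
    """Step 1: the agent asks what capabilities exist."""
    return [t["name"] for t in TOOL_CATALOG]

def describe_tool(name):
    """Step 2: the agent learns parameters and expected results."""
    return next(t for t in TOOL_CATALOG if t["name"] == name)

def invoke_tool(name, **kwargs):
    """Step 3: the agent calls a capability dynamically (stubbed here)."""
    if name == "get_schema":
        return {"table": kwargs["table"], "columns": ["state", "signup_date"]}
    if name == "run_sql":
        return {"rows": [[42]]}
    raise ValueError(f"unknown tool: {name}")

print(discover_tools())
print(describe_tool("get_schema")["parameters"])
print(invoke_tool("get_schema", table="customers"))
```

The point is that nothing here is hardcoded into the agent: it learns the catalog at runtime and decides, mid-reasoning, which capability to call.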

How Do LLMs Work?

To understand why MCP matters, it helps to briefly understand how LLMs reason and interact with tools.

An LLM receives a “context”—a sequence of instructions, background knowledge, history, and available tools—and then determines the next best token to produce. In many agent frameworks, this context includes a list of tools that the model can invoke. These are represented as function signatures, like this:

{
  "name": "run_sql",
  "description": "Run a SQL query",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string" }
    },
    "required": ["query"]
  }
}

When an LLM sees a user query like:

“How many customers signed up in California last month?”

…it doesn’t just generate a SQL query directly. It breaks the task into steps:

  1. Search for relevant tables or metadata
  2. Figure out which columns represent state and signup date
  3. Construct a valid SQL query
  4. Invoke the SQL tool with that query
  5. Interpret the results and respond to the user

This entire process is possible because the model understands the tools available and how to call them. And it’s exactly this context—this interface between tools and models—that MCP standardizes.
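The five steps above can be sketched as a short program with a mocked model and mocked tools. Everything specific in it is an assumption for illustration: the `sales.customers` table, the column names, and the stubbed row count do not come from any real system.

```python
# Hedged sketch of the five-step flow with mocked tools; table names,
# columns, and results are invented for illustration only.

def search_metadata(keyword):
    # Step 1: search for relevant tables (stubbed metadata search).
    return ["sales.customers"] if keyword == "customers" else []

def get_schema(table):
    # Step 2: inspect columns to find state and signup date.
    return {"sales.customers": ["id", "state", "signup_date"]}[table]

def run_sql(query):
    # Step 4: execute the query (stubbed result set).
    return [(1187,)]

def answer(question):
    table = search_metadata("customers")[0]
    columns = get_schema(table)
    assert "state" in columns and "signup_date" in columns
    # Step 3: construct a valid SQL query from the discovered schema.
    query = (
        f"SELECT COUNT(*) FROM {table} "
        "WHERE state = 'CA' AND signup_date >= DATE '2025-03-01'"
    )
    count = run_sql(query)[0][0]
    # Step 5: interpret the result and respond in natural language.
    return f"{count} customers signed up in California last month."

print(answer("How many customers signed up in California last month?"))
```

In a real agent, each of these stubs would be a tool call routed through MCP, and the model itself would decide the order in which to make them.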

Without MCP: Why This Was Painful Before

Before MCP, building agentic experiences that interact with real systems was painful and repetitive. Here’s what it used to involve:

  • Every tool needed a bespoke interface.
    You had to manually define each tool’s parameters and hardcode function specs for your specific model/runtime.
  • No shared vocabulary.
    Each model had its own format for tool calls, making it hard to build once and reuse across providers.
  • Limited discoverability.
    Agents couldn’t explore available capabilities—they needed everything preloaded.
  • Hard to scale or compose tools.
    Chaining tasks (like querying Dremio, exporting to Sheets, and summarizing results) was error-prone and manual.

With MCP, this all changes.

Why Does This Matter for Dremio?

This is why we’re so excited about MCP. It standardizes the interface between LLMs and tools—removing the need for one-off integrations and enabling dynamic discovery and composition of capabilities.

Dremio is built around openness. We believe in an open lakehouse architecture where your data isn’t locked behind proprietary APIs—it’s accessible, queryable, and now, agent-ready.

LLMs have an information retrieval problem: they cannot natively access, retrieve, or accurately interpret real-time, private, or structured data without external systems augmenting their capabilities.

MCP, combined with Dremio, addresses this challenge head-on. With Dremio’s rich metadata and semantic layer, and MCP’s standardization, agents gain native access to discover datasets, generate SQL queries, and return insights—securely and at scale.

Enter the Dremio MCP Server

We’re introducing the Dremio MCP Server—an open-source project that allows any AI agent using MCP to communicate directly with Dremio.

With this server, agents can:

  • Discover datasets, views, and metadata
  • Translate natural language into SQL queries
  • Explore your lakehouse with rich, contextual understanding

This isn’t just for “data analysts.” An agent might:

  • Help a marketer pull client segmentation data for campaign personalization
  • Assist a finance bot in compiling quarterly reporting numbers
  • Translate an executive’s question into a SQL query for sales performance

And all of this happens seamlessly—without needing the user to know SQL.

What’s Under the Hood?

Here’s how the Dremio MCP integration works:

  1. Tooling – Tools like RunSqlQuery, GetSchemaOfTable, and RunSemanticSearch are defined and registered with the MCP Server.
  2. Auto-Discovery – Agents use MCP metadata to understand available functions, parameters, and expected outputs.
  3. Invocation – Agents invoke tools directly and use the results to proceed with the next step in reasoning.

Because of MCP, agents can reason over which capability to use—no hardcoding required.
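The register/auto-discover/invoke pattern above can be mimicked with a plain decorator registry. To be clear, this is not the Dremio MCP Server's actual implementation, which registers tools through the MCP SDK; it is a stdlib-only sketch of the pattern, with stubbed tool bodies.

```python
# Stdlib-only mock of tool registration. The real Dremio MCP Server
# registers tools via the MCP SDK; this only illustrates the pattern.

import inspect

REGISTRY = {}

def tool(fn):
    """Register a function plus the metadata agents auto-discover."""
    REGISTRY[fn.__name__] = {
        "fn": fn,
        "description": (fn.__doc__ or "").strip(),
        "parameters": list(inspect.signature(fn).parameters),
    }
    return fn

@tool
def RunSqlQuery(query: str):
    """Execute SQL directly on the cluster (stubbed)."""
    return {"rows": [], "query": query}

@tool
def GetSchemaOfTable(table: str):
    """Retrieve schema, descriptions, and tags (stubbed)."""
    return {"table": table, "columns": []}

# Auto-discovery: an agent lists names, parameters, and descriptions...
for name, meta in REGISTRY.items():
    print(name, meta["parameters"], "-", meta["description"])

# ...then invokes a tool by name with the advertised parameters.
print(REGISTRY["RunSqlQuery"]["fn"](query="SELECT 1"))
```

Because the descriptions and parameter lists travel with the tools, nothing about `RunSqlQuery` or `GetSchemaOfTable` needs to be hardcoded into the agent ahead of time.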

Try It, Build With It, Shape the Future

We’re open-sourcing the Dremio MCP Server to kickstart adoption and innovation:
👉 https://github.com/dremio/dremio-mcp

With this server, you can:

  • Let agents discover and query your data
  • Build natural-language interfaces to your lakehouse
  • Enable automated workflows powered by agents

Initial tools include:

  • RunSqlQuery – Execute SQL directly on your cluster
  • GetSchemaOfTable – Retrieve schema, descriptions, and tags
  • RunSemanticSearch – Let agents explore your metadata with LLM-powered search

This is just the beginning.

Developers Wanted

This is an open ecosystem—and your contributions matter. We welcome:

  • New capabilities (semantic layers, visualizations, data transformations)
  • Better dev tooling, monitoring, and logs
  • Real-world feedback on use cases and performance

Let’s shape the future of data-native AI together.

Final Thoughts

In the near future, natural language will be the API—and agents will be the clients. But to make that future real, we need open, expressive, and secure ways for those agents to interact with systems.

MCP offers that promise. And with the Dremio MCP Server, we’re helping make it real.

👉 Get started on GitHub
