Dremio Blog

8 minute read · June 24, 2026

Why AI Agents Need a CLI, Not Just an MCP Server

Will Martin, Technical Evangelist

Most conversations about AI and data platforms start with MCP. That's understandable: the Model Context Protocol has become the standard way to give AI agents a window into a data system, and Dremio's MCP server does this well. But MCP solves the specific problem of giving agents a supervised, conversational interface to your data. What it doesn't do is give agents a reliable way to operate your data platform when no one is watching.

That's a meaningful distinction. And it's the problem the Dremio Developer CLI is built to solve.

MCP Handles Discovery. Execution Is a Different Problem.

MCP is designed for the supervised session: an agent proposes an action, a human approves it, the tool runs, and the context window grows. That trust model works well for exploration. For example, a data analyst asks Claude to help them understand a schema, the agent calls the MCP server's GetSchemaOfTable tool, and the result feeds directly into the conversation. Nobody needs to worry about the agent running unchecked.

But pipelines don't work like that. Neither do overnight jobs, scheduled quality checks, or agent swarms running across multiple data sources simultaneously. Those workflows don't have a human approving each step. They run autonomously, often in the background, and they need an interface that's designed for that trust model: structured output, input validation that doesn't rely on human review to catch mistakes, and (most critically) minimal token overhead.

If you're doing unattended automation, MCP is the wrong tool for the job. Its rich conversational context is a benefit in supervised sessions, but it becomes unnecessary overhead in autonomous pipelines.

What Autonomous Agents Actually Need From a Data Tool

An agent operating without human supervision has three requirements that look simple but are easy to get wrong.

First, it needs to discover what it can do at runtime, not from training data. An agent that has to rely on what it saw during pre-training to know how to construct a valid query is an agent that will eventually hallucinate a parameter name or a command syntax. The interface needs to be self-describing.

Second, it needs structured, predictable output. Parsing free-text terminal responses is fragile. Parsing JSON that matches a consistent schema is not.

Third, it needs to fail safely. When the agent constructs a command with a malformed path or an invalid filter, the tool should catch that before it reaches the API, not after. In a supervised session, a human catches those mistakes. In an autonomous pipeline, the tool itself has to.

What the Dremio CLI Provides

The Dremio Developer CLI addresses all three of these requirements directly.

Every command in the CLI is introspectable via dremio describe. Running dremio describe query.run or dremio describe reflection.list returns a full JSON schema for that command: parameter names, types, required and optional flags, and enum values. An agent doesn't read the documentation. It queries the CLI at runtime and constructs valid commands from the schema it gets back. There's no guesswork.
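As a minimal sketch of that runtime discovery step, the Python below checks an agent's proposed arguments against a schema before building a command. Only the dremio describe query.run invocation comes from the text above. The schema shape (a "parameters" map with "required" and "enum" keys) and the parameter names sql and format are assumptions made for illustration, not the CLI's documented output.

```python
import json
import subprocess


def describe(command: str) -> dict:
    """Ask the CLI for a command's schema at runtime (needs the dremio CLI on PATH)."""
    result = subprocess.run(
        ["dremio", "describe", command],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means args fit the schema.

    Assumes a hypothetical schema shape:
    {"parameters": {name: {"type": ..., "required": bool, "enum": [...]}}}
    """
    problems = []
    params = schema.get("parameters", {})
    # Reject anything the schema does not declare.
    for name in args:
        if name not in params:
            problems.append(f"unknown parameter: {name}")
    # Check required parameters and enum membership.
    for name, spec in params.items():
        if spec.get("required") and name not in args:
            problems.append(f"missing required parameter: {name}")
        allowed = spec.get("enum")
        if allowed and name in args and args[name] not in allowed:
            problems.append(f"invalid value for {name}: {args[name]!r}")
    return problems


# Hypothetical schema, standing in for what describe("query.run") might return.
EXAMPLE_SCHEMA = {
    "parameters": {
        "sql": {"type": "string", "required": True},
        "format": {"type": "string", "required": False, "enum": ["json", "csv"]},
    }
}

print(validate_args(EXAMPLE_SCHEMA, {"sql": "SELECT 1"}))
print(validate_args(EXAMPLE_SCHEMA, {"query": "SELECT 1", "format": "xml"}))
```

In a real agent loop, EXAMPLE_SCHEMA would be replaced by the result of describe("query.run"), and validate_args would need adapting to whatever structure the CLI actually returns.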

Output defaults to JSON across all commands. Rather than scraping a formatted terminal table, an agent gets clean structured data it can reason over directly. The --fields flag trims that output further: dremio job list --fields job_id,job_state returns only the two fields the agent needs, which matters in a multi-step session where context window size compounds quickly.
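A rough Python sketch of how an agent harness might wrap that call is shown here. The dremio job list --fields job_id,job_state command is taken from the text above. Everything else is an assumption for illustration: the helper names, the idea that the output is a JSON array of objects, and the sample job ID and state value. A fake runner stands in for the real CLI so the sketch runs without a Dremio environment.

```python
import json
import subprocess
from typing import Callable, Sequence


def run_cli(argv: Sequence[str]) -> str:
    """Run the CLI and return its stdout (needs the dremio CLI on PATH)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def list_jobs(fields: Sequence[str], runner: Callable[[Sequence[str]], str] = run_cli):
    """Build `dremio job list --fields ...` and parse the JSON it prints."""
    argv = ["dremio", "job", "list", "--fields", ",".join(fields)]
    return json.loads(runner(argv))


def fake_runner(argv: Sequence[str]) -> str:
    # Stand-in for the real CLI. The array-of-objects shape and the
    # values below are assumptions, not documented CLI output.
    assert list(argv) == ["dremio", "job", "list", "--fields", "job_id,job_state"]
    return '[{"job_id": "job-1", "job_state": "COMPLETED"}]'


jobs = list_jobs(["job_id", "job_state"], runner=fake_runner)
print(jobs)
```

Passing the runner as a parameter keeps the command construction testable offline, and requesting only the fields the next step needs keeps each tool result small before it enters the agent's context.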

The CLI also validates inputs before they hit the API. Catalog paths are checked for traversal attempts. SQL-interpolated values like job IDs and state filters are validated before use. All API errors return a consistent {"error": "...", "status_code": N} format rather than raw HTTP traces. This is the safety layer that makes autonomous agent operation reliable, not a convenience for human developers (though we love to have it too!).
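One way an agent harness could consume that consistent error format is sketched below in Python. The {"error": "...", "status_code": N} envelope comes from the text above. The class and function names, the assumption that the envelope arrives on stdout, the sample messages, and the retry policy are all illustrative choices, not CLI behavior.

```python
import json


class DremioCliError(Exception):
    """Raised when CLI output is the error envelope described above."""

    def __init__(self, message: str, status_code: int):
        super().__init__(f"{status_code}: {message}")
        self.message = message
        self.status_code = status_code


def parse_output(stdout: str):
    """Parse CLI output, raising if it is an {"error", "status_code"} envelope."""
    payload = json.loads(stdout)
    if isinstance(payload, dict) and "error" in payload and "status_code" in payload:
        raise DremioCliError(payload["error"], payload["status_code"])
    return payload


def is_retryable(err: DremioCliError) -> bool:
    # Policy chosen for this sketch: retry on throttling and server-side errors only.
    return err.status_code == 429 or err.status_code >= 500


# Sample payloads are invented for illustration.
for raw in ('[{"job_id": "job-1"}]', '{"error": "Not found", "status_code": 404}'):
    try:
        print("ok:", parse_output(raw))
    except DremioCliError as err:
        print("failed:", err, "| retryable:", is_retryable(err))
```

Because every failure has the same two fields, the agent can branch on status_code instead of pattern-matching on error text, which is what makes unattended retry and escalation logic practical.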

The scope is broad: 50+ operations across 13 command groups covering queries, schema inspection, catalog management, Reflections, jobs, engines, users, roles, grants, projects, wiki documentation, tags, and full-text search.

Two Interfaces, Two Trust Models

The Dremio CLI doesn't replace the MCP server. They serve different modes of operation, and both are worth having.

MCP is the right interface for the conversational, supervised session. When an analyst or business user is working with an AI assistant to explore data, understand a schema, or debug a query, the rich context MCP provides is exactly what the interaction needs. The human is present and the overhead is worth it.

The CLI is the right interface for autonomous execution. When an agent is running a scheduled quality check, building a Reflection, auditing access controls, or operating as part of a multi-agent pipeline, it needs structured JSON, self-describing commands, and a tool that validates inputs without human oversight.

Think of MCP as the interface for "help me understand this data" and the CLI as the interface for "do this on my behalf, reliably and at scale." Together they cover the full range of AI agent interactions with a data platform, from initial data exploration through to production-grade automation.

The CLI is open source under Apache 2.0. If you want to see what agent-ready data infrastructure looks like in practice, a free Dremio Cloud environment at dremio.com/get-started includes both the MCP server and the CLI out of the box.
