The BIRD benchmark, the leading academic evaluation for text-to-SQL systems, shows that even the best large language models achieve only 60-70% accuracy on complex SQL queries against realistic database schemas. On simple, single-table lookups, accuracy approaches 90%. On multi-join, multi-condition queries that require real business context, it falls off a cliff.
That gap is the starting point for understanding text-to-SQL vs agentic analytics. Text-to-SQL is a useful, well-defined tool. It translates a natural language question into a SQL query. For the right use cases, it works well. But when your questions get harder, your schemas get messier, or your users expect more than a SQL string, text-to-SQL hits structural limits that no amount of prompt engineering will fix.
This post is written for data leaders who have already deployed text-to-SQL in some form and are now asking: what does it actually take to go further? What are the text-to-SQL limitations that matter, and what does agentic analytics require to address them?
What Text-to-SQL Actually Does (And Does Well)
Text-to-SQL is a single-step translation. A user types a question in natural language. The LLM receives the question along with schema context (table names, column names, sometimes sample data) injected into its prompt. The model outputs a SQL query. That query is either copied by the user or auto-executed by a BI tool.
That's it. No memory of prior questions. No interpretation of results. No planning. No action.
For the right questions, this works extremely well. "How many orders did we ship last week?" "What is the total revenue by region for Q1?" "Show me the top 10 customers by order value." These are well-scoped, single-dimension questions on stable, clean schemas. Text-to-SQL handles them reliably, especially when the schema is small and the column names are descriptive.
Most enterprise BI copilots, database natural language interfaces, and SQL assistant tools are text-to-SQL systems. They are useful. The problem is that users rarely stop at "how many orders shipped last week" for very long.
Try Dremio’s Interactive Demo
Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI
Where Text-to-SQL Hits the Wall
The limitations of text-to-SQL are not about the LLM being bad. They are architectural. Text-to-SQL was designed to do one thing: translate. When your questions demand more than translation, the architecture breaks down.
Complex Multi-Step Questions
Consider this question: "Which products caused the increase in churn last quarter?"
Answering it correctly requires:
Querying Q3 churn data, joined to product and subscription tables
Querying Q2 churn data for the same product groupings
Calculating the delta between Q3 and Q2 churn counts per product
Computing each product's contribution to the overall churn increase
Ranking by contribution
A single SQL query cannot do all of this in one shot and return a meaningful answer. A text-to-SQL system might generate a query that returns churn counts by product for Q3, but it won't compare them to Q2, won't compute deltas, and won't explain which products drove the change. The output looks plausible. The analysis is incomplete.
This isn't a solvable problem within the text-to-SQL architecture. Translation produces SQL. SQL is not a multi-step analysis plan.
Schema Complexity
Production schemas are rarely clean. They have hundreds of tables. Column names carry legacy abbreviations from systems built a decade ago. Some columns are deprecated. Some tables are raw sources; others are pre-aggregated marts. Different teams call the same concept by different names.
Text-to-SQL injects schema context into an LLM prompt. The context window has limits. When your schema has hundreds of tables, the LLM can't see all of them. It picks based on what fits in context, and it picks wrong. It joins the wrong tables, uses the wrong version of a metric, or pulls from a deprecated source without knowing it.
The BIRD benchmark highlights "external knowledge" as a primary difficulty factor. Business domain knowledge, specifically what your company means by "churn," "ARPU," or "active customer," is not stored in a schema definition file. Text-to-SQL cannot look it up. It guesses based on column names and training data.
No Context Retention
Text-to-SQL is stateless. Each question is answered from scratch, with no memory of what was asked before.
Ask "What was revenue in Q3?" Then follow up with "Compare that to Q2." The second query fails or produces nonsense because the model doesn't know what "that" refers to. The prior question is gone. The AI treats the follow-up as a standalone request with no prior context.
This breaks the way analysts actually work. Analytical conversations are iterative. You ask one question, look at the results, then ask a more specific question based on what you saw. Text-to-SQL forces you to re-state the full context of every follow-up manually.
No Action Capability
The output of text-to-SQL is a SQL string. Not a result. Not an analysis. Not a chart. Not a recommendation.
The user must copy the SQL, execute it in their tool, read the output, interpret what it means, decide if it answered the question, build a visualization, and figure out what to ask next. Text-to-SQL assists with one step of a multi-step analytical process. It does not replace the process.
Unpredictable Query Performance
Text-to-SQL generates a different SQL query every time a question is phrased differently. The same business question asked three different ways can produce three different query plans, with three different performance profiles.
Traditional performance tuning strategies, including indexes, materialized views, and manually written cached queries, depend on knowing what queries will be run. Text-to-SQL makes that impossible. You can't pre-optimize for queries you didn't know were coming.
What Agentic Analytics Actually Is
Agentic analytics is not a smarter chatbot. It is an AI system that operates in a loop: perceive a question, build a plan, select and execute tools, observe results, reflect, and respond with an interpretation.
The critical difference from text-to-SQL is tool use combined with multi-step reasoning. An agentic system doesn't just translate your question into SQL. It plans how to answer it, executes multiple queries in sequence, reads the results, and generates an analysis. Then it suggests what you might want to know next.
The architecture looks like this:
Question received: User asks "Which products caused the increase in churn last quarter?"
Planning: The agent breaks the question into sub-tasks: "I need Q3 churn by product, Q2 churn by product, delta calculation, and contribution ranking."
Tool use: The agent executes SQL queries (or calls other tools) for each sub-task.
Observation: The agent reads the query results.
Reflection: The agent checks whether the results answered the sub-questions.
Response: The agent synthesizes an answer in plain language, generates a visualization, and suggests follow-up questions.
The agent also retains context within a session. When you follow up with "Break that down by customer segment," the agent knows which products you were asking about. You don't need to re-state the question.
For a deeper look at why agentic analytics is architecturally distinct from text-to-SQL, the foundational argument is laid out in this post on agentic analytics.
The Same Question, Two Approaches
Here is what answering "Which products caused the increase in churn last quarter?" looks like in practice under each approach.
Text-to-SQL Response
The LLM receives the question plus schema context. It generates:
SELECT
p.product_name,
COUNT(c.customer_id) AS churn_count
FROM churn c
JOIN products p ON c.product_id = p.id
WHERE c.churn_date >= '2025-07-01'
AND c.churn_date < '2025-10-01'
GROUP BY p.product_name
ORDER BY churn_count DESC;
This is syntactically valid SQL. It returns churn counts by product for Q3. But it has not answered the question. It does not compare to Q2. It does not calculate which products drove the churn increase. A product that already had high churn in Q2 might show high Q3 churn without having "caused the increase." The answer is incomplete, but it looks finished.
The user must now: run this query, notice it only covers Q3, write a second query for Q2 themselves, calculate deltas, decide how to weight contribution, and build a visualization. Text-to-SQL did step 1 of 6.
Agentic Analytics Response
Step 1 (Plan): "I'll need churn by product for both Q3 and Q2, then calculate the delta and each product's contribution to the total increase."
Step 2 (Query 1): Execute Q3 churn by product.
SELECT product_id, COUNT(customer_id) AS q3_churn
FROM churn
WHERE churn_date >= '2025-07-01' AND churn_date < '2025-10-01'
GROUP BY product_id;
Step 3 (Query 2): Execute Q2 churn by product.
SELECT product_id, COUNT(customer_id) AS q2_churn
FROM churn
WHERE churn_date >= '2025-04-01' AND churn_date < '2025-07-01'
GROUP BY product_id;
Step 4 (Calculate): Join results, compute delta and contribution percentage.
Step 5 (Interpret): "Product X saw a 34% increase in churn from Q2 to Q3, accounting for 61% of the total churn increase across all products. This coincides with the pricing change introduced in August. Product Y was flat. Products A and B both declined."
Step 6 (Visualize): Bar chart comparing Q2 vs Q3 churn by product, with delta annotations.
Step 7 (Follow-up suggestion): "Would you like to see this broken down by customer segment, or compare the churn cohort for Product X against the broader subscriber base?"
The difference is not subtle. One approach generates a SQL string. The other generates an analysis.
Text-to-SQL vs Agentic Analytics: Comparison Table
Dimension
Text-to-SQL
Agentic Analytics
Workflow
Single-step (question to SQL)
Multi-step (plan, query, interpret, act)
Context retention
None (stateless)
Yes (session memory)
Complex questions
Limited
Yes
Output
SQL string
Analysis, visualization, recommendations
Action triggering
No
Yes
Schema requirement
Simple, clean schemas
Works with semantic layer on complex schemas
Performance
Depends on manual tuning
Autonomous optimization required
Governance complexity
Low
High (must be structural)
Each row in this table points to a structural difference, not a configuration difference. You cannot turn text-to-SQL into agentic analytics by improving the prompt.
Infrastructure Requirements for the Upgrade
Moving from text-to-SQL to agentic analytics is not a model swap. It is an infrastructure change. The model is one component. The infrastructure around it determines whether the agent can reason correctly, perform at scale, and operate within governance boundaries.
A Semantic Layer Is Non-Negotiable
Text-to-SQL can get lucky on a five-table schema with descriptive column names. Agentic analytics on production data cannot rely on luck.
Agents make multiple decisions per session: which tables to query, how to join them, which metric definition to use, and which columns represent the business concept the user is asking about. If those decisions are wrong at step 2, the result at step 6 is wrong, and the user may not catch it.
A governed semantic layer solves this. It provides business-friendly names for tables and columns, pre-defined and validated metric calculations, enforced join logic, and access control that the agent respects. The agent doesn't have to guess what "churn" means. The semantic layer tells it.
Without a semantic layer, agentic analytics on complex schemas produces confident wrong answers at scale. The semantic layer post explains the full concept. In Dremio, virtual datasets and governed views provide this layer, with metadata that AI agents can read and use as grounding context.
Federated Query Access
Your data is not in one place. Agents need to join customer data from your CRM, transaction data from your data lakehouse, product metadata from an operational database, and behavioral data from event logs. They need to do this without you ETL-ing everything into a single destination first.
Federation is what allows an agent to issue a single analytical query that spans multiple source systems and return a unified result. Without it, you're forced to pre-materialize every possible join in a warehouse before the agent can touch it. That approach is both slow and expensive, and it assumes you know in advance which joins the agent will need.
Dremio's query federation layer allows agents to query data where it lives: S3, cloud data warehouses, relational databases, and streaming sources. The agent doesn't know or care about the physical location. The semantic layer provides the logical view.
Autonomous Performance Optimization
This is the infrastructure requirement that most teams underestimate. Agents generate novel query patterns that no DBA anticipated. Different phrasings of the same question produce different query plans. An agent analyzing "which products caused churn" might generate three different query patterns in the same session depending on how it breaks down the plan.
Manual performance tuning cannot keep up. You can't pre-create indexes or materialized views for queries you didn't know were coming.
This is where Dremio's Reflections and Autonomous Reflections address a real problem. Reflections are materialized acceleration datasets that Dremio maintains automatically. Autonomous Reflections go further: they analyze actual query patterns and adjust materialization strategies over time, without manual intervention. The C3 (Coordinator Cost Cache) caches planning metadata to accelerate query planning for novel patterns.
For agentic analytics to be practical at scale, autonomous performance optimization is required infrastructure, not a nice-to-have. The autonomous performance post covers how this works in detail.
Structural Governance
With text-to-SQL, governance is relatively simple. The SQL goes to a human first. The human decides whether to run it. If the query is wrong or unauthorized, the human catches it before execution.
With agentic analytics, the agent runs queries autonomously. There is no human in the loop between "question asked" and "query executed." Governance must therefore be enforced at the platform level, not through human review.
This means row-level and column-level security that the agent cannot bypass, regardless of how it phrases a query. Dremio's Fine-Grained Access Control (FGAC) enforces this structurally. The semantic layer adds another layer: it constrains which tables and metrics the agent can access. If a business user's semantic layer view doesn't include raw PII columns, the agent cannot expose them, even if it tries to.
Audit logs also become more important, not less. When an agent produces an analysis, you need to be able to reconstruct exactly which queries it ran, in what order, against which data. Without that trail, debugging incorrect analyses or investigating data access incidents is extremely difficult.
Agentic Interfaces That Connect to Your Agents
Agentic analytics doesn't have one interface. Different users need different access patterns:
External AI agents (Claude, GPT, custom agents) connect via MCP (Model Context Protocol), a standardized protocol that exposes Dremio's semantic layer and query execution to any compatible agent framework. If you're new to MCP, the MCP beginner guide covers the basics.
Business users interact through an AI Agent UI that wraps the agentic analytics capability in a conversation interface without requiring SQL knowledge or agent configuration.
Data scientists and custom agent frameworks access Dremio via Arrow Flight, a high-performance columnar data transfer protocol optimized for Python-based analytical workflows.
Each interface serves a different user type. The underlying platform (semantic layer, federation, governance, performance) is the same.
How Dremio Enables the Upgrade
Dremio's platform is designed around five pillars that map directly to the infrastructure requirements above.
Query Federation enables agents to join data across source systems without ETL. Agents can query S3, cloud data warehouses, relational databases, and other sources through a unified semantic layer. No pre-materialization of every possible join is required.
Autonomous Performance addresses the unpredictable query pattern problem. Reflections and Autonomous Reflections maintain and adjust materialized acceleration datasets based on actual usage. C3 caches planning metadata. The engine optimizes for the queries it sees, not the queries a DBA anticipated.
AI Semantic Layer provides the governed business context that agents need to reason correctly. Virtual datasets, metric definitions, join logic, and column-level metadata are all readable by AI agents. The semantic layer is the contract between the agent's reasoning and the actual data.
Agentic Interfaces connect the platform to any agent framework or user type. The MCP Server exposes the semantic layer to external AI agents using the Model Context Protocol. The AI Agent UI provides a business user interface. Arrow Flight provides high-throughput data access for Python-based agents.
AI SQL Functions bring AI capabilities inside the query engine itself, enabling semantic search, classification, and enrichment to run directly within SQL queries against your data.
Together, these five pillars form what Dremio calls the Agentic Lakehouse: a data platform built for agents to operate against production data accurately, at scale, and within governance boundaries.
The Honest Tradeoff
Agentic analytics is more capable than text-to-SQL. It is also harder to govern and harder to debug.
A wrong step in a multi-step reasoning chain propagates silently. If the agent uses the wrong table in sub-query 1, every subsequent step builds on that wrong foundation. The final answer looks polished. The analysis is incorrect. And because the user sees a complete, confident response instead of a raw SQL query, they're less likely to scrutinize it.
This is not a reason to avoid agentic analytics. It is a reason to build the infrastructure correctly before deploying it. Specifically:
Semantic layer completeness matters more as agents get more autonomous. Every undefined metric is a hallucination opportunity. Every ambiguous column name is a wrong-join waiting to happen. The semantic layer must be comprehensive, governed, and accurate.
Audit logs must capture the full reasoning chain. Not just the final SQL. Every sub-query, every intermediate result, every branching decision. Without this, you cannot investigate a wrong analysis after the fact.
Constraint design is a governance responsibility. Deciding what the agent can and cannot access is not a technical afterthought. It is a data governance decision that must be deliberate.
None of these requirements are unique to Dremio. They apply to any agentic analytics deployment on any platform. The question is whether your platform makes them achievable or leaves them as manual problems.
When to Stay with Text-to-SQL
Not every use case needs agentic analytics. For some questions and some teams, text-to-SQL is the right answer.
Text-to-SQL is a good fit when:
Questions are simple and well-scoped ("How many orders shipped last week?")
Schemas are small, stable, and have descriptive column names
Users need query-generation assistance, not full analysis
The team doesn't have a semantic layer or federated data infrastructure in place
Analytical questions rarely require multi-step reasoning or period comparisons
Deploying an agentic analytics system for simple point-lookup queries adds complexity without adding value. The agent loop, the multi-step planning, the tool orchestration: all of that overhead is unnecessary when the question has a direct SQL answer.
Text-to-SQL and agentic analytics are not in competition. They are appropriate for different points on the spectrum of analytical complexity.
The Path Forward
The practical path from text-to-SQL to agentic analytics has a clear first step: build your semantic layer first, and deploy text-to-SQL on top of it.
Text-to-SQL on a governed semantic layer is significantly more reliable than text-to-SQL on a raw production schema. The semantic layer constrains what the model can access, provides business-friendly terminology, and enforces metric definitions. The accuracy improvement is material.
Once you have text-to-SQL working well on your semantic layer, you can observe where it breaks. The questions that text-to-SQL answers poorly, which involve multi-step reasoning, period comparisons, segment analysis, or cross-source joins, form a natural roadmap for which agentic capabilities to add first.
Add federation when agents need to join across sources you haven't consolidated. Add autonomous performance optimization when query volume grows and manual tuning becomes a bottleneck. Add agentic interfaces as different user types need access. Expand the semantic layer as the agent encounters new domains.
The foundation is the same for both text-to-SQL and agentic analytics. Agents just need more of it, and they need it to be complete.
The teams that will get the most out of agentic analytics in the next few years are not the ones rushing to deploy the newest model. They are the ones who built the semantic layer, the governance structure, and the performance infrastructure first. Text-to-SQL is a good starting point. It is not a destination.
Try Dremio Cloud free for 30 days and see how the Agentic Lakehouse platform enables the full upgrade path, from text-to-SQL on a governed semantic layer to fully autonomous, multi-step agentic analytics. Get started here.
Try Dremio Cloud free for 30 days
Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.
Intro to Dremio, Nessie, and Apache Iceberg on Your Laptop
We're always looking for ways to better handle and save money on our data. That's why the "data lakehouse" is becoming so popular. It offers a mix of the flexibility of data lakes and the ease of use and performance of data warehouses. The goal? Make data handling easier and cheaper. So, how do we […]
Aug 16, 2023·Dremio Blog: News Highlights
5 Use Cases for the Dremio Lakehouse
With its capabilities in on-prem to cloud migration, data warehouse offload, data virtualization, upgrading data lakes and lakehouses, and building customer-facing analytics applications, Dremio provides the tools and functionalities to streamline operations and unlock the full potential of data assets.
Aug 31, 2023·Dremio Blog: News Highlights
Dremio Arctic is Now Your Data Lakehouse Catalog in Dremio Cloud
Dremio Arctic bring new features to Dremio Cloud, including Apache Iceberg table optimization and Data as Code.