Dremio Blog

9 minute read · July 2, 2026

The Semantic Layer: From Human Shortcut to Agent Guardrail

Will Martin, Technical Evangelist

For most of its history, the semantic layer was considered a solved problem. You built it once, business users queried it wherever it lived, and (hopefully) everyone agreed on what "revenue" meant. However, much like the information in your data dictionary, the semantic layer went stale, and businesses turned to newer, more exciting concepts.

One such concept is AI agents. Fresh on the scene, they are querying your data and running into the same trouble with schemas that your human users did. Just like people, they need context for the data to work effectively and produce accurate results.

Could the neglected semantic layer be the solution to this context problem? Yes and no. It turns out it takes more than blowing off the dust and giving an LLM access. Read on to learn where the old semantic layer falters in the agentic era and how it has been revamped for non-human users.

Semantic Layer 1.0: A Shared Dictionary for Human Analysts

The semantic layer concept emerged in the 1990s as an answer to a specific frustration: business users couldn't query raw databases directly. Annoying, but that made sense, as raw schemas are built for storage efficiency, not human readability. Column names like acct_rev_adj_usd_net_q2 mean something to the engineer who wrote them and nobody else.

The semantic layer solved this by laying down definitions for datasets and columns, translating the engineer's technical structure into user-appropriate business language. Its use then expanded to define refined datasets as well, such as Data Products, so consumers understood whatever data they were working with.

In practice, this typically meant one thing: a data dictionary. At its simplest, this is a spreadsheet or document listing every table and column: what each one contains, its data type, its acceptable values, and who owns it. There are also more sophisticated standalone tools (e.g. Collibra, Alation, Atlan) that store this information in a searchable, structured format with links between related assets.
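To make the shape of a dictionary entry concrete, here is a minimal sketch in Python. The fields mirror the list above; the table name, description, and owner are invented for illustration, and the description is only a guess at what that column name might mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One data dictionary row: what a column contains and who owns it."""
    table: str
    column: str
    description: str
    data_type: str
    owner: str
    acceptable_values: tuple = ()  # empty tuple means "no fixed value list"


# Hypothetical entry: every value below is an assumption, not a real definition.
DICTIONARY = {
    ("finance.ledger", "acct_rev_adj_usd_net_q2"): DictionaryEntry(
        table="finance.ledger",
        column="acct_rev_adj_usd_net_q2",
        description="Net adjusted account revenue in USD for Q2",
        data_type="DECIMAL(18,2)",
        owner="finance-data@example.com",
    ),
}


def look_up(table: str, column: str) -> str:
    """Return a human-readable definition, or flag the column as undocumented."""
    entry = DICTIONARY.get((table, column))
    if entry is None:
        return f"{table}.{column}: undocumented"
    return (
        f"{table}.{column}: {entry.description} "
        f"({entry.data_type}, owner: {entry.owner})"
    )


print(look_up("finance.ledger", "acct_rev_adj_usd_net_q2"))
print(look_up("finance.ledger", "fact_trans_v2"))
```

The second lookup is the interesting one: a column with no entry leaves the reader, human or otherwise, with nothing but its name.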

The core idea was that your data creators write down what the data means once, and everyone else looks it up instead of guessing. The expected outcome was that open metric definitions would become standardised across the organisation, so that concepts such as "churn rate" were calculated the same way everywhere.

For human users, this worked well because humans bring contextual judgement to data work. An analyst who sees five different revenue columns with slightly different names can pause and check the data dictionary. An AI agent doesn't do any of that.

What Breaks When Agents Query Schemas

An AI agent works quite differently to human analysts. It applies pattern matching to the available column names and produces SQL that looks syntactically correct regardless of whether it picks the right data. When asked "what was total revenue this quarter," an agent without semantic context picks transaction_amount because it's generically named, and returns a figure that includes refunds, test accounts, and pending settlements. The number looks plausible, but nobody knows it's wrong until someone audits it against the actuals.
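The failure mode is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with a made-up transactions table; the column names other than transaction_amount, the sample rows, and the "governed" revenue rule are all assumptions chosen to mirror the scenario above, not a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    transaction_amount REAL,
    kind TEXT,               -- 'sale' or 'refund'
    status TEXT,             -- 'settled' or 'pending'
    is_test_account INTEGER  -- 1 for internal test accounts
);
INSERT INTO transactions (transaction_amount, kind, status, is_test_account) VALUES
    (100.0, 'sale',   'settled', 0),
    (250.0, 'sale',   'settled', 0),
    (40.0,  'refund', 'settled', 0),
    (500.0, 'sale',   'settled', 1),
    (75.0,  'sale',   'pending', 0);
""")

# What an agent might write with only column names to go on.
naive = conn.execute(
    "SELECT SUM(transaction_amount) FROM transactions"
).fetchone()[0]

# What a documented revenue definition would say (assumed rule for this sketch).
governed = conn.execute("""
    SELECT SUM(transaction_amount) FROM transactions
    WHERE kind = 'sale' AND status = 'settled' AND is_test_account = 0
""").fetchone()[0]

print(f"naive: {naive}, governed: {governed}")  # naive: 965.0, governed: 350.0
```

Both queries are valid SQL and both return a plausible-looking number. Only one of them is revenue.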

True, a well-named quarterly_revenue_summary dataset can give the agent better signal than a raw table called fact_trans_v2. But the human-era layer was built assuming the consumer could read documentation, apply judgement, and ask a colleague for help. An AI agent that can't find what it's looking for doesn't ask questions. It guesses.

The Modern Semantic Layer: What Was Repurposed, What Was Added

The core semantic layer concept of documentation carried forward intact. But the implementation had to be completely rethought.

In the human era, a semantic layer consisting of wikis and descriptions was useful for onboarding and compliance. However, these were rarely kept current, and people often resorted to tribal knowledge to fill in the gaps. For an AI agent, undocumented columns are opaque, leaving the agent with only the column name to reason from.

Modern semantic layers treat documentation as a first-class technical requirement, not an administrative chore. Labels and tags are still used but their role has expanded significantly from mere organisational tools. A taxonomy of labels (revenue_metric, cost_metric, pii, customer_metric) now functions as a navigation system for AI agents. Instead of searching across thousands of columns, an agent filters to revenue_metric-tagged columns first, then selects from a shortlist of well-documented candidates.
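As a rough illustration of that filter-then-shortlist step, here is a small Python sketch. The catalog contents and the shortlist helper are hypothetical; a real catalog would expose labels and wikis through its own API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    labels: frozenset
    wiki: str = ""  # empty string means the column is undocumented


# Hypothetical catalog, using the label taxonomy from the text.
CATALOG = [
    Column("transaction_amount", frozenset()),
    Column("net_revenue_usd", frozenset({"revenue_metric"}),
           "Settled sales net of refunds, in USD."),
    Column("gross_revenue_usd", frozenset({"revenue_metric"})),
    Column("customer_email", frozenset({"pii", "customer_metric"}),
           "Primary contact email."),
    Column("shipping_cost_usd", frozenset({"cost_metric"}),
           "Carrier charges per order."),
]


def shortlist(catalog, label):
    """Keep columns that carry the label and have a non-empty wiki."""
    return [c for c in catalog if label in c.labels and c.wiki.strip()]


for column in shortlist(CATALOG, "revenue_metric"):
    print(column.name, "-", column.wiki)
```

Here the untagged transaction_amount never enters consideration, and the tagged but undocumented gross_revenue_usd is dropped from the shortlist, which is the behaviour you want from an agent that should not guess.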

Additionally, there are two new capabilities with no meaningful precedent in the original, human-centred semantic layer.

  • Semantic search allows users and agents to find data assets using natural language rather than exact names. "Customer orders by product category" returns the right view without the agent needing to know it's stored as business.analytics.customer_order_summary. The match is conceptual, not lexical, bridging the gap between natural language questions and a schema built by engineers.
  • The knowledge graph connects related entities at a structural level: customers to orders, orders to products, products to categories. For human analysts, those relationships are obvious from context, but not for AI agents. The knowledge graph gives agents an explicit map of how the data estate connects, enabling more accurate multi-table queries without the agent having to infer join logic from column names alone.
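To show what an "explicit map" buys an agent, the sketch below models the customers, orders, products, and categories relationships as a tiny graph and uses breadth-first search to find a join path. The key and column names in the join conditions are assumptions, and this is a toy model of the idea rather than how any particular product stores its knowledge graph.

```python
from collections import deque

# Hypothetical relationships: each edge carries the join condition between two tables.
RELATIONSHIPS = {
    ("customers", "orders"): "customers.id = orders.customer_id",
    ("orders", "products"): "orders.product_id = products.id",
    ("products", "categories"): "products.category_id = categories.id",
}


def neighbours(table):
    """Yield (adjacent_table, join_condition) pairs; edges are undirected."""
    for (left, right), condition in RELATIONSHIPS.items():
        if left == table:
            yield right, condition
        elif right == table:
            yield left, condition


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal.

    Returns a list of join conditions, or None if the tables are not connected.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for next_table, condition in neighbours(table):
            if next_table not in seen:
                seen.add(next_table)
                queue.append((next_table, path + [condition]))
    return None


for condition in join_path("customers", "categories"):
    print("JOIN ON", condition)
```

With the relationships declared up front, the agent reads the join logic off the graph instead of inferring it from column names.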

Dremio's AI Semantic Layer

Dremio's implementation brings all of this together in a single platform, positioned specifically for agentic workloads. Virtual datasets (views) provide the business logic layer, with a recommended medallion architecture that progressively refines raw data into AI-queryable, well-documented views. Wikis attach to every entity in the catalog, all the way down to individual columns, and can be auto-generated by AI where human documentation is missing. Labels create the navigational taxonomy agents use to find the right data, and semantic search lets agents query by concept rather than name.

Governance is enforced at query execution time, not the application layer. Row-level security and column masking apply regardless of what SQL an agent generates, meaning the semantic layer functions as both a discoverability tool and a hard access boundary. An agent can't return data its user isn't permitted to see, even if it writes SQL that attempts to.
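The principle can be sketched in a few lines of Python. This is a conceptual toy, not Dremio's API or implementation: the rows, user names, and policy format are invented, and the agent's "query" is modelled as a plain function. The point is only that policies are applied at scan time, so the query never sees unfiltered data.

```python
ROWS = [
    {"customer": "Ada", "region": "EU", "email": "ada@example.com", "spend": 120},
    {"customer": "Bo", "region": "US", "email": "bo@example.com", "spend": 300},
    {"customer": "Cy", "region": "EU", "email": "cy@example.com", "spend": 80},
]

# Hypothetical per-user policies: a row filter plus a set of masked columns.
POLICIES = {
    "eu_analyst": {"row_filter": lambda row: row["region"] == "EU",
                   "masked": {"email"}},
    "admin": {"row_filter": lambda row: True, "masked": set()},
}


def governed_scan(user):
    """Yield only the rows and values the user's policy allows.

    An unknown user raises KeyError, so access is denied by default.
    """
    policy = POLICIES[user]
    for row in ROWS:
        if policy["row_filter"](row):
            yield {key: ("***" if key in policy["masked"] else value)
                   for key, value in row.items()}


def execute(agent_query, user):
    """Run whatever the agent wrote, but only over the governed scan."""
    return agent_query(governed_scan(user))


def dump_emails(rows):
    """An agent-written query that tries to read every email address."""
    return [row["email"] for row in rows]


def total_spend(rows):
    """An agent-written aggregate; it can only sum rows the user may see."""
    return sum(row["spend"] for row in rows)


print(execute(dump_emails, "eu_analyst"))  # ['***', '***']
print(execute(total_spend, "eu_analyst"))  # 200
```

Because the filter and mask sit beneath the query rather than in the application that generated it, a badly behaved or simply mistaken agent gets the same restricted view as a careful one.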

Dremio is ranked the number one semantic layer vendor by Dresner Advisory Services, and the full AI Semantic Layer is available from day one in the free Dremio Cloud environment. Try it yourself at dremio.com/get-started.
