Dremio Blog

34 minute read · June 9, 2026

Semantic Layer Governance: Control What AI Agents Access

Alex Merced, Head of DevRel, Dremio

AI agents can execute hundreds of queries per minute, with no human reviewing each result before the agent acts on it. That is the governance gap that most data architecture teams have not yet closed. Traditional access controls were designed for a world where a person ran a report, read the output, and made a decision. When an agent does the querying, the analyzing, and the acting in milliseconds, procedural governance breaks down completely. Semantic layer governance for AI is the architectural pattern that closes this gap: it enforces data access controls structurally, at the layer every query must pass through, rather than procedurally, in workflows that agents simply skip.

This post covers how to build that governance architecture in practice: using the semantic layer as the single enforcement perimeter, implementing row-level security and column masking that AI agents cannot bypass, applying least privilege principles to agent identities, and logging every query for compliance. All examples use Dremio's fine-grained access control (FGAC) capabilities, which enforce these policies inside the query engine itself.

Why AI Agents Break Traditional Governance Models

The Human-in-the-Loop That No Longer Exists

Traditional BI governance worked because a human was always in the middle. An analyst ran a query, reviewed the results, spotted the anomaly or the sensitive column that shouldn't be there, and decided what to do with it. The access control layer was important, but slow feedback loops meant mistakes were often caught before they became incidents.

An AI agent operating as part of an Agentic Lakehouse architecture does not have that review step. It queries, interprets, and acts in a single automated chain. A governance strategy for agentic analytics built around "someone will review the agent's output" is not a strategy; it is wishful thinking.

What an Agent Can Do Without Structural Controls

Without governance enforced at the data access layer, an agent explores whatever it can access. If a marketing analytics agent has SELECT permissions on an entire database, it can, and often will, issue queries against financial tables, customer PII tables, and HR records if those fall within its SQL generation scope. This is not malicious behavior; it is what agents do when they try to answer questions without knowing where the boundaries are.

The result: a query like "which customers have the highest churn risk" could return raw email addresses, phone numbers, or account numbers that the agent was never intended to see. If the agent then logs that output, summarizes it, or passes it to another system, you have a data breach with no obvious entry point.

The Semantic Layer as the Governance Perimeter

The core architectural insight behind semantic layer governance is this: if every query from every consumer (AI agents, BI tools, analysts, notebooks) must go through the same set of governed views, then policies defined once at the semantic layer apply everywhere automatically.

To understand what a semantic layer is and why it serves this role, the Dremio semantic layer overview covers the foundation. The short version: a semantic layer is a logical abstraction that defines named, governed virtual datasets over your physical data. Queries hit the virtual datasets, not the raw tables directly.

One Enforcement Point for All Consumers

Without a semantic layer, governance must be replicated across every tool that touches data. Your BI platform has its own permission model. Your notebook environment has another. Your custom API has yet another. Each is a separate enforcement point that can drift out of sync, have misconfigured permissions, or be bypassed entirely by a direct database connection.

With a semantic layer as the governance perimeter, you define row-level security, column masking, and access controls once on the Gold-layer virtual datasets. Every consumer that queries through those datasets inherits all policies automatically. A new AI agent gets connected, authenticated, and immediately operates within the same governance constraints as every other consumer. No additional configuration required.

Medallion Architecture and Agent Access

In a typical medallion lakehouse architecture:

  • Bronze layer contains raw ingested data exactly as it arrived. No AI agent should ever directly query this layer.
  • Silver layer contains cleaned, deduplicated data. Domain team access for validation and transformation work.
  • Gold layer contains governed, business-ready views designed for consumption. This is where all AI agent queries land.

The semantic layer sits at the Gold layer. All governance policies are defined here. When a new agent is provisioned, it gets access to a specific subset of Gold views and nothing else. The Bronze and Silver layers are not reachable by that agent. This is enforced through role-based access control on the folder structure of the semantic layer.

For a complete walkthrough of implementing this architecture, the complete semantic layer guide covers the design patterns in detail.

Row-Level Security for AI: Filtering What Agents Can See

Row-level security (RLS) restricts which rows a given identity can see when querying a dataset. From the agent's perspective, it appears to be querying all available data. In reality, Dremio's query engine has silently injected a filter predicate that limits the result set to only the rows that identity is authorized to see.

How UDF-Based Row Filters Work in Dremio

Dremio implements row-level security through user-defined functions (UDFs) attached to views as row access policies. The UDF takes a column value as input and returns a boolean: true if the current identity can see that row, false if not. Dremio injects this function as a WHERE clause predicate during query planning, before any data is read from storage.

The critical detail: this happens inside the Dremio query engine. The agent cannot see or modify the predicate. It cannot craft a SQL query that bypasses it. The filter is applied before data returns to the query layer.

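-- Row-level security: AI agent only sees its authorized segment.
-- The function returns TRUE for rows the current identity may see.
-- Note: SESSION_USER_ATTRIBUTE is kept as written in this example; confirm that
-- your Dremio version provides it, or key the filter on IS_MEMBER / QUERY_USER() instead.
CREATE FUNCTION row_filter_by_region(region VARCHAR)
RETURNS BOOLEAN
RETURN IS_MEMBER('global_access')
    OR region = SESSION_USER_ATTRIBUTE('authorized_region');

-- Attach the row filter to the Gold view (the policy name must match the function above)
ALTER VIEW semantic.application.app_customer_analytics
  ADD ROW ACCESS POLICY row_filter_by_region(region);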

With this policy in place, a regional sales agent authenticated with authorized_region = 'EMEA' will only receive rows where region = 'EMEA', regardless of what SQL it submits. An agent attempting SELECT * FROM app_customer_analytics WHERE region IN ('EMEA', 'APAC') will only see EMEA rows. The APAC rows are filtered at the engine level before the query even executes against storage.
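
The effect of the injected predicate can be modeled outside Dremio. The Python sketch below is illustrative only: `apply_row_policy` and the sample rows are hypothetical, and it imitates the semantics (the policy predicate is ANDed with whatever the agent asked for), not Dremio's actual query planner.

```python
def row_filter_by_region(region, authorized_region, is_global):
    """Python mirror of the SQL UDF: global roles see everything,
    other identities see only their authorized region."""
    return is_global or region == authorized_region


def apply_row_policy(rows, agent_predicate, authorized_region, is_global=False):
    """Return rows that satisfy BOTH the agent's own filter and the policy.

    The agent controls agent_predicate; it has no control over the policy.
    """
    return [
        row
        for row in rows
        if agent_predicate(row)
        and row_filter_by_region(row["region"], authorized_region, is_global)
    ]


# Hypothetical sample data standing in for the Gold view.
rows = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
    {"customer_id": 3, "region": "EMEA"},
]

# The agent asks for EMEA and APAC but is only authorized for EMEA.
wants_emea_and_apac = lambda row: row["region"] in ("EMEA", "APAC")
visible = apply_row_policy(rows, wants_emea_and_apac, authorized_region="EMEA")
```

In this model, widening the agent's own WHERE clause never widens the result, because the policy term is always part of the conjunction.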

Real-World RLS Scenarios for AI Agent Governance

Row-level security enables several important governance patterns for agentic analytics:

  • Territory-scoped sales agents: each regional agent operates only on its assigned geography's pipeline data
  • Multi-tenant SaaS platforms: each customer's AI agent is restricted to that customer's records; cross-tenant data access is structurally impossible
  • Healthcare AI compliance: a clinical decision support agent only accesses patient records for the facility it is authorized to serve
  • Financial data compartmentalization: a trading desk agent cannot see positions belonging to other trading desks, which keeps information barriers intact

The key characteristic of all these scenarios is that the restriction is invisible to the agent and unbypassable by the agent. The agent does not need to know about the restriction; governance simply works.

Column Masking: Blocking Raw PII at the Engine Level

Row-level security controls which records an agent can see. Column masking controls what values those records contain when they arrive at the agent. For sensitive columns (email addresses, Social Security numbers, credit card numbers, phone numbers), the raw value never leaves the Dremio query engine for unauthorized identities.

How Column Masking Works in Dremio

Column masking uses UDFs applied directly in the Gold-layer view definition. The UDF receives the raw column value and returns either the raw value (for privileged roles like data_steward) or a masked/redacted version (for everyone else). Because the masking UDF is embedded in the view definition, every query against that view automatically applies the masking logic.

-- Column masking: email is partially redacted for non-privileged roles
CREATE FUNCTION mask_email(email VARCHAR)
RETURNS VARCHAR
RETURN CASE 
  WHEN IS_MEMBER('data_steward') THEN email
  ELSE CONCAT(LEFT(email, 2), '***@***.', RIGHT(email, 3))
END;

-- Apply masking in Gold layer view
CREATE OR REPLACE VIEW semantic.application.app_customer_analytics AS
SELECT 
  customer_id,
  customer_tier,
  region,
  mask_email(email) AS email,  -- masked for non-stewards
  lifetime_value_usd
FROM semantic.business.biz_customers;

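An AI agent assigned the ai_marketing_agent role (not data_steward) will see jo***@***.com in the email column. It can still count how many customers have an email on file, and it can group by the visible suffix (for example, .com versus .org). It cannot count Gmail addresses or analyze full email domains, because this mask redacts the domain along with the local part; if domain-level analysis is a requirement, expose the domain as a separate derived column in the Gold view. It also cannot reconstruct the original email address, because the original value was never returned by the engine.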
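
To make the masking behavior concrete, here is a small Python mirror of the mask_email UDF. It is a sketch for reasoning about the output, not Dremio code; the `is_data_steward` flag stands in for the IS_MEMBER('data_steward') check, and the sample address is made up.

```python
def mask_email(email, is_data_steward):
    """Python mirror of the mask_email SQL UDF (illustrative only)."""
    if is_data_steward:
        return email
    # LEFT(email, 2) + fixed redaction string + RIGHT(email, 3)
    return email[:2] + "***@***." + email[-3:]


# Hypothetical sample value.
raw = "john.smith@gmail.com"
masked = mask_email(raw, is_data_steward=False)
unmasked = mask_email(raw, is_data_steward=True)
```

Only the first two and last three characters survive for non-stewards, so neither the local part nor the domain name is recoverable from the masked value.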

What AI Agents Can and Cannot Do with Masked Data

This distinction matters enormously for compliance. Masked data does not mean unusable data; it means appropriately scoped access.

What the agent can do:

  • Compute churn rates by customer tier
  • Segment customers by region and lifetime value
  • Count PII field completeness (what percentage of customers have an email on file)
  • Analyze behavioral patterns across the customer base

What the agent cannot do:

  • Retrieve "John Smith's email address"
  • Export a list of emails for use in another system
  • Find customers with a specific email domain through pattern matching on raw values
  • Aggregate email addresses into a list that could be used for contact purposes

The line is between aggregate analysis and individual PII retrieval. Column masking enforces that line structurally, not procedurally.

Application-Level vs. Engine-Level Enforcement

This is the distinction that determines whether governance is actually reliable. Application-level enforcement means the application code checks permissions and decides what to display. This can be bypassed by direct database connections, crafted API requests, or bugs in the application logic.

Engine-level enforcement means the Dremio query engine applies masking before any data is returned, regardless of how the query arrived. There is no code path in which the application can retrieve unmasked data for an unauthorized identity. The engine does not expose an override mechanism. This is the only form of governance you can rely on for AI agent access, where the "application" is a language model generating arbitrary SQL.

Least Privilege for AI Agents

Least privilege is a foundational security principle: each entity receives the minimum access it needs to perform its function and nothing more. NIST defines it as a security architecture principle ensuring each entity is granted only the resources and authorizations it needs for its work.

For AI agents, this principle has specific implementation requirements that differ from human user access management.

Why Each Agent Needs Its Own Identity

Shared service accounts are the most common mistake in AI agent governance. When multiple agents share a single Dremio service account:

  • A breach of one agent's credentials exposes all agent access simultaneously
  • Audit logs cannot distinguish which agent issued which query
  • Revoking access for one agent requires revoking it for all agents sharing that account
  • Permission creep becomes impossible to track because you cannot tell which agent needed which access

Each AI agent should have its own OAuth client credentials or Dremio service account. That identity is assigned exactly one Dremio role. The role grants SELECT on exactly the Gold-layer views that agent needs.

Mapping Agents to Roles to Views

A practical least privilege mapping for a data team running three AI agents:

Agent            | Dremio Role        | Authorized Gold Views
Marketing Agent  | ai_marketing_agent | app_customer_analytics, app_campaign_performance, app_web_traffic
Finance Agent    | ai_finance_agent   | app_revenue_summary, app_cost_centers, app_budget_tracking
Operations Agent | ai_ops_agent       | app_inventory_levels, app_fulfillment_rates, app_supplier_metrics

The Finance agent cannot query app_customer_analytics. The Marketing agent cannot query app_revenue_summary. This is enforced at the RBAC layer in Dremio, not by trusting the agent to stay in its lane.

-- Grant the marketing agent's role SELECT on its Gold-layer folder only
GRANT SELECT ON FOLDER semantic.application.marketing TO ROLE ai_marketing_agent;

-- The finance and operations folders are simply never granted to this role.
-- Bronze and Silver should carry no grants for it either; the REVOKE statements
-- below are defensive cleanup in case an earlier grant exists.
REVOKE ALL ON FOLDER semantic.bronze FROM ROLE ai_marketing_agent;
REVOKE ALL ON FOLDER semantic.business FROM ROLE ai_marketing_agent;
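
One way to keep the agent-to-role mapping reviewable is to hold it as data and render the GRANT statements from it. The sketch below assumes one folder per agent; the finance and operations folder paths and role names are hypothetical, chosen only to parallel the marketing example.

```python
# Hypothetical mapping of agent roles to the Gold-layer folders they may read.
AGENT_FOLDERS = {
    "ai_marketing_agent": ["semantic.application.marketing"],
    "ai_finance_agent": ["semantic.application.finance"],
    "ai_ops_agent": ["semantic.application.operations"],
}


def render_grants(mapping):
    """Render one GRANT statement per (role, folder) pair, in a stable order."""
    statements = []
    for role in sorted(mapping):
        for folder in sorted(mapping[role]):
            statements.append(f"GRANT SELECT ON FOLDER {folder} TO ROLE {role};")
    return statements


def shared_folders(mapping):
    """Return folders granted to more than one role, which may deserve a review."""
    seen = {}
    for role, folders in mapping.items():
        for folder in folders:
            seen.setdefault(folder, []).append(role)
    return {folder: roles for folder, roles in seen.items() if len(roles) > 1}


grants = render_grants(AGENT_FOLDERS)
```

Because anything absent from the mapping is never granted, the default for a new folder is no access.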

Agent Roles Are Not Set-and-Forget

Least privilege degrades over time without active management. An agent provisioned for a specific campaign last quarter may still have its role active after the campaign ends. Agents whose task scope expands gradually accumulate access grants that were never fully reviewed. Treat agent roles with the same governance rigor as employee access:

  • Quarterly access recertification: every agent role is reviewed and confirmed or revoked
  • Offboarding workflow: when an agent is decommissioned, its service account is disabled immediately
  • Expansion requests: adding access to new Gold views requires an approval workflow, not a casual GRANT command

Dremio FGAC: AI Agent Data Access Control in Practice

Dremio's fine-grained access control (FGAC) is the implementation layer where all of these governance policies are enforced. FGAC combines RBAC on virtual datasets, UDF-based column masking, and ROW ACCESS POLICY attachments into a unified enforcement model that operates inside the Dremio query engine.

The FGAC Query Execution Sequence

When an AI agent submits a query to Dremio (whether through JDBC, Arrow Flight, or the MCP Server), the following happens in order:

  1. Authentication: Dremio validates the identity (OAuth token, Personal Access Token, or service account credentials)
  2. RBAC check: Dremio verifies that the identity's role has SELECT permission on the target view
  3. RLS predicate injection: If the view has a ROW ACCESS POLICY, Dremio injects the filter predicate into the query plan
  4. Column masking resolution: UDFs in the SELECT list are resolved based on the current identity's role membership
  5. Execution: The query executes against physical data (Iceberg tables, Parquet files, other sources) with the injected filters and masking applied
  6. Result return: The filtered, masked result set is returned to the agent

Steps 3 through 5 happen entirely inside the Dremio engine. The agent receives the result as if it were the complete, unfiltered data for that query. It has no visibility into the filtering and masking that occurred.
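
The ordering matters more than any single step, so it may help to see the sequence as a toy pipeline. This Python model is an assumption-laden sketch: the token store, grant table, and `execute` function are hypothetical stand-ins, and real enforcement happens inside Dremio's planner rather than in application code.

```python
class AccessDenied(Exception):
    """Raised when authentication or the RBAC check fails."""


# Hypothetical identity and grant stores; Dremio holds the real ones.
IDENTITIES = {
    "token-emea": {"roles": {"ai_marketing_agent"}, "authorized_region": "EMEA"},
}
VIEW_GRANTS = {"app_customer_analytics": {"ai_marketing_agent", "data_steward"}}


def mask_email(email, roles):
    """Return the raw value to stewards and a redacted value to everyone else."""
    return email if "data_steward" in roles else email[:2] + "***@***." + email[-3:]


def execute(token, view, physical_rows):
    identity = IDENTITIES.get(token)                           # 1. authentication
    if identity is None:
        raise AccessDenied("unknown identity")
    if not identity["roles"] & VIEW_GRANTS.get(view, set()):   # 2. RBAC check
        raise AccessDenied(f"no SELECT on {view}")
    visible = [                                                # 3. row filter
        row for row in physical_rows
        if row["region"] == identity["authorized_region"]
    ]
    return [                                                   # 4. column masking
        {**row, "email": mask_email(row["email"], identity["roles"])}
        for row in visible
    ]                                                          # 5-6. result returned


# Hypothetical physical rows behind the view.
sample_rows = [
    {"region": "EMEA", "email": "john@gmail.com"},
    {"region": "APAC", "email": "li@example.org"},
]
result = execute("token-emea", "app_customer_analytics", sample_rows)
```

A query against a view the role was never granted fails at step 2, before any row is filtered or masked.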

Engine-Level Means No Bypass Path

This is worth being explicit about because it is the property that makes Dremio's FGAC actually reliable for AI agent governance. The agent cannot:

  • Submit a query to the underlying Iceberg table directly (RBAC blocks access to Bronze/Silver layers)
  • Construct SQL that references the masking UDF to reverse-engineer the original value
  • Issue a second query that retrieves data from a different path to the same underlying table
  • Use metadata queries to infer masked values (column masking applies even in metadata contexts)

If the agent's assigned role cannot see a row, that row does not exist from the agent's perspective. If the agent's role cannot see a raw column value, the masked value is the only value the agent will ever receive.

Audit Logging: Every Query an AI Agent Makes

Audit logging for AI agents is not optional if you operate under any compliance framework. GDPR, HIPAA, SOX, and CCPA do not single out AI, but each expects you to demonstrate that access to personal or financial data, including access by automated systems, follows your authorization policies.

What Dremio Logs Per Query

Dremio maintains query logs that capture:

  • Full query text: the exact SQL submitted by the agent, including any automatically generated SQL from the MCP interface
  • Identity: which service account or OAuth client issued the query
  • Timestamp: precise execution time
  • Row count returned: how many records were in the result set
  • Source datasets accessed: which physical tables or views were touched
  • Execution duration: how long the query ran

This combination allows a compliance team to answer: "Did our marketing AI agent ever access the financial revenue table?" with a definitive yes or no, backed by a full audit trail.

Using Logs for Governance Reviews

Audit logs are only useful if you analyze them. For AI agent governance, periodic log analysis should answer:

  • Scope review: which datasets did each agent access in the last 30 days? Does that match the agent's expected task scope?
  • Anomaly detection: is any agent querying significantly more rows than its historical baseline? That could indicate a prompt injection attack or misconfigured agent behavior.
  • Access drift detection: is an agent starting to query views outside its normal set? Early detection prevents gradual privilege escalation.
  • Compliance reporting: generate reports showing that PII was only accessed by authorized identities during the reporting period.
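
The scope review and anomaly checks above can be sketched as a small script over exported log records. The record fields used here (`identity`, `datasets`, `rows_returned`) are assumed names for illustration and are not Dremio's actual log schema; the threshold factor is likewise arbitrary.

```python
from collections import defaultdict

# Hypothetical expected scope per agent identity.
EXPECTED_SCOPE = {
    "ai_marketing_agent": {
        "app_customer_analytics",
        "app_campaign_performance",
        "app_web_traffic",
    },
}


def scope_violations(log_entries, expected_scope):
    """Map each identity to the datasets it touched outside its expected scope."""
    violations = defaultdict(set)
    for entry in log_entries:
        allowed = expected_scope.get(entry["identity"], set())
        extra = set(entry["datasets"]) - allowed
        if extra:
            violations[entry["identity"]] |= extra
    return dict(violations)


def row_count_anomalies(log_entries, baseline_rows, factor=10):
    """Return entries whose row count exceeds factor x the identity's baseline.

    Identities with no baseline are treated as baseline 0, so any rows are flagged.
    """
    return [
        entry for entry in log_entries
        if entry["rows_returned"] > factor * baseline_rows.get(entry["identity"], 0)
    ]


# Hypothetical exported log records.
logs = [
    {"identity": "ai_marketing_agent", "datasets": ["app_customer_analytics"], "rows_returned": 1200},
    {"identity": "ai_marketing_agent", "datasets": ["app_revenue_summary"], "rows_returned": 90},
    {"identity": "ai_marketing_agent", "datasets": ["app_web_traffic"], "rows_returned": 250000},
]
violations = scope_violations(logs, EXPECTED_SCOPE)
anomalies = row_count_anomalies(logs, {"ai_marketing_agent": 1500})
```

Running checks like these on a schedule turns the log from a passive record into an early warning for access drift.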

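Logs should be retained for the full compliance period required by your applicable frameworks. SOX-related audit records are commonly retained for seven years, and HIPAA sets a six-year period for its required documentation; confirm the exact period that applies to your query logs. Build log retention into your Dremio deployment configuration from the start.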

Agentic Access via MCP and OAuth

The Model Context Protocol (MCP) is an open standard for connecting AI clients to data tools. Dremio's MCP Server exposes SQL query capability through this standardized interface, allowing external AI clients like Claude, custom agent frameworks, or enterprise AI platforms to query Dremio data with full governance applied.

How MCP Connects External AI to Governed Data

When an external AI client connects to Dremio's MCP Server:

  1. The client authenticates using OAuth 2.0 client credentials (standard for machine-to-machine auth)
  2. The OAuth token is issued for a specific Dremio identity (service account)
  3. That identity has an assigned Dremio role
  4. All FGAC policies attached to that role apply automatically to every query the agent submits through MCP

The AI client has no knowledge that row-level security and column masking are being applied. It submits SQL (or relies on Dremio's MCP tools to generate SQL from natural language), and receives results that have already been filtered and masked by the FGAC engine.
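
For reference, the OAuth 2.0 client credentials grant in step 1 is a single POST to the identity provider's token endpoint. The sketch below only builds the request and sends nothing; the token URL and client values are placeholders, and the actual endpoint for a given Dremio deployment is an assumption you would need to replace.

```python
import base64
from urllib.parse import urlencode


def build_token_request(token_url, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client credentials token request.

    The client authenticates with HTTP Basic; the form body carries the grant type.
    Identifiers with special characters would need form-encoding before this step.
    """
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials"})
    return {"method": "POST", "url": token_url, "headers": headers, "body": body}


# Placeholder endpoint and credentials, one pair per agent.
request = build_token_request(
    "https://idp.example.com/oauth/token", "marketing-agent", "s3cret"
)
```

Issuing a distinct client ID per agent is what lets the resulting token map back to exactly one Dremio identity and role.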

The Governance Guarantee for External AI

This architecture provides a critical guarantee: the governance boundary stays at Dremio regardless of which AI model is driving the queries. Whether the agent is using GPT-4, Claude, Gemini, or an open-source model, the governance policies are applied by Dremio's engine, not by the AI model's decision-making.

This matters because you cannot trust an AI model to self-enforce data access policies. The model does not know your data classification requirements, your regional data residency rules, or your HIPAA obligations. Dremio does, because those policies are programmatically configured in the FGAC system.

For the full picture of how Dremio's Agentic Lakehouse positions the semantic layer as the governance layer for all AI data access, including federated queries across Iceberg tables, databases, and object storage, the Dremio Agentic Lakehouse page covers the complete architecture.

The Semantic Layer as a Compliance Enforcement Point

Beyond access control, the semantic layer can enforce specific compliance requirements that are difficult to implement at the application layer.

Data Residency Enforcement

Data residency requirements can mandate that data about EU residents not be processed by systems outside the EU, or that certain data not cross national borders. The semantic layer supports this through row-level security policies on Gold views.

An EU-region AI agent authenticated with authorized_region = 'EU' can only see rows where data_residency_region = 'EU', even if it submits a query with no WHERE clause. The residency filter is injected by the FGAC engine. Cross-regional data access is structurally prevented, not just discouraged.

Retention Policies in the Semantic Layer

Regulatory frameworks like GDPR require that personal data be deleted or anonymized after a specified retention period. The semantic layer can enforce this through UDF-based masking that applies to records older than the retention window.

Consider a UDF that checks record age and returns aggregate-only data (or NULL for detail columns) once a record is past the retention period. With that UDF in the Gold view, AI agents cannot retrieve individual-level data for records that should be anonymized, even if the underlying physical data has not been deleted yet. The result is a view of the data that matches your retention policy, although query-time masking complements physical deletion rather than replacing it.
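
The age check itself is simple. Here is a Python sketch of the logic such a UDF would carry; the two-year window, the function name, and the sample dates are assumptions for illustration, and a real policy would take its window from your retention schedule.

```python
from datetime import date, timedelta

# Hypothetical retention window; set this from your actual retention policy.
RETENTION_DAYS = 365 * 2


def mask_if_expired(value, record_date, today, retention_days=RETENTION_DAYS):
    """Return None for records older than the retention window, else the value."""
    if today - record_date > timedelta(days=retention_days):
        return None
    return value


today = date(2024, 1, 1)
expired = mask_if_expired("a@example.com", date(2020, 1, 1), today)
current = mask_if_expired("a@example.com", date(2023, 6, 1), today)
```

Passing `today` in explicitly keeps the function deterministic, which makes the policy straightforward to unit test.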

PII Handling: Analysis Allowed, Retrieval Blocked

The critical distinction in AI agent PII governance is between aggregate analysis and individual retrieval. A compliant AI system should be able to:

  • Analyze patterns across the customer base without retrieving individual PII
  • Answer "what percentage of customers have an email on file?" or "how do customers split by top-level domain?" without exposing individual email addresses
  • Compute behavioral metrics that use PII columns as inputs without returning the raw PII

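Column masking in the semantic layer achieves this. Aggregates that depend only on whether a value is present, such as COUNT(email), still work on a masked column. Aggregates that depend on the value itself do not: because the mask is applied in the view, COUNT(DISTINCT email) counts distinct masked strings, and different addresses can collapse to the same masked value. When the agent needs exact distinct counts or domain breakdowns, compute them inside the Gold view (for example, as a pre-aggregated or derived column) so that the raw addresses still never reach the agent.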
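
One caveat is easier to see with data. When a mask like the mask_email function shown earlier is applied in the view, value-based aggregates operate on the masked strings. The Python sketch below uses made-up addresses and mirrors that mask to show which aggregates survive.

```python
from collections import Counter


def mask_email(email):
    """Mirror of the non-steward branch of the mask_email UDF (illustrative)."""
    return email[:2] + "***@***." + email[-3:]


# Made-up sample addresses.
emails = ["john@gmail.com", "joan@yahoo.com", "mary@example.org"]
masked = [mask_email(e) for e in emails]

rows_with_email = len(masked)          # presence-based count is unaffected
distinct_raw = len(set(emails))        # what a steward would compute
distinct_masked = len(set(masked))     # what the agent would compute
by_suffix = Counter(m[-3:] for m in masked)  # only the visible suffix is usable
```

Here two different addresses mask to the same string, so a distinct count over the masked column undercounts, while the row count and the suffix breakdown remain usable.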

Governance Tradeoffs You Need to Know

Governance that is too restrictive creates its own problems. An AI agent that cannot see enough data to perform its intended function will either fail silently, generate hallucinated responses based on insufficient data, or produce outputs that appear plausible but are statistically invalid because they are based on a severely filtered dataset.

Finding the Right Level of Restriction

Row-level security that is too narrow creates agents that consistently see near-zero result sets. If an EMEA sales agent's region filter is misconfigured and no records match, the agent will report that "no sales data is available" rather than flagging a governance configuration error. This is a particularly insidious failure mode because the agent appears to work but produces nothing useful.

Column masking that obscures computation columns (not just display columns) can break analytical queries. If lifetime_value_usd is masked to NULL for non-privileged roles, an AI agent trying to compute LTV distributions cannot do its job. Reserve masking for genuinely sensitive display values (PII, financial identifiers) rather than metric columns the agent needs for computation.

Governance as Code

The most maintainable approach to semantic layer governance is treating all policies as versioned code artifacts. UDF definitions, view definitions, RBAC grants, and ROW ACCESS POLICY configurations should live in a version-controlled repository, deployed through a CI/CD pipeline.

This approach provides:

  • Audit trail: every policy change is a git commit with a message explaining why
  • Rollback capability: misconfigured policies can be reverted in minutes
  • Consistency: the same policies deploy identically to dev, staging, and production
  • Peer review: policy changes go through code review before deployment
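
Once policies live in a repository, a CI step can catch simple mistakes before deployment, such as a row access policy that references a function nobody defined. The check below is a rough sketch: it uses regular expressions rather than a SQL parser, and the sample script and names are hypothetical.

```python
import re

# Rough patterns; a real pipeline might use a proper SQL parser instead.
FUNCTION_RE = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?FUNCTION\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)
POLICY_RE = re.compile(
    r"ROW\s+ACCESS\s+POLICY\s+([A-Za-z_][\w.]*)\s*\(", re.IGNORECASE
)


def undefined_policy_functions(sql_text):
    """Return policy function names that are referenced but never created."""
    defined = {name.lower() for name in FUNCTION_RE.findall(sql_text)}
    referenced = {name.lower() for name in POLICY_RE.findall(sql_text)}
    return sorted(referenced - defined)


# Hypothetical policy script with a mismatched function name.
sample_sql = """
CREATE FUNCTION row_filter_by_region(region VARCHAR)
RETURNS BOOLEAN
RETURN TRUE;

ALTER VIEW semantic.application.app_customer_analytics
  ADD ROW ACCESS POLICY enforce_region_access(region);
"""
problems = undefined_policy_functions(sample_sql)
```

Failing the pipeline when `problems` is non-empty means a renamed or misspelled policy function is caught in review rather than in production.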

Designing a Governance-First AI Semantic Layer

Building governance into the semantic layer from the start is far easier than retrofitting it onto an existing layer with existing agent access. Here is the design sequence:

Step 1: Classify your data. Before creating any Gold views, identify which columns in your Silver layer contain PII, regulated financial data, or sensitive business information. Assign a sensitivity classification to each column (Public, Internal, Confidential, Restricted).

Step 2: Design Gold-layer views with governance in mind. For each AI agent use case, define which Gold views the agent needs. Design the views to expose only the columns needed for that use case. Avoid catch-all views that expose entire tables.

Step 3: Define masking UDFs before creating views. Write masking functions for every Confidential and Restricted column that will appear in any Gold view. Define these UDFs before the views that use them.

Step 4: Implement RLS policies before granting agent access. Attach ROW ACCESS POLICY definitions to all Gold views before any agent service account is created. Governance must exist before access is granted.

Step 5: Create agent-specific roles. For each AI agent, create a dedicated Dremio role with SELECT permissions scoped to only that agent's Gold views.

Step 6: Configure OAuth service accounts per agent. Issue separate OAuth client credentials for each agent. Map each client to its dedicated role. Do not reuse credentials across agents.

Step 7: Test with adversarial queries. Before putting agents in production, test your governance configuration with queries designed to bypass it. Attempt to access Bronze tables directly, retrieve unmasked PII, or access data outside the RLS boundary. Verify all attempts fail.
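
A minimal harness for Step 7 might look like the following. It is a sketch under stated assumptions: `execute` is whatever function your test environment uses to run SQL as the agent's identity, `AccessDenied` is a hypothetical exception that wrapper raises on a permission error, and the table name raw_customers is invented.

```python
class AccessDenied(Exception):
    """Hypothetical error raised by the query wrapper on a permission failure."""


def run_adversarial_suite(execute, cases):
    """Run each case as the agent identity and return the names of failed cases."""
    failures = []
    for case in cases:
        try:
            rows = execute(case["sql"])
        except AccessDenied:
            if not case["expect_denied"]:
                failures.append(case["name"])
            continue
        if case["expect_denied"]:
            failures.append(case["name"])   # query should have been rejected
        elif not all(case["row_ok"](row) for row in rows):
            failures.append(case["name"])   # a row leaked past RLS or masking
    return failures


CASES = [
    {"name": "bronze_direct",
     "sql": "SELECT * FROM semantic.bronze.raw_customers",
     "expect_denied": True},
    {"name": "unmasked_email",
     "sql": "SELECT email FROM semantic.application.app_customer_analytics",
     "expect_denied": False,
     "row_ok": lambda row: "***" in row["email"]},
    {"name": "cross_region",
     "sql": "SELECT region FROM semantic.application.app_customer_analytics "
            "WHERE region = 'APAC'",
     "expect_denied": False,
     "row_ok": lambda row: row["region"] == "EMEA"},
]


def fake_governed_execute(sql):
    """Stand-in for a governed engine, used only to exercise the harness."""
    if "bronze" in sql:
        raise AccessDenied("no grant on Bronze")
    if sql.startswith("SELECT email"):
        return [{"email": "jo***@***.com"}]
    return []  # the APAC filter matches nothing for an EMEA identity


failures = run_adversarial_suite(fake_governed_execute, CASES)
```

Against a real environment you would swap the stand-in for a function that connects with the agent's credentials, and treat any non-empty `failures` list as a release blocker.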

Step 8: Enable audit logging with compliance-period retention. Configure Dremio's query logging to retain logs for the duration required by your compliance frameworks. For SOX environments, that is seven years.

Step 9: Schedule quarterly access reviews. Put a recurring calendar event for a governance review of all AI agent roles, permissions, and access patterns. Make revocation the default action when in doubt.

Step 10: Treat governance as code. Move all governance configurations (UDFs, view definitions, RBAC grants, row access policies) into version control. Treat policy changes with the same review process as application code changes.

What Comes Next

The governance architecture described here is not a one-time configuration. It is an ongoing practice that requires the same engineering discipline as the data pipelines and AI systems it governs.

As AI agents become more capable and more autonomous, the governance perimeter becomes the most consequential architectural decision your data team makes. An agent operating inside a well-governed semantic layer can be given significant latitude to explore data, generate insights, and trigger actions, because the structural controls prevent it from doing harm. An agent operating without that governance perimeter is a liability, regardless of how sophisticated its underlying model is.

If you are designing an AI-ready data platform today, start with the Gold-layer view design and data classification. Get governance in place before the first agent connects. Retroactively adding column masking to a Gold view that an agent has already been querying for three months is an incident waiting to happen, because you do not know what the agent retrieved during those three months.

The semantic layer is not just where business logic lives. For AI agents, it is where trust is built and enforced.

Start building your governed AI data platform with Dremio Cloud free for 30 days and explore how Dremio's FGAC, MCP Server, and Autonomous Reflections work together to give AI agents fast, safe, governed access to your lakehouse data.
