Financial services is the industry where a wrong answer from an AI agent doesn't just produce a bad dashboard. It produces a regulatory violation. That single fact changes every architectural decision you make about agentic analytics in banking, insurance, and capital markets.
The data is maximally sensitive: Social Security numbers, account numbers, primary account numbers (PANs) under PCI DSS, trading positions that could constitute material non-public information, and health records for insurance portfolios. The regulation is maximally dense: Basel IV, FRTB, IFRS 9, Solvency II, DORA, GDPR, PCI DSS, GLBA, SOX, and MiFID II all have direct implications for how AI systems access and process financial data. And the stakes are maximally high: a data breach triggers a 72-hour GDPR notification window, a capital calculation error triggers a regulatory enquiry, and an unauthorized data access event can carry individual liability for the CISO and CRO.
None of that means financial services teams cannot build agentic analytics. It means they need to build differently.
Why Financial Services Is the Hardest Environment for Agentic Analytics
Every industry has sensitive data. Financial services has uniquely regulated sensitive data, where the sensitivity, the regulation, and the consequence of failure all operate at the same elevated level simultaneously.
A healthcare data breach is serious. A financial data breach affecting PANs triggers PCI DSS mandatory notification, potentially invalidates PCI compliance certification, and opens the firm to card brand fines that run into the millions per incident. A market risk calculation error doesn't just produce a wrong number. If that number gets submitted to a regulator as part of a COREP or FRTB capital return, the firm may need to restate, explain, and potentially hold additional capital against the correction.
The density of regulation is also unmatched. A mid-size European bank in 2026 operates simultaneously under Basel IV for capital, FRTB for trading book risk capital, IFRS 9 for loan loss provisioning, DORA for operational and ICT resilience, GDPR for customer data, PCI DSS for payment processing, and MiFID II for trade reporting. Each framework has data access implications. Many of them explicitly require audit trails, data lineage, and controls over who (and what) can access regulated data.
The Regulatory Landscape for AI in Finance
| Regulation | Jurisdiction | What It Governs | Key AI Implication |
| --- | --- | --- | --- |
| Basel IV | Global (BIS) | Bank capital adequacy | Capital calculations must be auditable and traceable |
| DORA | EU (effective Jan 2025) | ICT operational resilience | AI systems in critical functions need documented controls |
| GDPR | EU | Personal data processing | Data minimization; cross-border restrictions; purpose limitation |
| PCI DSS v4.0 | Global | Cardholder data security | PANs must be masked when displayed to anyone not authorized to see the full number |
| SOX | US | Financial reporting controls | AI agents in reporting chains require documented controls |
| GLBA | US | Consumer financial information | Financial institutions must protect customer financial data |
| IFRS 9 | International | Expected credit loss provisioning | ECL model inputs must be auditable |
| Solvency II | EU | Insurance capital requirements | Risk calculations must be reproducible and auditable |
DORA is worth particular attention for teams building agentic analytics. It came into effect in January 2025 and establishes requirements for ICT risk management frameworks, incident classification and reporting, and operational resilience testing. An AI agent that touches core banking data or participates in a reporting workflow is almost certainly in scope for DORA's ICT risk management requirements. That means the agent's access scope, controls, and failure modes need to be documented as part of the ICT risk framework.
The Opportunity Is Too Big to Ignore
Given all of that, why are financial services firms adopting agentic analytics at all? Because the alternative, continuing to run regulatory reporting, fraud detection, and risk analysis through entirely manual processes, carries its own very large costs.
Basel IV and FRTB reporting cycles consume thousands of analyst hours per quarter. Experienced risk analysts with the expertise to interpret regulatory requirements spend most of their time doing data extraction, joining spreadsheets, and running manual aggregations. McKinsey estimates that 20-30% of compliance staff time in financial institutions is consumed by data gathering and assembly activities rather than actual analysis.
Fraud losses exceeded $485 billion globally in 2023, according to the Nasdaq Global Financial Crime Report. Traditional fraud detection creates 24-to-48-hour review queue backlogs for human investigators, during which fraudulent accounts continue operating. Pattern detection at the speed required to meaningfully reduce fraud losses needs agent-driven, automated analysis.
Credit risk scenario analysis for regulatory stress tests (DFAST in the US, EBA stress tests in Europe) requires running hundreds of economic scenarios against entire loan portfolios. Doing that manually takes weeks. With an agent that can run scenarios continuously, risk teams shift from running one analysis per quarter to running continuous monitoring.
If agentic automation can handle 20% of the compliance and risk analyst workload, the math is significant. A bank with 500 FTEs in risk and compliance functions (loaded cost of $150,000 each) is looking at $15 million per year in freed analyst capacity. The agents don't replace the analysts. They free analysts to focus on interpretation, exception investigation, and regulatory dialogue instead of data assembly.
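The arithmetic behind that figure can be checked in a few lines. This is only a sketch of the article's own hypothetical (500 FTEs, $150,000 loaded cost, a 20% automatable share); the function name is made up for illustration.

```python
# Inputs are the article's hypothetical figures, not measured data.
FTE_COUNT = 500
LOADED_COST_USD = 150_000
AUTOMATABLE_SHARE = 0.20


def freed_capacity_usd(fte_count: int, loaded_cost: int, share: float) -> float:
    """Annual value of the analyst time freed if `share` of the workload is automated."""
    return fte_count * loaded_cost * share


total_cost = FTE_COUNT * LOADED_COST_USD
freed = freed_capacity_usd(FTE_COUNT, LOADED_COST_USD, AUTOMATABLE_SHARE)

print(f"Total loaded cost: ${total_cost:,}")
print(f"Freed capacity:    ${freed:,.0f}")
```

Changing the share or the loaded cost shows how sensitive the business case is to those two assumptions.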
The question for financial services data architects is not whether to pursue agentic analytics. It is whether to build the governance layer that makes it possible to do it safely. You can read more about why many agentic analytics approaches fail, and what it takes to get them right, in this analysis of the common failure patterns.
Three Non-Negotiable Requirements for Regulated Agentic Analytics
Before any AI agent connects to financial data, three capabilities must be in place. These are not "nice to have" features. They are minimum requirements for operating in a regulated financial environment.
1. Complete Audit Trails
Every agent query must be logged with: the agent's identity (the specific agent role and the OAuth token that authenticated it, not just a service account name), the exact query text, the timestamp of execution, the specific tables or views accessed, and the metadata of the result (row count, bytes returned). The actual data returned should not be logged, but everything about the access event must be.
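As a rough illustration of what such a log entry might contain, the sketch below builds a record with the fields listed above. The class, the function, and the `token_id` field are hypothetical names for this example (logging a token identifier rather than the token itself is an assumption), and this is not Dremio's audit log schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentAuditRecord:
    agent_role: str          # the specific agent role, not a generic service account
    token_id: str            # assumed: an identifier for the OAuth token, never its secret
    query_text: str
    executed_at: str         # ISO 8601 timestamp in UTC
    objects_accessed: tuple  # tables or views touched by the query
    row_count: int
    bytes_returned: int


def build_audit_record(agent_role, token_id, query_text, objects_accessed, rows):
    """Capture metadata about the access event; the rows themselves are not stored."""
    payload_size = sum(len(repr(row).encode("utf-8")) for row in rows)
    return AgentAuditRecord(
        agent_role=agent_role,
        token_id=token_id,
        query_text=query_text,
        executed_at=datetime.now(timezone.utc).isoformat(),
        objects_accessed=tuple(objects_accessed),
        row_count=len(rows),
        bytes_returned=payload_size,
    )


rows = [("corporate", 1_000_000.0), ("retail", 250_000.0)]
record = build_audit_record(
    "basel_reporting_agent",
    "tok-0001",
    "SELECT asset_class, SUM(exposure_at_default_usd) "
    "FROM semantic.business.biz_regulatory_exposures GROUP BY asset_class",
    ["semantic.business.biz_regulatory_exposures"],
    rows,
)
print(asdict(record))
```

The record is frozen so that it cannot be modified after creation, which mirrors the intent of an append-only audit trail.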
This requirement flows directly from multiple frameworks. SOX Section 302 requires executives to certify the company's financial reports and disclosure controls, and Section 404 requires management to assess the effectiveness of internal controls over financial reporting. If an AI agent participates in producing financial reports, the controls over that agent are part of the SOX control framework. DORA requires a documented ICT risk management framework (Article 6) and policies that restrict access to information and ICT assets (Article 9). Basel's BCBS 239 principles for risk data aggregation require that risk data be auditable with full data lineage.
The audit trail is not just a compliance checkbox. It is the proof that your governance worked when a regulator or internal auditor asks "show me every data access event related to this capital calculation."
2. PII and Sensitive Data Protection at the Engine Layer
Column-level masking must happen inside the data engine, not at the application layer. This distinction is critical when the application layer is an AI agent.
An AI agent that receives raw data and is instructed to mask it before displaying results is not sufficient for a regulated environment. The agent could malfunction. The agent's prompt could be manipulated. The agent could inadvertently include raw PAN data in its reasoning trace or log output. The only reliable protection is masking applied before the data ever reaches the agent, at the query engine level, so the masked value is what gets returned from the database engine itself.
PCI DSS v4.0 requires PANs to be masked when displayed: at most the BIN and the last four digits may be shown to any system or user without a legitimate business need to see the full number. Showing only the last four digits is the conservative choice. For an AI fraud detection agent, that means receiving ****1234, never 4111111111111234.
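The masking rule itself is simple to state in code. The sketch below only illustrates the transformation; in the architecture described here the equivalent logic runs inside the query engine, never in the agent. The function name and its handling of short inputs are assumptions made for this example.

```python
def mask_pan(pan: str, visible: int = 4) -> str:
    """Return '****' followed by the last `visible` digits of a card number."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    if len(digits) <= visible:
        # Too short to be a real PAN: reveal nothing rather than most of the value.
        return "*" * len(digits)
    return "****" + digits[-visible:]


print(mask_pan("4111111111111234"))
print(mask_pan("4111 1111 1111 1234"))
```

Separators such as spaces or dashes are stripped first, so differently formatted copies of the same number mask to the same value.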
GDPR's data minimization principle applies equally. An AI agent running regulatory capital calculations does not need customer names or addresses. It needs exposure amounts, asset classes, and risk weights. Design the agent's data scope to the minimum necessary for its specific function.
3. Data Residency and Lineage
For financial institutions operating under GDPR, cross-border data transfer restrictions are not theoretical. An AI agent running on a cloud platform in the US must not query EU personal data unless an adequate transfer mechanism is in place. And "the query didn't move the data" is not an adequate answer when the query result containing EU personal data was processed on a US server.
Row-level security enforced by policy (not by a WHERE clause in the agent's query) is the correct mechanism. The agent's role determines which rows it can see, independently of how the agent writes its queries. Even if an agent writes SELECT * FROM customer_risk_view, the row security policy filters the result to the agent's authorized jurisdiction before the data is returned.
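A small simulation shows why the policy, not the agent's query, decides which rows come back. Everything here is a hypothetical stand-in: the role-to-jurisdiction mapping, the table, and the `run_query` function model the idea and are not Dremio's implementation.

```python
# Hypothetical policy table: which jurisdictions each role may see.
ROLE_JURISDICTIONS = {"emea_compliance_analyst": {"EU"}}

TABLE = [
    {"customer_id": 1, "jurisdiction": "EU"},
    {"customer_id": 2, "jurisdiction": "US"},
    {"customer_id": 3, "jurisdiction": "EU"},
]


def run_query(role, agent_predicate, table=TABLE):
    """Apply the row policy first, then whatever filter the agent wrote."""
    allowed = ROLE_JURISDICTIONS.get(role, set())  # unknown roles see nothing
    visible = [row for row in table if row["jurisdiction"] in allowed]
    return [row for row in visible if agent_predicate(row)]


# The agent asks for everything, then explicitly asks for US rows.
select_all = run_query("emea_compliance_analyst", lambda row: True)
us_attempt = run_query("emea_compliance_analyst", lambda row: row["jurisdiction"] == "US")

print(select_all)
print(us_attempt)
```

Because the policy filter runs before the agent's predicate, no rewrite of the agent's query can widen the result beyond the role's jurisdictions.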
Data lineage, separately, is required for any agent output that feeds into a regulatory submission. You must be able to trace the agent's output back through the transformation chain to the source data, including the specific transformations and business logic applied.
How Dremio Enables Safe Agentic Analytics in Financial Services
Dremio's Agentic Lakehouse architecture addresses each of these requirements at the platform layer, so the governance controls are consistent regardless of which AI model connects.
Fine-Grained Access Control: Column Masking and Row Security
Dremio's Fine-Grained Access Control (FGAC) applies column masking rules at query execution time, inside the Dremio engine, before results are returned. When the Basel reporting agent queries the customer risk view, the masking rule for account_number fires and returns ****1234 instead of the full number. The agent never receives the full number. There is no opportunity for it to accidentally expose that data downstream.
Row-level security works through policy predicates that evaluate against the querying identity's role. A jurisdiction row security predicate means that a compliance agent with the emea_compliance_analyst role can only retrieve rows where jurisdiction = 'EU', regardless of how the agent's SQL is written. This protects against both accidental cross-jurisdiction queries and adversarial prompt injection attempts that might try to expand the query scope.
The role-based differentiation is also valuable for human oversight. A data steward with the data_steward role can retrieve the full account number through the same view. The view's masking logic applies different outputs to different roles, making the semantic layer the single source of governance truth.
Audit Logging and Identity
Every query that Dremio executes generates an audit log entry containing: the authenticated user or service account identity, the query text, start and end timestamps, the specific tables and views accessed, and execution metadata including bytes scanned and rows returned.
When AI agents authenticate via Dremio's MCP Server using OAuth 2.0, the OAuth token carries the agent's identity. The audit log shows basel_reporting_agent as the querying identity, not a generic service account. This specificity is critical for DORA ICT risk framework documentation and for SOX control evidence.
Dremio Cloud audit logs are immutable once written, which satisfies the tamper-evidence requirements of SOX and aligns with DORA's requirements for ICT incident records.
MCP Server with OAuth for External AI Models
The Model Context Protocol (MCP) integration is the mechanism through which external AI models, whether that's Claude, GPT-4, or a privately deployed LLM on the bank's own infrastructure, connect to Dremio's governed data environment.
The AI model authenticates to Dremio's MCP Server using OAuth 2.0. The OAuth token is short-lived (typically 1-hour expiry) and scoped to a specific Dremio role. Once authenticated, all of Dremio's FGAC policies for that role apply automatically to every query the agent runs. The AI model receives a Dremio role, not database credentials. It never has direct access to source databases.
This architecture means the governance layer is independent of the AI model. You can swap the AI model (from one LLM to another, or update to a new model version) without changing your governance configuration. The controls live in Dremio, not in the AI model's prompt.
The Semantic Layer as the Control Point
The semantic layer is where financial business logic gets encoded once and reused across every agent query. Exposure at Default (EAD), Risk-Weighted Assets (RWA), and Expected Credit Loss (ECL) calculations are defined in the semantic layer as governed views. These definitions took years of regulatory interpretation and internal alignment to standardize, and they live there permanently.
When an AI agent queries semantic.business.biz_regulatory_exposures, it accesses the pre-validated, pre-governed version of that data, with masking applied, with row security enforced, and with the correct Basel IV business logic already encoded. The agent doesn't write raw SQL against normalized source tables and risk computing a capital figure using incorrect asset class mappings. The semantic layer is both a governance control and a guard against model-generated calculation errors.
Use Case 1: Regulatory Reporting Automation (Basel IV Capital Adequacy)
The manual version of Basel IV capital adequacy reporting at a large bank typically takes 3 to 5 days per reporting cycle. Risk analysts extract data from core banking systems, the trade repository, and the collateral management system. They join datasets across systems in Excel or a risk warehouse staging layer. They apply RWA weights by asset class, aggregate totals, and build the COREP submission package. Multiple review rounds follow.
Most of that time is spent on data assembly, not analysis. The actual judgment calls, whether a specific exposure should be categorized differently or whether the model overlay is appropriate, take a fraction of the total time.
An agent-assisted workflow changes the assembly phase. The Basel reporting agent queries the semantic layer in Dremio, which already has the asset class mappings, RWA weight tables, and exposure calculation logic encoded as governed views. The agent runs the capital calculation query, aggregates by asset class, and generates a structured summary. A human risk officer reviews the output, validates totals against control figures from the risk system, and approves for submission.
The SQL the agent works with against the semantic layer looks like this:
-- Example Basel IV capital calculation query via agent
SELECT
    asset_class,
    SUM(exposure_at_default_usd) AS total_ead,
    SUM(exposure_at_default_usd * risk_weight) AS risk_weighted_assets,
    SUM(exposure_at_default_usd * risk_weight) * 0.08 AS minimum_capital_requirement
FROM semantic.business.biz_regulatory_exposures
WHERE reporting_date = CURRENT_DATE - INTERVAL '1' DAY
GROUP BY asset_class
ORDER BY risk_weighted_assets DESC;
Every time this query runs through an agent, Dremio logs the agent identity, the query text, the tables accessed, and the timestamp. The governance controls are active at query time: because the agent holds the basel_reporting role, all FGAC masking and row security policies apply. No raw account-level data is exposed. The result is aggregated capital figures at the asset class level, exactly what the reporting workflow needs and nothing more.
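To make the aggregation concrete, here is the same calculation as the query, worked through in Python on three made-up exposures. The figures and risk weights are illustrative only and are not regulatory values; the 8% factor is the one used in the query.

```python
from collections import defaultdict

MIN_CAPITAL_RATIO = 0.08  # same factor as in the SQL query

# Illustrative exposures; amounts and risk weights are invented for the example.
exposures = [
    {"asset_class": "corporate", "ead_usd": 1_000_000, "risk_weight": 1.00},
    {"asset_class": "corporate", "ead_usd": 500_000, "risk_weight": 0.50},
    {"asset_class": "residential_mortgage", "ead_usd": 2_000_000, "risk_weight": 0.25},
]


def capital_by_asset_class(rows):
    """Group by asset class, sum EAD and RWA, and order by RWA descending."""
    totals = defaultdict(lambda: {"total_ead": 0.0, "rwa": 0.0})
    for row in rows:
        bucket = totals[row["asset_class"]]
        bucket["total_ead"] += row["ead_usd"]
        bucket["rwa"] += row["ead_usd"] * row["risk_weight"]
    report = [
        {"asset_class": name, **vals, "min_capital": vals["rwa"] * MIN_CAPITAL_RATIO}
        for name, vals in totals.items()
    ]
    return sorted(report, key=lambda r: r["rwa"], reverse=True)


report = capital_by_asset_class(exposures)
for line in report:
    print(line)
```

A reviewer can use a hand-checkable sample like this as a control figure when validating the agent's totals.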
Estimated time savings: 80% reduction in analyst preparation time for data assembly. The analysts' time moves entirely to review, exception analysis, and regulatory narrative.
Use Case 2: Real-Time Fraud Detection Analysis
Fraud operations teams at large banks deal with flagging systems that generate thousands of alerts per day. The challenge is not detection sensitivity. Modern ML models flag suspicious transactions effectively. The challenge is triage speed. Human investigators face review queues measured in hours or days, during which fraudulent accounts continue transacting.
An agent-assisted fraud triage workflow keeps human investigators focused on confirmed or high-probability cases. The fraud analysis agent monitors batches of flagged transactions, queries for pattern clusters (velocity anomalies, geographic clustering, device fingerprint matching, unusual merchant category sequences), and surfaces structured summaries to the investigation queue.
The PII protection design is intentional: the fraud analysis agent sees ****1234, not 4111111111111234. It sees the masked cardholder name, not the full name. It doesn't need raw PAN or full name to do pattern analysis. Velocity, geo-clustering, and merchant category sequences are the analytical inputs. The masked identifiers serve only as correlation keys.
When an investigator picks up a case, they authenticate with their fraud_investigator role, which has the privilege to retrieve unmasked data for confirmed investigation cases. The same Dremio view returns different data to the agent (fraud_analyst_bot role) and to the human investigator, with column masking providing the differentiation.
The result: agents handle initial triage and pattern synthesis. Human investigators handle unmasked data review, case decision-making, and SAR filing. Response times for high-confidence fraud clusters drop from hours to minutes.
Use Case 3: Credit Risk Scenario Analysis
IFRS 9 requires banks to estimate expected credit losses using forward-looking information, including macroeconomic scenarios. The practical implementation means running the loan portfolio through multiple economic scenarios (a baseline, an adverse scenario, and a severely adverse scenario) and computing ECL under each.
For DFAST (US stress testing) and EBA stress tests, the scenario count goes higher. A mid-size bank running a comprehensive internal stress test might run 20 to 30 macro scenarios across a loan book with millions of positions. Each scenario requires aggregating by product type, geography, obligor segment, and IFRS 9 stage classification.
The agent-assisted workflow: the risk team defines scenario parameters (rate paths, unemployment assumptions, GDP growth). The credit risk agent queries the IFRS 9 semantic layer in Dremio against the Iceberg loan portfolio tables, runs the scenario aggregations, and generates a sensitivity analysis table. A typical output reads: "ECL increases by $2.3 billion under the adverse scenario, driven primarily by Stage 2 migrations in the commercial real estate segment."
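A heavily simplified sketch of the scenario aggregation follows. It assumes a single-period ECL of PD x LGD x EAD and models each scenario as a multiplier on PD; real IFRS 9 models use staging, lifetime horizons, discounting, and probability-weighted scenarios, none of which are shown. The loans, the multipliers, and the function name are all hypothetical.

```python
# Illustrative loan book; all figures are invented for the example.
loans = [
    {"segment": "commercial_real_estate", "ead_usd": 10_000_000, "pd": 0.02, "lgd": 0.40},
    {"segment": "retail_mortgage", "ead_usd": 20_000_000, "pd": 0.01, "lgd": 0.20},
]

# Assumed scenario definition: a stress multiplier applied to each loan's PD.
PD_MULTIPLIER = {"baseline": 1.0, "adverse": 2.0, "severely_adverse": 3.5}


def portfolio_ecl(loan_book, pd_multiplier):
    """Sum EAD * stressed PD * LGD over the book, capping PD at 100%."""
    return sum(
        loan["ead_usd"] * min(1.0, loan["pd"] * pd_multiplier) * loan["lgd"]
        for loan in loan_book
    )


ecl = {name: portfolio_ecl(loans, mult) for name, mult in PD_MULTIPLIER.items()}
for name, value in ecl.items():
    delta = value - ecl["baseline"]
    print(f"{name:>17}: ECL ${value:,.0f} (change vs baseline ${delta:,.0f})")
```

The sensitivity table the agent produces is essentially this loop run per product type, geography, obligor segment, and stage.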
Apache Iceberg's time-travel capability provides an important advantage here. The loan portfolio as of any prior reporting date is instantly queryable without maintaining separate point-in-time snapshots. The agent can compare the current portfolio composition against the portfolio as of three months prior to show how risk concentrations have shifted.
The human risk manager reviews the scenario outputs, applies qualitative overlays (management judgment overlays are an IFRS 9 requirement), and approves for board reporting. The agent's contribution is the 90% of work that involves data retrieval and aggregation. The manager's contribution is the 10% that requires professional judgment and regulatory accountability.
The MCP Architecture for Regulated Data Environments
The Model Context Protocol pattern Dremio uses for financial services agentic analytics is specifically designed so that AI models never touch raw source data directly.
The flow for a regulatory agent query works as follows:
1. The AI model (Claude, an internal private LLM, or a specialized model) sends a query intent to Dremio's MCP Server.
2. The MCP Server authenticates the agent via OAuth 2.0. The token maps to a specific Dremio role, for example compliance_analyst.
3. The query routes to the appropriate semantic layer view, for example semantic.application.app_customer_risk.
4. Dremio's FGAC layer applies: column masking fires for account_number (returns ****1234), the jurisdiction row security policy filters to the agent's authorized scope, and column restrictions block fields the role is not authorized to access.
5. The query executes against the federated sources, which may include a core banking mainframe via JDBC, a risk warehouse, and Iceberg tables on S3, all without the AI model knowing the source topology.
6. An audit log entry is created atomically with the query execution: agent ID, timestamp, tables accessed, query text, bytes scanned.
7. The masked, filtered result returns to the AI model.
What the AI model never does in this architecture: it never authenticates directly to source databases. It never receives unmasked PII. It never bypasses the Dremio governance layer. It never executes an unlogged query.
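The order of operations in that flow can be sketched end to end. This is a toy model under stated assumptions: the token store, the policy table, the source rows, and `handle_agent_query` are hypothetical in-memory stand-ins, not the MCP Server's or Dremio's actual interfaces.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-ins for the token service, row policy, and source data.
TOKENS = {
    "tok-abc": {
        "role": "compliance_analyst",
        "expires": datetime.now(timezone.utc) + timedelta(hours=1),
    }
}
ROLE_JURISDICTIONS = {"compliance_analyst": {"EU"}}
SOURCE_ROWS = [
    {"account_number": "4111111111111234", "jurisdiction": "EU", "total_exposure_usd": 1200.0},
    {"account_number": "5500000000005678", "jurisdiction": "US", "total_exposure_usd": 900.0},
]
AUDIT_LOG = []


def handle_agent_query(token, query_text, view="semantic.application.app_customer_risk"):
    # Authenticate: resolve the token to a role, rejecting unknown or expired tokens.
    session = TOKENS.get(token)
    if session is None or session["expires"] <= datetime.now(timezone.utc):
        raise PermissionError("invalid or expired token")
    role = session["role"]

    # Row security: keep only rows in the role's authorized jurisdictions.
    allowed = ROLE_JURISDICTIONS.get(role, set())
    visible = [row for row in SOURCE_ROWS if row["jurisdiction"] in allowed]

    # Column masking: the full account number never leaves this function.
    result = [
        {
            "account_display": "****" + row["account_number"][-4:],
            "jurisdiction": row["jurisdiction"],
            "total_exposure_usd": row["total_exposure_usd"],
        }
        for row in visible
    ]

    # Audit: record metadata about the access, not the returned values.
    AUDIT_LOG.append({
        "agent_role": role,
        "view": view,
        "query_text": query_text,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(result),
    })
    return result


result = handle_agent_query("tok-abc", "SELECT * FROM semantic.application.app_customer_risk")
print(result)
print(AUDIT_LOG[-1]["agent_role"], AUDIT_LOG[-1]["row_count"])
```

The point of the sketch is the sequencing: the caller only ever receives what is left after authentication, row filtering, and masking have all run.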
For institutions with the strictest data residency requirements, the architecture supports private LLM deployment. The AI model runs on-premises or in the institution's own VPC. It connects to Dremio (also on-premises or in the institution's cloud) via the MCP Server. No data leaves the institution's network boundary, not even to a cloud AI provider's API. The governance controls are identical whether the AI model is Claude connecting over the public internet or a private Llama 3 instance running in the bank's data center.
The semantic layer view that the compliance agent queries for customer risk data uses this pattern:
-- Financial semantic layer: masked customer view for AI agents
CREATE OR REPLACE VIEW semantic.application.app_customer_risk AS
SELECT
    customer_id,
    -- Data stewards see the full number; every other role sees only the last four digits
    CASE WHEN IS_MEMBER('data_steward') THEN account_number
         ELSE CONCAT('****', RIGHT(account_number, 4)) END AS account_display,
    credit_score_band,   -- banded score, not raw score
    risk_segment,
    total_exposure_usd,
    jurisdiction         -- GDPR-relevant field for row security
FROM semantic.business.biz_customer_risk
-- row_access_policy_check is assumed to be a user-defined row-access policy function
WHERE row_access_policy_check(jurisdiction) = TRUE;
The IS_MEMBER('data_steward') check means a data steward reviewing this view in a SQL client sees the full account number. The compliance_analyst_bot agent role sees ****1234. One view, one governance configuration, differentiated output by role. That is the semantic layer serving as the control point.
The Federated Advantage for Banking Data
A fundamental challenge for agentic analytics in banking is that the data required for any significant analytical task is spread across multiple source systems, each with its own access controls, schema conventions, and technology stack.
A Basel IV capital calculation requires data from: the core banking system (loan balances, collateral values), the trade capture system (trading positions, notional amounts), the collateral management system (eligible collateral, haircuts), the credit risk system (internal ratings, PD/LGD estimates), and often a data lake for historical enrichment. In most large banks, those systems are on different platforms, including mainframe COBOL applications, Oracle Risk, Hadoop/Spark data lakes, and various vendor systems.
The traditional approach to creating a unified analytical layer over these sources is a data warehouse consolidation project: extract data from each source, transform it into a common schema, load it into the warehouse, and query the warehouse. That project takes 6 to 18 months, creates data duplication (increasing compliance scope and attack surface), and produces a warehouse that immediately starts diverging from source systems as data models evolve.
Dremio's federated query approach is the alternative. Dremio connects to each source via native connectors: JDBC for Oracle and Teradata, Arrow Flight for modern systems, and S3-compatible APIs for data lake sources. AI agents query a unified semantic layer that Dremio resolves against all sources at query time. The data is not copied into a new central store. Source systems retain their own access controls as a defense-in-depth layer, with Dremio's governance as the primary control for agent access.
For compliance scope, this matters: sensitive data (PANs, account numbers, customer PII) stays in the source systems that already have established compliance controls. Dremio mediates access rather than creating a new centralized copy that would need its own PCI and GDPR compliance framework.
Tradeoffs and What to Expect
Building production-grade agentic analytics in a financial institution is not a quick project. Setting up FGAC policies (defining column masking rules, row security predicates, role hierarchies, and view-level permissions) requires collaboration between data architects, compliance officers, security teams, and business domain experts. In a large institution with hundreds of data entities in scope, this is 3 to 6 months of work before any agent connects.
The principle of least privilege applies at the agent level, not just the user level. Each agent should have a role scoped specifically to its analytical domain. The credit_risk_agent role should have no access to customer marketing data. The fraud_detection_agent role should have no access to capital calculation views. Broad agent roles that can access everything are a security and compliance anti-pattern. When you define roles narrowly, a compromised agent token has limited blast radius.
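One way to keep agent roles narrow is to treat the grants as data and check them. The sketch below is a minimal illustration with hypothetical view names; `can_query` and `shared_views` are invented helpers, not part of any platform API.

```python
# Hypothetical role-to-view grants; the view names are illustrative.
ROLE_GRANTS = {
    "credit_risk_agent": {"semantic.business.biz_credit_risk"},
    "fraud_detection_agent": {"semantic.business.biz_flagged_transactions"},
}


def can_query(role: str, view: str) -> bool:
    """Default deny: a role can query only the views explicitly granted to it."""
    return view in ROLE_GRANTS.get(role, set())


def shared_views(grants: dict) -> dict:
    """Views granted to more than one agent role, which deserve a second look."""
    owners = {}
    for role, views in grants.items():
        for view in views:
            owners.setdefault(view, set()).add(role)
    return {view: roles for view, roles in owners.items() if len(roles) > 1}


print(can_query("credit_risk_agent", "semantic.business.biz_credit_risk"))
print(can_query("credit_risk_agent", "semantic.business.biz_flagged_transactions"))
print(shared_views(ROLE_GRANTS))
```

A check like `shared_views` could run in review whenever a grant changes, so that overlap between agent domains is a deliberate decision rather than drift.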
Human review for regulatory outputs is a requirement in most jurisdictions, not an optional process step. DORA and Basel frameworks expect human accountability for regulatory submissions. IFRS 9 requires management judgment overlays. No financial regulator currently accepts "the AI agent produced this output" as a complete accounting. Build human-in-the-loop steps into every agentic workflow that produces output feeding into regulatory reporting, board reporting, or client communications.
LLM hallucination risk is real and specific in financial contexts. An LLM generating SQL against raw schema could apply incorrect asset class mappings, use wrong date filters, or join tables on incorrect keys. The semantic layer mitigates this significantly by pre-encoding correct financial logic, so agents query validated views rather than constructing financial calculations from scratch. But the mitigation is not perfect. Run agent outputs in parallel with manual outputs for at least 90 days before any automation replaces manual process steps.
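During a parallel run, the comparison between agent and manual outputs can itself be automated. The sketch below assumes both sides produce a figure per key (for example per asset class) and flags anything missing or outside a relative tolerance; the function name, the tolerance, and the sample figures are assumptions for illustration.

```python
def reconcile(agent: dict, manual: dict, rel_tol: float = 1e-4) -> list:
    """Return the breaks between two sets of figures keyed by the same labels."""
    breaks = []
    for key in sorted(set(agent) | set(manual)):
        a, m = agent.get(key), manual.get(key)
        if a is None or m is None:
            breaks.append((key, a, m, "missing"))
        elif abs(a - m) > rel_tol * max(abs(a), abs(m), 1.0):
            breaks.append((key, a, m, "mismatch"))
    return breaks


# Illustrative figures only.
agent_output = {"corporate": 100_000.0, "retail": 50_000.0, "sovereign": 0.0}
manual_output = {"corporate": 100_000.0, "retail": 50_500.0}

for item in reconcile(agent_output, manual_output):
    print(item)
```

Every break goes to a human for investigation; an empty list over the full parallel-run period is the evidence that supports retiring a manual step.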
Where to Start in a Regulated Institution
The entry point that works best in practice: pick an internal analytics use case (not a regulatory submission, not a client-facing output) and build the governance layer for that domain first.
Management risk reporting (internal, not COREP/FINREP submissions) is a good starting point. The governance requirements are meaningful but the consequence of an error is an internal correction rather than a regulatory restatement. Use that implementation to validate your FGAC configuration, test the audit log integration with your SIEM, and verify that the semantic layer produces correct outputs.
Once you've validated governance on the internal use case, expand one domain at a time: add the credit risk domain, validate ECL scenario outputs against manual calculations for 90 days, then move on. Each domain expansion adds governance surface area (new roles, new masking rules, new row security predicates) but the infrastructure and validation methodology carry over.
For DORA compliance: document the agent's role in your ICT risk framework from the start. Include: the agent's identity, its authentication mechanism, the data assets it accesses, the governance controls applied, and the failure mode documentation. DORA requires this documentation for any ICT system in a critical function. Starting the documentation at implementation time is far easier than reconstructing it for an audit later.
The institutions building this governance-first infrastructure now are creating a compounding advantage. As AI capabilities improve, a correctly built governance layer applies automatically to new models and new agent capabilities. The institutions that skip governance to move faster will face a rearchitecting project when regulators come asking for DORA control documentation or PCI DSS penetration test findings on AI data paths.
Agentic analytics in financial services is not a speculative future capability. The regulatory reporting, fraud detection, and credit risk analysis workflows described here are live implementation patterns at institutions that have done the governance work. The question is whether your institution builds that governance layer in 2026 or spends 2027 catching up.