Financial institutions deal with data spread across core banking systems, market data feeds, and compliance databases. A risk analyst checking whether an account shows suspicious activity needs data from all three. Building ETL pipelines to consolidate everything into one warehouse takes months and introduces data staleness that regulators won't accept.
This tutorial shows you how to build a financial analytics pipeline on Dremio Cloud in 30 minutes. You'll seed sample banking, market, and compliance data, model it into a medallion architecture, and use the AI Agent to detect transaction anomalies and assess account risk through natural language questions.
What You'll Build
Source tables simulating a core banking system (Oracle), market data store (S3), and compliance database (PostgreSQL)
Bronze views that standardize types and naming across all sources
Silver views that join account activity with compliance records
Gold views for account risk profiling and transaction anomaly detection
AI-generated wikis and tags for semantic context
Natural language financial analytics using Dremio's AI Agent
Step 1: Sign Up for Dremio Cloud
Go to dremio.com/get-started and create a free trial account. You get 30 days and $400 in compute credits.
Your trial includes a project, an Open Catalog built on Apache Polaris, and Dremio-managed storage. Navigate to the SQL Runner from the left sidebar to start executing SQL.
Step 2: Create Folders and Seed Data
Create a folder structure that represents the different systems a financial institution typically operates:
CREATE FOLDER IF NOT EXISTS finservexp;
CREATE FOLDER IF NOT EXISTS finservexp.oracle;
CREATE FOLDER IF NOT EXISTS finservexp.s3;
CREATE FOLDER IF NOT EXISTS finservexp.postgres;
Note: Tables in the built-in Open Catalog use folder.subfolder.table_name without a catalog prefix. External sources use source_name.schema.table_name.
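The seed statements follow one pattern per source system: a CREATE TABLE whose columns match what the Bronze views will select, plus a handful of INSERTs. As a sketch, here is the shape for the core banking accounts table; the sample values are illustrative placeholders, and `finservexp.oracle.transactions` and `finservexp.postgres.kyc_records` are created the same way with the columns the Bronze views reference.

```sql
-- Illustrative seed data for the simulated core banking system.
-- Column names match the Bronze views built in Step 3.
CREATE TABLE IF NOT EXISTS finservexp.oracle.accounts (
  account_id    INT,
  customer_name VARCHAR,
  account_type  VARCHAR,
  balance       DOUBLE,
  currency      VARCHAR,
  branch        VARCHAR,
  opened_date   VARCHAR,   -- stored as text; cast to TIMESTAMP in Bronze
  status        VARCHAR
);

INSERT INTO finservexp.oracle.accounts VALUES
  (1001, 'Ava Chen',   'checking', 18250.00, 'USD', 'Downtown', '2021-06-14', 'active'),
  (1002, 'Liam Ortiz', 'savings',   4200.50, 'USD', 'Uptown',   '2019-02-03', 'active');
```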
Financial data has specific requirements that Apache Iceberg addresses directly.
ACID Transactions. Financial records cannot tolerate partial writes. If a batch of transactions fails midway through an insert, Iceberg ensures no partial data is committed. The table either has all the new records or none of them.
Time Travel for Regulatory Audits. When a regulator asks "What did this account's transaction history look like on March 15th?", you can query the exact state of the table at that timestamp using Iceberg's snapshot history. No backup restoration required.
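In Dremio SQL, that audit query is a single AT TIMESTAMP clause. A minimal sketch against the sample transactions table (the timestamp and account ID are illustrative):

```sql
-- Query the table exactly as it existed on March 15th,
-- using Iceberg's snapshot history.
SELECT *
FROM finservexp.oracle.transactions
AT TIMESTAMP '2024-03-15 00:00:00.000'
WHERE account_id = 1001;
```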
Automated Performance Management. Dremio handles the operational overhead of maintaining Iceberg tables automatically. It compacts small files, rewrites manifests, clusters data by frequently filtered columns, and vacuums expired snapshots. For transaction tables that grow by millions of rows per month, this prevents the gradual performance degradation that plagues manually maintained data lakes.
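Dremio runs this maintenance for you, but the equivalent operations can also be triggered manually, which is useful for one-off cleanups. A sketch with default options:

```sql
-- Compact small files and rewrite manifests for a transaction table.
OPTIMIZE TABLE finservexp.oracle.transactions;

-- Expire old snapshots and remove files no longer referenced by any snapshot.
VACUUM TABLE finservexp.oracle.transactions EXPIRE SNAPSHOTS;
```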
Autonomous Reflections. Risk queries against transaction tables tend to follow patterns: same time windows, same account filters, same aggregations. Dremio detects these patterns and automatically creates pre-computed Reflections stored as Iceberg tables. The query optimizer substitutes them transparently, turning multi-second scans into sub-second lookups without any manual intervention.
Iceberg vs. Federated: When to Use Which
In this tutorial, all tables are Iceberg because they're in the Open Catalog. In production, the decision depends on how fresh the data needs to be.
Keep data federated when real-time accuracy matters. A core banking system processes transactions throughout the day. If you replicate transaction data to Iceberg on a nightly ETL schedule, your risk analysts spend the morning working with yesterday's numbers. By querying Oracle directly through Dremio's federation, every query sees the latest committed transactions. You can still create manual Reflections on those federated views and choose a refresh cadence (hourly, every 15 minutes) that balances performance against freshness.
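Creating a manual Reflection on a federated view is a short piece of DDL. A sketch, shown here against the Bronze transactions view (the Reflection name is illustrative, and the exact DDL keyword varies slightly by Dremio version):

```sql
-- Pre-aggregate amounts by account and transaction type so common
-- risk queries are served from the Reflection instead of the source.
ALTER DATASET finservexp.bronze.bank_transactions
CREATE AGGREGATE REFLECTION txn_by_account
USING
  DIMENSIONS (account_id, txn_type)
  MEASURES (amount (SUM, COUNT));
```

The refresh cadence for the Reflection is then configured on the source or dataset settings to match the freshness your analysts need.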
Migrate data to Iceberg when Dremio's automated performance management provides clear value. Historical transaction archives, aggregated compliance reports, and end-of-day snapshots are strong candidates. These datasets don't need sub-minute freshness, and Iceberg gives them automatic compaction, manifest optimization, clustering, vacuuming, and eligibility for Autonomous Reflections.
The practical approach for financial services: federate your operational systems for real-time visibility, then materialize analytical and historical datasets into Iceberg for automated optimization.
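Materializing a historical dataset into Iceberg is typically a one-time CTAS. A minimal sketch, with an illustrative target path and cutoff date:

```sql
-- One-time materialization of older transactions into an Iceberg table
-- in the Open Catalog, where Dremio compacts and clusters it automatically.
CREATE TABLE finservexp.s3.transactions_archive AS
SELECT *
FROM finservexp.bronze.bank_transactions
WHERE txn_timestamp < DATE '2024-01-01';
```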
Step 3: Build Bronze Views
Bronze views create a consistent interface over the raw source tables. In financial services, consistency matters because audit queries fail when column names or date formats vary between systems. A single Bronze layer means every downstream query references the same schema regardless of whether the underlying data comes from Oracle, S3, or PostgreSQL.
Create the layer folders:
CREATE FOLDER IF NOT EXISTS finservexp.bronze;
CREATE FOLDER IF NOT EXISTS finservexp.silver;
CREATE FOLDER IF NOT EXISTS finservexp.gold;
This view standardizes the account data from core banking. The date cast ensures consistent timestamp handling across all downstream joins.
CREATE VIEW finservexp.bronze.bank_accounts AS
SELECT
account_id,
customer_name,
account_type,
balance,
currency,
branch,
CAST(opened_date AS TIMESTAMP) AS opened_timestamp,
status
FROM finservexp.oracle.accounts;
The transactions view casts the date and keeps all fields. The merchant and channel fields will be critical for anomaly detection in the Gold layer, where patterns like "multiple wire transfers via online channel" signal potential compliance issues.
CREATE VIEW finservexp.bronze.bank_transactions AS
SELECT
txn_id,
account_id,
CAST(txn_date AS TIMESTAMP) AS txn_timestamp,
txn_type,
amount,
merchant,
channel
FROM finservexp.oracle.transactions;
The KYC view casts both date fields to timestamps. The next_review date is especially important: we'll compute a review_overdue flag in the Silver layer to catch accounts that haven't been recertified on schedule.
CREATE VIEW finservexp.bronze.compliance_kyc AS
SELECT
kyc_id,
account_id,
verification_status,
risk_rating,
CAST(last_verified AS TIMESTAMP) AS last_verified_timestamp,
CAST(next_review AS TIMESTAMP) AS next_review_timestamp,
flagged_reason
FROM finservexp.postgres.kyc_records;
Step 4: Build Silver Views
Silver views join data across systems to create business-level entities. In financial services, this means combining what an account looks like (banking system) with what it's doing (transactions) and whether it's compliant (KYC).
Account Activity
This view answers: "What is each account doing?" It joins accounts with transactions and computes aggregate metrics like total debits, total credits, wire transfer count, and online debit count. These metrics are the raw signals that the Gold layer uses for risk scoring. The wire_transfer_count is particularly important because frequent wire transfers are a common AML (Anti-Money Laundering) indicator.
CREATE VIEW finservexp.silver.account_activity AS
SELECT
a.account_id,
a.customer_name,
a.account_type,
a.balance,
a.currency,
a.branch,
COUNT(t.txn_id) AS txn_count,
COALESCE(SUM(CASE WHEN t.txn_type = 'debit' THEN t.amount ELSE 0 END), 0) AS total_debits,
COALESCE(SUM(CASE WHEN t.txn_type = 'credit' THEN t.amount ELSE 0 END), 0) AS total_credits,
SUM(CASE WHEN t.channel = 'online' AND t.txn_type = 'debit' THEN 1 ELSE 0 END) AS online_debit_count,
SUM(CASE WHEN t.merchant LIKE '%Wire Transfer%' THEN 1 ELSE 0 END) AS wire_transfer_count
FROM finservexp.bronze.bank_accounts a
LEFT JOIN finservexp.bronze.bank_transactions t ON a.account_id = t.account_id
GROUP BY a.account_id, a.customer_name, a.account_type, a.balance, a.currency, a.branch;
Account Compliance View
This view answers: "Is each account compliant?" It joins accounts with KYC records and computes a review_overdue flag. An account with an overdue KYC review is a regulatory risk even if its transaction patterns are normal. By computing this in the Silver layer, the Gold view can combine both behavioral and compliance signals into a single risk score.
CREATE VIEW finservexp.silver.account_compliance AS
SELECT
a.account_id,
a.customer_name,
a.account_type,
a.balance,
a.branch,
k.verification_status,
k.risk_rating,
k.last_verified_timestamp,
k.next_review_timestamp,
k.flagged_reason,
CASE
WHEN k.next_review_timestamp < CURRENT_TIMESTAMP THEN true
ELSE false
END AS review_overdue
FROM finservexp.bronze.bank_accounts a
LEFT JOIN finservexp.bronze.compliance_kyc k ON a.account_id = k.account_id;
Step 5: Build Gold Views
Gold views are the final analytical layer that compliance officers, risk managers, and the AI Agent query directly. Each Gold view pre-computes a specific business decision so the end user doesn't need to know which tables to join or what thresholds to apply.
Account Risk Profile
This is the primary view for compliance teams. It combines transaction behavior from account_activity with KYC status from account_compliance to produce a composite_risk_level. The classification logic uses both data sources: an account rated "high" by KYC with 3+ wire transfers becomes "Critical," while a "medium" KYC rating with 2+ wire transfers is "Elevated." This layered approach catches risks that neither system would flag independently.
CREATE VIEW finservexp.gold.account_risk_profile AS
SELECT
act.account_id,
act.customer_name,
act.account_type,
act.balance,
act.branch,
act.txn_count,
act.total_debits,
act.total_credits,
act.wire_transfer_count,
comp.risk_rating AS kyc_risk_rating,
comp.verification_status,
comp.review_overdue,
comp.flagged_reason,
CASE
WHEN comp.risk_rating = 'high' AND act.wire_transfer_count >= 3 THEN 'Critical'
WHEN comp.risk_rating = 'high' OR comp.review_overdue = true THEN 'High'
WHEN comp.risk_rating = 'medium' AND act.wire_transfer_count >= 2 THEN 'Elevated'
WHEN comp.risk_rating = 'medium' THEN 'Moderate'
ELSE 'Low'
END AS composite_risk_level
FROM finservexp.silver.account_activity act
JOIN finservexp.silver.account_compliance comp ON act.account_id = comp.account_id;
Transaction Anomalies
This view surfaces individual transactions that warrant review. It flags transactions that are large relative to the account balance (over 25%), high-value wire transfers (over $5,000), or large online transactions (over $10,000). The WHERE clause filters to only include transactions above $5,000 or wire transfers above $3,000, so the view stays focused on actionable items rather than listing every transaction.
CREATE VIEW finservexp.gold.transaction_anomalies AS
SELECT
t.txn_id,
t.account_id,
a.customer_name,
t.txn_timestamp,
t.txn_type,
t.amount,
t.merchant,
t.channel,
a.balance,
CASE
WHEN t.amount > a.balance * 0.25 THEN 'large_relative_to_balance'
WHEN t.merchant LIKE '%Wire Transfer%' AND t.amount > 5000 THEN 'high_value_wire'
WHEN t.channel = 'online' AND t.amount > 10000 THEN 'large_online_transaction'
ELSE 'normal'
END AS anomaly_flag
FROM finservexp.bronze.bank_transactions t
JOIN finservexp.bronze.bank_accounts a ON t.account_id = a.account_id
WHERE t.amount > 5000
OR (t.merchant LIKE '%Wire Transfer%' AND t.amount > 3000);
Step 6: Enable AI-Generated Wikis and Tags
Click Admin in the left sidebar, then go to Project Settings.
Select the Preferences tab.
Scroll to the AI section and enable Generate Wikis and Labels.
Go to the Catalog and navigate to your Gold views under finservexp.gold.
Click the Edit button (pencil icon) next to the desired view.
In the Details tab, find the Wiki section and click Generate Wiki. Do the same for the Tags section by clicking Generate Tags.
Repeat for each Gold view.
Dremio's AI will produce descriptions like: "account_risk_profile: One row per account combining transaction activity metrics with KYC compliance status. The composite_risk_level field classifies accounts from Low to Critical based on KYC risk rating, wire transfer frequency, and review compliance status."
To enhance the generated wiki, copy its output into the AI Agent on the homepage and ask it to produce an improved version in a markdown code block with additional context. For example, you might add that "Critical accounts require same-day review by the compliance team." Copy the Agent's refined output and paste it back into the wiki editor.
This context is what enables the AI Agent to understand that when you ask about "high-risk accounts," you're referring to the composite_risk_level column, not just the kyc_risk_rating.
Step 7: Ask Questions with the AI Agent
Navigate to the Homepage and use the AI Agent:
"Which accounts have a Critical risk level?"
The Agent queries finservexp.gold.account_risk_profile and returns accounts where high KYC risk overlaps with frequent wire transfers.
"Show me a chart of total debits vs. credits by branch."
The Agent generates a grouped bar chart from the account activity data. This immediately highlights branches with unusual debit-to-credit ratios.
"List all transaction anomalies flagged as high_value_wire, ordered by amount."
The Agent queries finservexp.gold.transaction_anomalies, filters by the flag, and returns a sorted table. In a production environment, this query would help compliance analysts prioritize their reviews.
"Create a pie chart showing the distribution of accounts by composite risk level."
The Agent generates a visualization showing how many accounts fall into each risk category, giving compliance leadership a quick overview of the portfolio's risk profile.
Next Steps
Connect real banking data. Replace the simulated folders with federated connections to your actual Oracle, PostgreSQL, or S3 sources.
Add FGAC policies. Mask account balances and customer names for non-compliance users using Fine-Grained Access Control.
Use Iceberg time travel. Query historical snapshots for regulatory reporting and audit trail requirements.
Connect BI tools. Push the Gold views to Tableau or Power BI via Arrow Flight SQL for formal dashboards.
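As a starting point for the FGAC item above, column masking in Dremio is driven by a SQL UDF attached as a masking policy. A sketch assuming a `compliance` role (the role name and policy are illustrative, not part of this tutorial's setup):

```sql
-- Masking policy: users outside the compliance role see NULL balances.
CREATE FUNCTION protect_balance (balance DOUBLE)
RETURNS DOUBLE
RETURN SELECT CASE WHEN is_member('compliance') THEN balance ELSE NULL END;

ALTER VIEW finservexp.gold.account_risk_profile
MODIFY COLUMN balance
SET MASKING POLICY protect_balance (balance);
```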
The financial analytics pipeline you built here runs entirely on open standards. Apache Iceberg stores the data. Apache Arrow powers the engine. Apache Polaris underpins the catalog. Your data is portable, governed, and ready for production.