Dremio Blog

34 minute read · May 27, 2026

Agentic Lakehouse Architecture: The Four Technical Layers

Alex Merced Head of DevRel, Dremio


Why Agentic Lakehouse Architecture Decisions Matter

Layer 1: Object Storage, the Foundation of the Lakehouse

Layer 2: Apache Iceberg, the Table Format That Enables AI

Layer 3: Apache Polaris, the Agentic Lakehouse Control Plane

Layer 4: Dremio, the AI Data Platform Engine

Putting the Layers Together: An Agentic Query Walk-Through

Architecture Decision Guide

What Makes This Architecture Durable

Choosing the right concept is only half the job. Plenty of teams have adopted the lakehouse model, picked open formats, and still built systems that fail when AI agents start querying them at scale. The Agentic Lakehouse architecture solves a specific problem: how do you structure a data platform so that AI agents can discover, query, and reason over enterprise data reliably, securely, and fast enough to be useful?

This post is a technical reference for that structure. It documents four distinct layers, the decisions you make at each layer, and how those decisions propagate upward to affect what AI agents can and cannot do. If you are designing or auditing an Agentic Lakehouse deployment, this is the architecture map you need.

The four-layer Agentic Lakehouse architecture stack from object storage through Iceberg, Apache Polaris, and Dremio

Why Agentic Lakehouse Architecture Decisions Matter

Bad decisions at Layer 1 (wrong storage class, over-broad IAM) surface as slow agent queries and security incidents. Bad decisions at Layer 2 (wrong Iceberg version, aggressive snapshot expiry) cause ML reproducibility failures weeks after a model ships. Bad decisions at Layer 3 (over-privileged catalog roles, no credential vending) undermine zero-trust. Bad decisions at Layer 4 (no semantic layer, no Reflections) mean agents either cannot find the data they need or find it too slowly to be useful.

The layers are independent standards. You can swap one implementation for another. But they are not independent in effect. A weakness at any layer propagates upward and makes the entire agentic system less reliable. The architecture decisions you document here are the ones that determine whether your agentic system works at 2 AM without human intervention.

Layer 1: Object Storage, the Foundation of the Lakehouse

Every file in the Agentic Lakehouse lives in object storage. No compute, no catalog, no table format runs queries against anything else. Understanding what object storage holds, and how you organize it, is the prerequisite for everything above.

What Lives in Object Storage

Object storage holds two categories of files: data files and metadata files. Data files are Parquet (most common in Iceberg) or ORC files containing the actual row data in columnar format. Metadata files are the Iceberg metadata tree: table metadata JSON (the entry point for every table), manifest list files (Avro, listing all manifests for a snapshot), and manifest files (Avro, listing data files with column-level statistics). For row-level deletes, object storage also holds position delete files and equality delete files (Iceberg V2) or deletion vectors (Iceberg V3). Puffin files hold pre-computed statistics and index blobs such as distinct-count sketches, and in V3 they also carry the deletion vectors.

Every engine that touches the table reads from the same set of files. There is no duplication, no engine-specific copy of the data.
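As a rough illustration of these categories, the sketch below sorts object keys into roles by file name. The naming patterns (`*.metadata.json`, `snap-*.avro`, `*.puffin`) follow conventions commonly produced by Iceberg writers and are an assumption here: the format locates files through metadata pointers, not names, and delete files cannot be told apart from data files by extension alone. The function name and paths are hypothetical.

```python
from posixpath import basename


def classify_iceberg_file(key: str) -> str:
    """Guess the role of an object from its file name (convention, not spec)."""
    name = basename(key)
    if name.endswith(".metadata.json"):
        return "table metadata"
    if name.endswith(".avro"):
        # Manifest lists are conventionally named snap-<snapshot-id>-... .avro
        return "manifest list" if name.startswith("snap-") else "manifest"
    if name.endswith(".puffin"):
        return "puffin"
    if name.endswith((".parquet", ".orc")):
        return "data or delete file"
    return "unknown"
```

A real engine never classifies files this way; it follows the pointers from the table metadata file downward.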

Choosing Your Storage Backend

The three dominant choices are Amazon S3, Azure Data Lake Storage Gen2 (ADLS Gen2), and Google Cloud Storage (GCS). All three support the access patterns Iceberg requires: object listing, range reads, and conditional writes (for metadata commit safety). The choice is driven primarily by which cloud your compute runs in.

Attribute	S3	ADLS Gen2	GCS
Hierarchical namespace	No (emulated)	Yes (HNS)	No
Native IAM integration	AWS IAM	Azure AD / Entra	Google IAM
Storage tiers	Standard, Intelligent-Tiering, Glacier	Hot, Cool, Archive	Standard, Nearline, Coldline
Object versioning	Yes	Yes	Yes
Cross-region egress cost	Yes	Yes	Yes

The most important rule: run your query engine in the same region as your storage. Cross-region reads add latency and cost that compounds at AI agent query volumes.

Key Architectural Decisions at the Storage Layer

Storage class selection is the first decision that directly affects AI agent performance. Gold-layer tables that agents query for analysis must be in the hot tier (S3 Standard, ADLS Hot, GCS Standard). Moving them to colder tiers adds per-request retrieval charges and, for archive-class tiers, retrieval latency that shows up immediately in query response times. Use lifecycle policies to shift Silver and Bronze tables to tiered storage after defined inactivity windows.

Namespace and bucket layout determines how well your storage structure aligns with your catalog and security model. Organize by business domain (/finance/, /operations/, /product/), not by tool (/spark/, /trino/). Tools change; business domains don't. This layout also aligns directly with Apache Polaris namespace hierarchy, which makes RBAC policies easier to reason about.

IAM and RBAC at the storage layer should be minimal and scoped. The query engine should not hold blanket read access to all buckets. Instead, use catalog credential vending (covered in Layer 3) to issue short-lived, table-scoped credentials at query time. The storage-layer IAM role for the catalog service is the only entity that needs broad access.

Snapshot retention is a decision that affects AI workloads directly. Iceberg time travel depends on old metadata files and data files remaining available. If you expire snapshots aggressively (e.g., keeping only 1 day of history), you break the ML reproducibility guarantee that time travel provides. A practical policy for Gold-layer tables is to retain 30 days of snapshots and use tag-based pinning for snapshots tied to specific model training runs.
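The retention policy described here can be sketched as a small selection function. This is a minimal illustration, not Iceberg's expiry procedure: the `Snapshot` class, the `snapshots_to_expire` name, and the tag value are hypothetical, and the 30-day default simply mirrors the policy suggested above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime
    tags: tuple = ()  # e.g. a tag pinning the snapshot to a training run


def snapshots_to_expire(snapshots, now, retain_days=30):
    """Return IDs of snapshots older than the window that are safe to drop."""
    if not snapshots:
        return []
    cutoff = now - timedelta(days=retain_days)
    current = max(snapshots, key=lambda s: s.committed_at)
    return [
        s.snapshot_id
        for s in snapshots
        # Never expire the current snapshot or anything pinned by a tag.
        if s.committed_at < cutoff and not s.tags and s is not current
    ]
```

With a 90-day-old tagged snapshot, a 45-day-old untagged one, and two recent ones, only the 45-day-old snapshot is selected for expiry.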

Encryption for regulated industries should use customer-managed keys: SSE-KMS (AWS), customer-managed keys in Azure Key Vault (Azure), or Customer-Managed Encryption Keys (GCS CMEK). Default SSE-S3 is acceptable for non-regulated workloads.

Layer 2: Apache Iceberg, the Table Format That Enables AI

Apache Iceberg is not a database. It is not a query engine. It does not run a service. It is a table format specification that defines how a set of files in object storage represents a table with ACID semantics. For a thorough look at how the format works internally, the Apache Iceberg architectural guide is the definitive reference.

Iceberg's Metadata Tree

Every Iceberg table has a metadata tree. At the root is the table metadata file (JSON): it contains the current snapshot pointer, the full schema, the partition specification, the sort order, and table properties. From the current snapshot, a pointer leads to the manifest list (Avro): an index of all manifest files for that snapshot, including partition-level statistics for each. Each manifest file (Avro) lists actual data files with column-level statistics (min, max, null count, row count, file size). The data files are the Parquet or ORC files holding actual rows.

This tree structure is what makes query optimization possible without full table scans. When an engine receives a query with a filter, it reads the manifest list, uses partition statistics to skip irrelevant manifests, then uses column statistics in manifests to skip data files where the filter cannot match. An agent querying a table with 10 billion rows may read only a handful of data files.
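The file-skipping step can be illustrated with a few lines of Python. This is a simplified sketch of min/max pruning for a single date column, assuming the per-file bounds have already been read from a manifest; the `DataFileStats` class and `prune_files` function are hypothetical names, not part of any Iceberg library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFileStats:
    path: str
    min_date: date  # lower bound of the column recorded in the manifest
    max_date: date  # upper bound of the column recorded in the manifest


def prune_files(files, filter_start, filter_end):
    """Keep only files whose [min, max] range can overlap the filter range."""
    return [
        f.path
        for f in files
        if f.max_date >= filter_start and f.min_date <= filter_end
    ]
```

A file is skipped only when its bounds prove that no row can match, so pruning never changes the query result, only the amount of data read.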

Iceberg Features That AI Workloads Depend On

Time travel lets you query any retained snapshot of a table by timestamp or snapshot ID, for example with FOR TIMESTAMP AS OF <timestamp> or FOR VERSION AS OF <snapshot_id> (the exact syntax varies by engine). For ML workloads, this means you can reproduce the exact dataset used to train a model months later, without maintaining a separate copy. This is not a nice-to-have; it is the mechanism that makes auditable AI systems possible.

Schema evolution allows you to add, rename, reorder, or remove columns from a table without rewriting any data files. When an upstream producer adds a column to a feed, downstream agent SQL does not break. Existing queries ignore columns they don't reference; new queries pick them up automatically.

Partition evolution allows you to change the partitioning strategy of a table without rewriting historical data. When AI query patterns shift (from daily aggregations to hourly), you update the partition spec going forward. Old data remains correctly partitioned for the old spec; new data writes under the new spec. The engine handles both transparently.

Hidden partitioning derives partition values from column transformations rather than raw column values. An agent writing WHERE event_date = '2025-03-15' does not need to know the physical partition layout. The engine derives the correct partition filter automatically. For more on how this eliminates full scans, see the hidden partitioning post.
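To make the derivation concrete, the sketch below mimics a day transform: the partition value is computed from a timestamp as whole days since the Unix epoch, and a date literal in a filter maps to the same value. It assumes, for illustration, that the table is partitioned by a day transform on a UTC timestamp column; the function names are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Day-style partition transform: whole days since 1970-01-01 UTC."""
    return (ts - EPOCH).days


def partition_for_date_filter(date_literal: str) -> int:
    """Map a 'YYYY-MM-DD' filter literal to the partition value it selects."""
    day = datetime.strptime(date_literal, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return day_transform(day)
```

Every event timestamp within 2025-03-15 UTC and the filter literal '2025-03-15' yield the same partition value, which is why the query author never has to reference a partition column.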

Iceberg V3 Features Relevant to AI

Variant type is the most AI-relevant addition in Iceberg V3. It is a flexible semi-structured column type similar to JSON but stored in a binary-shredded format for efficient columnar reads. If you are storing LLM inference outputs, model predictions, raw API payloads, or agent action logs in your lakehouse, the Variant type gives you a native column type for that data without the overhead of full JSON parsing on every read.

Nanosecond timestamps extend timestamp precision from microseconds to nanoseconds. High-frequency AI event streams (inference latency logs, model monitoring signals, real-time feature updates) can generate events faster than microsecond resolution can distinguish. Nanosecond timestamps ensure event ordering is correct in these workloads.

Deletion vectors replace position delete files with a compact bitmap, one per data file, that marks which row positions in that file are logically deleted. For ML feature stores that apply frequent row-level updates to feature values, deletion vectors are dramatically more efficient than accumulating separate delete files that must be merged at read time.
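The idea behind a deletion vector can be shown with a toy bitmap. This sketch uses a plain Python integer as the bitmap purely for illustration; production implementations use compressed bitmap encodings, and the `DeletionVector` class and `live_rows` helper are hypothetical names.

```python
class DeletionVector:
    """Toy bitmap of deleted row positions for one data file."""

    def __init__(self):
        self._bits = 0

    def delete(self, position: int) -> None:
        # Set the bit for this row position; the data file is not rewritten.
        self._bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return bool((self._bits >> position) & 1)


def live_rows(rows, dv: DeletionVector):
    """Apply the vector at read time: skip positions marked as deleted."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]
```

Updating a feature value then amounts to marking the old row position as deleted and appending the new row elsewhere, without touching the original file.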

Layer 3: Apache Polaris, the Agentic Lakehouse Control Plane

The catalog layer is where AI agents begin every interaction with the data platform. Before an agent writes a single SQL statement, it needs to know what tables exist, what namespaces they live in, and what it is authorized to access. That is the catalog's job.

What a Catalog Actually Does

A catalog is a registry: it maps table names to the location of their current metadata file in object storage. It enforces the ACID commit protocol for metadata updates, ensuring that concurrent writers cannot corrupt the metadata tree. And it answers three questions: what tables exist, where their metadata is, and who is allowed to access them.

The modern standard for this is the Iceberg REST Catalog specification. Any engine that implements the REST Catalog client can work with any server that implements the REST Catalog API, without engine-specific integration code. This is what breaks the historical lock-in between catalogs and execution engines.

Apache Polaris Architecture

Apache Polaris is an open-source implementation of the Iceberg REST Catalog specification, developed at the Apache Software Foundation. It models access control through a three-tier hierarchy.

A Principal is a user or service account identity. A Principal Role groups principals for access assignment. A Catalog Role is a set of privileges on catalog objects (catalogs, namespaces, tables). Granting a catalog role to a principal role gives every principal in that principal role the corresponding privileges.

Catalog privileges include TABLE_READ_DATA (read data files for a table), TABLE_WRITE_DATA (write data files), NAMESPACE_LIST (enumerate namespaces), TABLE_LIST (enumerate tables in a namespace), and CATALOG_MANAGE_ACCESS (manage the catalog's access control policies). For AI agents, the minimum viable privilege set is NAMESPACE_LIST and TABLE_LIST (to discover tables) plus TABLE_READ_DATA (to execute queries) on specific namespaces.
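The three-tier model can be sketched as plain data structures. This is an in-memory illustration of how a privilege check walks from principal to principal role to catalog role, not the Polaris API; the class names mirror the concepts above, privilege names are treated as opaque strings, and the role and principal names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRole:
    name: str
    grants: set = field(default_factory=set)  # (privilege, namespace) pairs


@dataclass
class PrincipalRole:
    name: str
    catalog_roles: list = field(default_factory=list)


@dataclass
class Principal:
    name: str
    principal_roles: list = field(default_factory=list)


def is_allowed(principal: Principal, privilege: str, namespace: str) -> bool:
    """True if any catalog role reachable from the principal holds the grant."""
    return any(
        (privilege, namespace) in catalog_role.grants
        for principal_role in principal.principal_roles
        for catalog_role in principal_role.catalog_roles
    )
```

An agent principal whose only catalog role grants NAMESPACE_LIST and TABLE_READ_DATA on finance can read finance tables but is denied writes there and any access to other namespaces.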

For a deeper overview of how Polaris is structured and what it enables, the Apache Polaris overview covers the architecture in detail.

Credential Vending

Credential vending is the mechanism that makes zero-trust storage access possible. When a query engine needs to read a table, it requests credentials from the catalog rather than using long-lived static credentials. The catalog checks the requesting principal's privileges, then issues short-lived, scoped storage credentials (STS tokens for S3, SAS tokens for ADLS, short-lived GCS tokens) that grant access only to the specific data files for that table.

The engine takes those credentials and reads directly from object storage. The catalog never touches the data plane. Each query gets its own scoped credential set with a short TTL. If a credential leaks, it expires within minutes and is scoped to exactly the data the query needed.

This means no engine needs broad storage access. No service account needs a wildcard s3:GetObject on *. Credential vending is the enforcement mechanism that makes RBAC at the catalog layer meaningful for storage access.
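The vending flow can be simulated in a few lines. This is a conceptual sketch only: real catalogs delegate to the cloud provider's token service, whereas here a credential is just a random token with a path prefix and an expiry. The function and class names, the 15-minute default TTL, and the table locations are all assumptions for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    token: str
    prefix: str        # storage prefix this credential may read
    expires_at: float  # epoch seconds

    def allows(self, object_key: str, now: float) -> bool:
        return now < self.expires_at and object_key.startswith(self.prefix)


def vend_credential(grants, locations, principal, table, now, ttl_seconds=900):
    """Check catalog privileges, then issue a short-lived, table-scoped credential."""
    if table not in grants.get(principal, set()):
        raise PermissionError(f"{principal} may not read {table}")
    return ScopedCredential(
        token=secrets.token_hex(16),
        prefix=locations[table],
        expires_at=now + ttl_seconds,
    )
```

A credential vended for one table is rejected for objects under another table's prefix, and for any object once the TTL has passed.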

Dremio Open Catalog: Polaris Plus Federation and FGAC

Dremio Open Catalog is Apache Polaris extended with two additional capabilities: federated source management and enhanced fine-grained access control (FGAC).

Federated source management allows Open Catalog to register not just Iceberg tables but also external sources (relational databases, cloud warehouses, SaaS APIs, and file-based sources) under a unified namespace hierarchy. An AI agent can enumerate both Iceberg Gold-layer tables and a PostgreSQL operational database from a single catalog API call.

The enhanced FGAC layer adds row-level security policies and column masking rules that go beyond standard Polaris table-level privileges. A row-level policy on the emea_revenue table might filter to only rows where region = 'EMEA' for principals with the EMEA analyst role. A column masking rule on a customer_email column might return NULL for all principals without the PII access privilege. These policies apply transparently at query execution time without requiring the agent or query author to know they exist.
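The effect of these two policy types can be sketched over in-memory rows. This illustrates the semantics only, not how Open Catalog defines or enforces policies: the `apply_fgac` function, the policy dictionaries, and the role names are hypothetical, and the sketch assumes every row filter attached to the caller's roles must pass.

```python
def apply_fgac(rows, roles, row_policy, mask_policy):
    """Filter rows and mask columns according to the caller's roles.

    row_policy:  role name -> predicate that a visible row must satisfy
    mask_policy: column name -> role required to see the unmasked value
    """
    predicates = [row_policy[r] for r in roles if r in row_policy]
    visible = []
    for row in rows:
        if not all(pred(row) for pred in predicates):
            continue  # row-level security: drop rows outside the caller's scope
        visible.append({
            col: (val if mask_policy.get(col) is None or mask_policy[col] in roles else None)
            for col, val in row.items()
        })
    return visible
```

The query author sees only a narrower, partially masked result; nothing in the SQL text has to mention the policies.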

Why the Catalog Is Where Agents Start

GET /v1/catalogs/prod_catalog/namespaces HTTP/1.1
Authorization: Bearer <token>

# Response:
{
  "namespaces": [["finance"], ["operations"], ["product"], ["ml_features"]]
}

GET /v1/catalogs/prod_catalog/namespaces/finance/tables HTTP/1.1
Authorization: Bearer <token>

# Response:
{
  "identifiers": [
    {"namespace": ["finance"], "name": "emea_daily_revenue"},
    {"namespace": ["finance"], "name": "fx_rates"},
    {"namespace": ["finance"], "name": "cost_centers"}
  ]
}

An AI agent calls these REST endpoints before generating any SQL. The catalog returns only the namespaces and tables the agent's principal role is authorized to see. This means the agent's discovery space is automatically scoped to what it is allowed to query. Table discovery through the catalog is not a convenience; it is a security control.
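On the agent side, turning a table-listing response into names it can reference in SQL is a small parsing step. The sketch below assumes the response shape shown above and deliberately omits the HTTP call itself; `qualified_table_names` is a hypothetical helper.

```python
import json


def qualified_table_names(list_tables_response: str) -> list:
    """Convert a table-listing response body into dotted table names."""
    payload = json.loads(list_tables_response)
    return [
        ".".join(item["namespace"] + [item["name"]])
        for item in payload["identifiers"]
    ]
```

Because the catalog has already filtered the listing by role, the agent can treat this list as its complete, authorized discovery space.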

Layer 4: Dremio, the AI Data Platform Engine

Dremio is the execution layer where queries run, results are computed, and AI agents receive data. It is an MPP (massively parallel processing) distributed query engine built around Apache Arrow as its native in-memory and wire format.

Arrow-Native Execution

The Apache Arrow columnar format is a language-agnostic, cache-friendly in-memory representation for columnar data. Dremio uses Arrow throughout: query execution produces Arrow batches in memory, and those same batches are transferred to clients over Arrow Flight in Arrow's own wire format. There is no intermediate format conversion between what the executor computes and what the Python client receives.

This matters for AI workloads because Python ML pipelines consume Arrow tables natively via pyarrow. An agent that queries Dremio over Arrow Flight receives a pyarrow.Table that is ready for pandas, Polars, or NumPy operations without a row-by-row conversion step. ADBC (Arrow Database Connectivity) provides the same benefit as a standard connectivity interface, replacing JDBC and ODBC for Arrow-native clients.

Query Federation: One Query, All Sources

Dremio can query Iceberg tables in your lakehouse, live rows in a PostgreSQL operational database, data in a Snowflake warehouse, files in Azure Blob Storage, and REST APIs, all in a single SQL statement. The query optimizer handles source-specific translation, pushdown, and join ordering automatically.

For AI agents, this is significant. An agent analyzing "EMEA revenue vs. sales pipeline" can join the Gold-layer emea_daily_revenue Iceberg table with live pipeline data from a CRM database in a single query. No ETL pipeline is required to consolidate the data first. The agent writes one SQL statement; the engine handles the rest.

Autonomous Performance: Five-Component System

Dremio's performance layer for AI workloads consists of five interacting components. Agents query at unpredictable times with unpredictable query shapes. Autonomous Performance is the system that handles that unpredictability without manual tuning.

Reflections are pre-computed Iceberg tables stored in Dremio's managed storage. They come in two types: Raw Reflections (a subset or superset of columns from a source table, at the same row granularity) and Aggregation Reflections (pre-grouped, pre-aggregated results). When a query matches a Reflection's definition, Dremio's optimizer silently substitutes the Reflection for the original table. The agent writes SQL against the logical dataset and the engine serves results from the optimized physical target. Reflections are themselves Iceberg tables. They are open format, not a proprietary cache format.
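A heavily simplified view of the substitution decision for Aggregation Reflections is sketched below. It is not Dremio's matching algorithm, which works on query plans and costs; it only checks that a Reflection covers the query's grouping columns and measures, assuming additive measures such as SUM that can be rolled up further. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggReflection:
    name: str
    dimensions: frozenset  # columns the Reflection is grouped by
    measures: frozenset    # columns it pre-aggregates


def pick_reflection(query_dims, query_measures, reflections):
    """Return the narrowest Reflection that can answer the query, or None."""
    candidates = [
        r for r in reflections
        if set(query_dims) <= r.dimensions and set(query_measures) <= r.measures
    ]
    # Fewer dimensions means a smaller pre-aggregated table to scan.
    return min(candidates, key=lambda r: len(r.dimensions), default=None)
```

When no Reflection covers the query, the function returns None, which corresponds to the engine falling back to the base table.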

Autonomous Reflections remove the requirement for a human to decide which Reflections to create. The system observes the query workload over a rolling 7-day window, uses ML to identify which Reflections would provide the highest performance benefit given the actual query patterns, and automatically creates, updates, and drops Reflections based on those observations. An Agentic Lakehouse at production scale may have hundreds of Reflections under active management, none of which were manually defined.

C3 (Columnar Cloud Cache) is an NVMe SSD disk cache that runs on executor nodes. When Dremio reads Parquet columns from cloud storage, it caches those column pages on local NVMe. On subsequent reads, those pages are served from local disk rather than over the network from object storage. C3 is column-selective: it only caches the specific columns a query reads, not entire Parquet files. For AI agent workloads where the same Gold-layer dimension tables are joined repeatedly, C3 eliminates the repeated cloud storage read latency.

Results Cache serves identical queries from a cached result set without re-executing the query at all. When an AI agent (or a dashboard backed by an agent) re-asks the same question within the cache TTL window, Dremio returns the cached result immediately. For AI monitoring loops that periodically re-run the same analysis SQL, Results Cache removes nearly all of the execution latency.
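The behavior can be sketched as a TTL-bounded lookup keyed by the exact SQL text. This is an illustration of the concept, not Dremio's implementation: the `ResultsCache` class and its 15-minute default TTL are assumptions, and a real cache must also account for changes in the underlying data and for the caller's permissions.

```python
class ResultsCache:
    """Toy result cache: exact SQL text -> (stored_at, result)."""

    def __init__(self, ttl_seconds: float = 900.0):
        self.ttl_seconds = ttl_seconds
        self._entries = {}

    def put(self, sql: str, result, now: float) -> None:
        self._entries[sql] = (now, result)

    def get(self, sql: str, now: float):
        entry = self._entries.get(sql)
        if entry is None:
            return None
        stored_at, result = entry
        if now - stored_at > self.ttl_seconds:
            del self._entries[sql]  # expired: force re-execution
            return None
        return result
```

A monitoring loop that re-runs the same statement every few minutes hits the cache on each run until the entry expires.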

Query Plan Cache stores compiled query plans (the optimizer's output) and reuses them for structurally identical queries. When agents generate SQL from templates (e.g., parameterized by region or date), the plan for the template structure is compiled once and reused for all parameter variants, saving planning overhead on each execution.
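One way to picture "structurally identical" is a cache key with literals stripped out. The sketch below normalizes whitespace and replaces string and numeric literals with placeholders; it is a simplification for illustration, since a real engine compares parsed plans, not regex-normalized text. `plan_cache_key` is a hypothetical name.

```python
import re

# Matches single-quoted SQL strings (with '' escapes) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def plan_cache_key(sql: str) -> str:
    """Collapse whitespace and replace literals so template variants share a key."""
    collapsed = " ".join(sql.split())
    return _LITERAL.sub("?", collapsed)
```

Two queries generated from the same template with different region and date parameters produce the same key, so the second one can reuse the first one's plan.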

AI Semantic Layer: How Agents Understand Data

A query engine that can execute fast SQL is necessary but not sufficient for agentic workflows. Agents also need to understand what the data means before they can write correct SQL. The AI Semantic Layer provides that understanding. For a broader explanation of what semantic layers do and why they matter, the semantic layer guide is the reference.

Virtual Datasets (VDS) are SQL-defined logical views registered in Dremio's catalog. They implement a three-layer medallion pattern. The Prep layer handles raw joins, type casting, and basic cleaning, shielding agents from the complexity of raw source schemas. The Business layer applies business rules, defines KPIs, renames columns to business terminology, and encodes metric logic. The Application layer presents curated, consumer-specific views: the tables agents and BI tools actually query. Reflections can be defined on VDS at any layer, so agents benefit from pre-computation even when querying a Business or Application-layer view.

Wikis attach natural language documentation to any dataset or column. An AI agent reading the wiki for emea_daily_revenue before generating SQL learns: "This table contains daily gross revenue for the EMEA region, excluding inter-company transfers. Revenue is recognized at contract signature. Updated daily at 06:00 UTC from the ERP system." That context changes what SQL the agent generates.

Labels are categorical tags attached to datasets: pii, gold, finance, revenue, emea, ml-ready. An AI agent can search for tables using semantic labels rather than exact table names. A search for tables tagged revenue and emea returns all relevant datasets regardless of what they are named internally.
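Label-based lookup reduces to a set-containment check. The sketch below assumes the labels for each dataset have already been fetched into a dictionary; the `find_by_labels` function and the dataset names are hypothetical.

```python
def find_by_labels(dataset_labels, required):
    """Return datasets that carry every required label, in sorted order."""
    wanted = set(required)
    return sorted(
        name for name, labels in dataset_labels.items()
        if wanted <= set(labels)
    )
```

Searching for revenue and emea returns every dataset tagged with both, whatever the tables happen to be called.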

AI-generated metadata closes the documentation gap that exists in every large lakehouse. Dremio samples data from tables and automatically generates wiki descriptions using an LLM. The AI identifies column types, value distributions, apparent business meaning, and common use patterns, then writes a wiki draft. Human data stewards review and refine the draft. This makes it feasible to document hundreds of datasets rather than leaving them undocumented.

Semantic search combines labels and wiki text to enable concept-based discovery. An agent searching for "tables about customer churn" finds relevant datasets even if no table is named churn. This is the capability that makes catalog discovery genuinely useful for AI agents operating in large, unfamiliar data environments.

Agentic Interfaces: How External Agents Connect

The built-in AI Agent in Dremio Cloud provides a conversational UI with five operational modes. Discover enumerates tables, namespaces, and metadata. Explore profiles a dataset (distributions, cardinality, sample rows). Analyze generates SQL from a natural language question and executes it. Visualize converts a query result into a chart automatically. Explain interprets a query result and surfaces the key finding in plain language. This is a full analytical pipeline from question to insight, not a chatbot that stops at SQL generation.

The MCP Server is Dremio's open-source, OAuth-authenticated implementation of the Model Context Protocol. It exposes Dremio SQL capabilities as tools that any MCP-compatible AI framework (LangChain, LlamaIndex, Claude Projects, AutoGen) can discover and call. An external agent authenticates via OAuth, discovers the available SQL tools, and calls them to execute queries against Dremio. For a practical introduction to MCP and how it works, the MCP beginner guide walks through the setup and concepts.

Python developers building custom agentic pipelines can use dremioframe (which provides the DremioAgent class for high-level agentic workflows), dremio-simple-query (for straightforward query execution returning DataFrames or Arrow tables), or dremio-cli for scripting and automation.

AI SQL Functions

Dremio's AI SQL Functions embed LLM-powered operations directly in SQL. Functions for sentiment analysis, entity extraction, text classification, and semantic similarity can be called inline in a query. An agent generating SQL to classify customer support tickets by urgency can embed a classification function directly in the SELECT clause rather than building a separate Python pipeline to post-process results. The classification happens inside the distributed query engine, close to the data.

Putting the Layers Together: An Agentic Query Walk-Through

All four layers work together for every query an AI agent runs. The following scenario traces a single question through the complete stack.

Scenario: "Why did EMEA revenue drop last week?"

End-to-end agentic query flow: 8-step numbered diagram from MCP to result

Step 1: Question submission via MCP. The AI agent (an external LangChain agent, or a user in the built-in AI interface) submits the natural language question. The MCP Server receives the request.

Step 2: OAuth authentication and role assignment. Dremio validates the agent's OAuth token, identifies the corresponding principal, and assigns the associated principal role. This role determines which catalog namespaces and tables the agent can see.

Step 3: Catalog discovery. The agent queries the Open Catalog REST API for tables in the finance namespace tagged with revenue and emea. The catalog returns emea_daily_revenue, emea_fx_rates, and emea_bookings_summary: the three tables visible to this principal's role.

Step 4: FGAC enforcement. The query optimizer checks row-level security policies. The agent's role has an EMEA-scoped policy on emea_daily_revenue: even if the agent writes a query without a region filter, the engine appends the filter WHERE region = 'EMEA' automatically. Column masking applies to any PII columns, returning NULL or a masked value for principals without PII privilege.

Step 5: Semantic layer context. The agent reads the wiki for emea_daily_revenue. It learns that revenue is recognized at contract signature (not cash receipt), that the table is partitioned by revenue_date, and that week-over-week comparison requires comparing ISO weeks. The agent incorporates this context into the SQL it generates.

Step 6: Optimizer checks Reflection and cache. The generated SQL queries weekly aggregated EMEA revenue. The optimizer finds an Aggregation Reflection that pre-computes weekly revenue totals by region. It substitutes the Reflection for the base table scan. It also checks the Results Cache; if an identical query ran in the last 15 minutes, the cached result is returned immediately without any storage read.

Step 7: C3-accelerated Iceberg scan. Assuming a cache miss, executor nodes read from the Aggregation Reflection's Iceberg files. Manifest pruning narrows the scan to the last two weeks of partitions. The column pages for revenue_date, region, and gross_revenue are already in the C3 NVMe cache from prior queries, so the read bypasses cloud storage entirely.

Step 8: Result returned as Arrow. The executor returns the aggregated result as an Arrow batch over Arrow Flight. The agent receives the data, compares last week to the prior week, identifies a 12% revenue gap in the DACH sub-region, and generates its narrative response.

Total elapsed time from question to insight: under 2 seconds for a query that touches 18 months of EMEA revenue history.

Architecture Decision Guide

The following tables capture the key decisions at each layer and the recommended default for AI agent workloads. These are starting points, not rules; your specific compliance requirements, team expertise, and cloud choice will adjust some of them.

Layer 1: Object Storage

Decision	Options	AI-Workload Recommendation
Storage class for Gold tables	Hot / Intelligent-Tiering / Cold	Hot (Standard)
Namespace layout	Domain-driven / Tool-driven	Domain-driven always
Snapshot retention	1 day / 7 days / 30 days	30 days min; pin ML training snapshots
Encryption	SSE-S3 / SSE-KMS / CMEK	SSE-KMS for regulated industries
Versioning	Enabled / Disabled	Enable on Gold buckets

Layer 2: Apache Iceberg

Decision	Options	AI-Workload Recommendation
Iceberg spec version	V2 / V3	V3 for semi-structured AI data; V2 for mature stable stacks
Delete strategy	Equality deletes / Deletion vectors	Deletion vectors (V3) for ML feature stores
Branching	WAP branches / None	WAP for quality-gating before publish
Partition strategy	Explicit / Hidden	Hidden partitioning always

Layer 3: Catalog

Decision	Options	AI-Workload Recommendation
Catalog implementation	Self-managed Polaris / Dremio Open Catalog	Open Catalog for FGAC + federation + managed service
Credential vending	Enabled / Disabled	Enabled always for zero-trust
Environment isolation	Separate catalogs / Namespace prefixes	Separate catalogs per environment (dev, staging, prod)
AI agent privilege	Full read / Namespace-scoped	Namespace-scoped, table-level read only

Layer 4: Query Engine

Decision	Options	AI-Workload Recommendation
Reflection management	Manual / Autonomous	Autonomous for unknown/changing agent query patterns
Semantic documentation	Manual wikis / AI-generated + review	AI-generated with human review
Agent interface	Built-in AI / MCP external	Both: built-in for analysis, MCP for programmatic agents
Query connectivity	JDBC/ODBC / Arrow Flight / ADBC	Arrow Flight + ADBC for Python ML pipelines

What Makes This Architecture Durable

The four layers share one property that makes this architecture worth building: they are all open standards. Apache Parquet, Apache Iceberg, Apache Polaris (Iceberg REST Catalog), and Apache Arrow are not vendor formats. No single vendor controls the spec for any of them. You can swap query engines, swap catalog implementations, or migrate between cloud storage providers without changing the data format or the table format.

This composability is what makes the Agentic Lakehouse architecture viable long-term. As Iceberg V3 adoption grows and the Iceberg REST Catalog specification becomes the universal standard for catalog interoperability, adding a new engine or a new AI framework to your stack becomes a configuration change, not a migration project.

The architectural decisions documented here are not permanent. Query patterns evolve, data volumes grow, and AI agent capabilities change. What stays constant is the layered structure: storage, format, catalog, engine. Each layer has a clear responsibility, a clean interface to the layers above and below it, and an open-standard contract that keeps your options open.

If you want to explore this architecture hands-on, try Dremio Cloud free for 30 days. It includes Open Catalog (Apache Polaris), the built-in AI Agent, the MCP Server, Autonomous Reflections, and the full AI Semantic Layer. All four layers of the Agentic Lakehouse are configured and ready for AI agent workloads out of the box.
