The conventional wisdom for data platform modernization goes like this: pick a target system, build ETL pipelines for every source, migrate everything, validate the data, retrain your users, and then start getting value. That process takes six to eighteen months. During that time, analysts are waiting and leadership is asking why the investment has not produced results yet.
There is a better sequence. Instead of making everyone wait for a full migration, you start producing value on day one and migrate to Apache Iceberg at your own pace. The key is treating federation, the semantic layer, AI access, and Iceberg migration as four independent phases, each delivering value on its own, rather than a single all-or-nothing project.
Phase 1: Connect Your Data Where It Lives
Sign up for Dremio Cloud and you get a lakehouse project with a pre-configured Open Catalog right away. From there, start connecting your existing data sources through Dremio's federated query engine: PostgreSQL, MySQL, MongoDB, S3, Snowflake, BigQuery, Redshift, AWS Glue, Unity Catalog, and more.
No data copying. No ETL pipelines. Dremio queries your data where it already lives, using predicate pushdown to delegate filtering work to each source system.
The result: by the end of day one, your team has unified SQL access across every connected source. An analyst can join a PostgreSQL customer table with an S3-based event stream in a single query, without waiting for a data engineer to build a pipeline first.
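A cross-source join like the one described might look like this. This is a hedged sketch: the source names (`postgres_prod`, `s3_lake`), schemas, and columns are all hypothetical, not part of any real environment.

```sql
-- Join a federated PostgreSQL customer table with S3-backed event data
-- in a single query; no pipeline required. All identifiers are examples.
SELECT c.customer_id,
       c.customer_name,
       COUNT(*) AS event_count
FROM postgres_prod.public.customers AS c
JOIN s3_lake.events.clickstream AS e
  ON e.customer_id = c.customer_id
WHERE e.event_date >= DATE '2024-01-01'  -- filter pushed down to each source
GROUP BY c.customer_id, c.customer_name;
```

The `WHERE` predicate is pushed down so that PostgreSQL and the S3 source each filter their own rows before Dremio performs the join.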
Phase 2: Build a Semantic Layer Over Everything
Raw source tables have cryptic column names, inconsistent types, and zero business context. Before anyone can get reliable answers, whether human or AI, you need a curated layer on top.
Bronze/Raw views map to raw sources. They standardize column names, cast data types, and apply basic filters. One Bronze view per source table.
Silver/Business views apply business logic. This is where you define what "active customer" means (purchased in the last 90 days, not on a trial), join data across sources, and compute metrics.
Gold/Application views serve specific consumers: a dashboard, a report, or an AI agent. Each Gold view is optimized for its use case.
Dremio's AI Agent can help you come up with the SQL to generate these views efficiently.
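The three layers can be sketched as stacked view definitions. This is an illustrative example, not production code: the folder layout, source path, column names, and the 90-day rule follow the "active customer" definition in the text, and every identifier is hypothetical.

```sql
-- Bronze: standardize names and types over the raw source table
CREATE OR REPLACE VIEW bronze.customers AS
SELECT CAST(cust_id AS BIGINT)  AS customer_id,
       TRIM(cust_nm)            AS customer_name,
       CAST(last_purch AS DATE) AS last_purchase_date,
       is_trial
FROM postgres_prod.public.customers;

-- Silver: apply business logic ("active customer" = purchased in the
-- last 90 days, not on a trial)
CREATE OR REPLACE VIEW silver.active_customers AS
SELECT customer_id, customer_name, last_purchase_date
FROM bronze.customers
WHERE last_purchase_date >= CURRENT_DATE - INTERVAL '90' DAY
  AND is_trial = FALSE;

-- Gold: shaped for one specific consumer, e.g. a retention dashboard
CREATE OR REPLACE VIEW gold.retention_dashboard AS
SELECT customer_id, last_purchase_date
FROM silver.active_customers;
```

Notice that Silver and Gold reference views, not physical tables; this indirection is what later makes the Iceberg migration invisible to consumers.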
Govern Access and Document Everything
Grant users access to specific views using Role-Based Access Control (RBAC) at the folder, dataset, and column level. For sensitive data, add Fine-Grained Access Control (FGAC) via UDFs for row-level security and column-level masking.
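A minimal sketch of what the RBAC grant and a UDF-based row access policy might look like. The role, view, and column names are hypothetical, and exact DDL syntax varies by Dremio version, so treat this as a shape rather than a recipe.

```sql
-- RBAC: grant a role read access to one specific Gold view
GRANT SELECT ON VIEW gold.retention_dashboard TO ROLE analysts;

-- FGAC: a boolean UDF that decides row visibility per user,
-- attached to a view as a row access policy
CREATE FUNCTION security.region_filter(region VARCHAR)
RETURNS BOOLEAN
RETURN SELECT is_member('emea_analysts') AND region = 'EMEA';

ALTER VIEW silver.sales_by_region
  ADD ROW ACCESS POLICY security.region_filter(region);
```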
Then enrich every dataset with Wikis (human-readable documentation explaining what each column means) and Tags (categorical labels for discoverability). Dremio can auto-generate Wiki descriptions and suggest Tags by sampling your table data and schema. You review and refine the output instead of writing everything from scratch.
This metadata is not just for humans. It is what the AI Agent reads when generating SQL. Better documentation means more accurate answers.
Phase 3: Turn On Agentic Analytics
With a governed semantic layer in place, you are ready for AI. This is the important part: you do not need to complete the Iceberg migration first. Agentic analytics works on federated data from the moment the semantic layer exists.
Dremio's built-in AI Agent lets users type plain-English questions in the console. The agent writes SQL, executes it against your governed views, returns results, generates charts, and suggests follow-up questions. It respects every RBAC and FGAC policy in your catalog. Users can only get answers about data they are authorized to see.
For teams that want to use external tools, Dremio's MCP (Model Context Protocol) server lets ChatGPT, Claude Desktop, or custom agents connect directly to your Dremio environment. External tools get the same semantic context and security controls as the built-in agent.
| Interface | What It Provides |
| --- | --- |
| Built-in AI Agent | Natural language queries, SQL generation, charts, follow-up suggestions inside Dremio |
| MCP Server | Connect any MCP-compatible AI tool (ChatGPT, Claude, custom agents) with full governance |
| AI SQL Functions | Run AI_GENERATE, AI_CLASSIFY, AI_COMPLETE directly in SQL for unstructured data analysis |
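As a rough illustration of the AI SQL functions, a classification query might look like the following. The table and column are hypothetical, and the exact function signatures may differ by Dremio version, so check the current SQL reference before relying on this shape.

```sql
-- Categorize free-text support tickets directly in SQL.
-- Identifiers and the argument shape are assumptions for illustration.
SELECT ticket_id,
       AI_CLASSIFY(ticket_text,
                   ARRAY['billing', 'bug', 'feature request']) AS category
FROM gold.support_tickets;
```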
At this point your organization has unified data access, a governed semantic layer, and AI-powered analytics, and you have not migrated a single table to Iceberg yet.
Phase 4: Migrate to Iceberg, One Dataset at a Time
Federation gets you access, but a full Apache Iceberg lakehouse gets you more: Autonomous Reflections that optimize query performance based on actual usage patterns, end-to-end caching, automated table maintenance (compaction, clustering, vacuuming), and interoperability with every Iceberg-compatible engine (Spark, Flink, Trino). Your data stays in your storage, in an open format, with no vendor lock-in.
The migration pattern is deliberately incremental:
Pick one dataset to migrate (start with the highest-volume or most-queried table)
Build an Iceberg pipeline to land that data in your object storage (S3 or Azure)
Update the Bronze view to point to the new Iceberg table instead of the legacy federated source
Silver and Gold views stay unchanged. They reference the Bronze view, which now reads from Iceberg instead of the old source.
Every consumer is unaffected. Dashboards, reports, and AI agents continue to work exactly as before.
Repeat for the next dataset whenever you are ready. There is no deadline and no big-bang cutover.
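The incremental pattern above can be sketched in two statements: land the data as an Iceberg table, then repoint the Bronze view. All names are hypothetical, and a production pipeline would use incremental ingestion rather than a one-shot CTAS.

```sql
-- Land the source data as an Iceberg table in object storage
CREATE TABLE lakehouse.bronze_tables.customers AS
SELECT * FROM postgres_prod.public.customers;

-- Repoint the Bronze view at the new Iceberg table; Silver and Gold
-- views, dashboards, and AI agents are untouched because the view's
-- column names and types are unchanged
CREATE OR REPLACE VIEW bronze.customers AS
SELECT CAST(cust_id AS BIGINT) AS customer_id,
       TRIM(cust_nm)           AS customer_name
FROM lakehouse.bronze_tables.customers;
```

The second statement is the entire cutover for that dataset: the legacy federated source can be retired whenever you choose.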
Why the View Layer Makes Migration Invisible
This is the architectural insight that makes the whole journey work. The semantic layer acts as a contract between physical data storage and every consumer above it.
When you swap a Bronze view's underlying source from PostgreSQL to an Iceberg table, every Silver view, Gold view, dashboard, report, and AI agent that depends on it continues to work without changes. The view contract (column names, data types, business logic) is preserved. Only the physical source pointer changes.
This means:
No dashboard rewiring
No report migration
No API endpoint changes
No AI Agent reconfiguration
No user communication (beyond governance notifications if your policies require them)
The migration happens underneath the abstraction layer. Everyone above it is oblivious.
The Tradeoffs
This phased approach is not free of costs.
Federation introduces network latency. Queries that join a PostgreSQL table in one region with an S3 bucket in another will be slower than queries against co-located Iceberg tables. Reflections and caching mitigate this for repeated queries, but the first execution of a new query pattern will feel it.
Iceberg migration still requires building ingest pipelines. Dremio does not eliminate that work. What it does is decouple the pipeline work from the analytics timeline. Your analysts and AI agents are productive while engineers build migration pipelines in the background.
Autonomous Reflections need a 7-day observation window before they start optimizing. Day-one performance on brand-new Iceberg tables relies on baseline optimizations (C3 caching, predicate pushdowns, vectorized execution). The system gets faster as it learns your query patterns.
And Dremio is an analytical engine, not a transactional database. Your OLTP workloads stay in PostgreSQL, MongoDB, or whatever system runs your application. You query those systems through federation, not as a replacement.
Start Today, Migrate Over Time
The traditional approach forces you to choose: spend months migrating, or keep running fragmented analytics on scattered data. Dremio eliminates that choice. Connect your sources, build your semantic layer, enable AI access, and start migrating to Iceberg when you are ready. Each phase delivers value independently, and the view layer ensures that migration never disrupts the people who are already getting answers.