Dremio Blog: Open Data Insights
-
What Are Lakehouse Catalogs? The Role of Catalogs in Apache Iceberg
A lakehouse catalog is the component that answers one question: "Where is the current metadata for this table?" Without a catalog, every engine would need to independently locate and track metadata files. With a catalog, there is a single source of truth that coordinates reads, writes, and access control across all engines.
-
Enterprise Agentic Analytics Explained
Learn how agentic workflows for enterprise analytics connect AI agents, governed data and multi-step analysis to improve complex business decisions.
-
Writing to an Apache Iceberg Table: How Commits and ACID Actually Work
Understanding the write process is critical because it explains why Iceberg can provide ACID guarantees on top of object storage, something that seems impossible when you consider that S3, ADLS, and GCS have no built-in transaction support.
-
Agentic Lakehouse: The Architecture Built for AI-Native Analytics
The Agentic Lakehouse is not a new name for the same architecture. It represents a genuine shift in what a data platform is responsible for. A traditional lakehouse is a managed repository. An Agentic Lakehouse is an active participant in AI workflows: it provides context, enforces governance, and optimizes itself autonomously.
-
Text-to-SQL vs Agentic Analytics: What the Upgrade Requires
Text-to-SQL on a governed semantic layer is significantly more reliable than text-to-SQL on a raw production schema. The semantic layer constrains what the model can access, provides business-friendly terminology, and enforces metric definitions. The accuracy improvement is material.
-
Semantic Layer vs Data Catalog: What’s the Difference?
The convergence of AI agents, open table formats, and semantic tooling is making this architecture decision more consequential than it was a few years ago. AI agents that query through ungoverned raw tables or that cannot discover what data exists are not reliable.
-
Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans
The most expensive mistake in data lake querying is the accidental full table scan: a query that reads every file because the user did not correctly reference the partition columns. In Hive, this happens constantly. In Iceberg, it is structurally impossible because users never reference partition columns at all.
-
Semantic Layer for AI Agents: Stop Getting the Numbers Wrong
The reason so many agentic analytics projects stall at proof-of-concept is not the AI model. It is the absence of the infrastructure that would make the AI trustworthy on real data. A semantic layer is that infrastructure.
-
MCP Server Data Lakehouse: Connect AI Agents to Your Data
The Model Context Protocol (MCP) changes this equation. An MCP server data lakehouse setup gives any compliant AI client a single, governed, structured gateway to your data. You configure it once. Every agent that follows the spec connects automatically.
-
Apache Iceberg Small Files Problem: Causes, Fixes, and Prevention
Solving the Apache Iceberg small files problem requires addressing it at multiple layers. Detection comes first: use table_files() to establish a baseline and set thresholds that trigger action. Prevention comes next: configure write.target-file-size-bytes at the source and increase checkpoint intervals for streaming jobs.
-
Partition Evolution: Change Your Partitioning Without Rewriting Data
Partition evolution is one of the features that makes Iceberg a safe long-term choice. It means the partitioning decision you make today is not permanent.
-
Apache Iceberg REST Catalog: What It Is and How to Use It
From that point, all engines share a consistent view of your Iceberg tables. New tables created by Spark appear in Dremio immediately. Schema changes committed by Flink are visible to PyIceberg clients without any manual sync. The catalog handles the coordination.
-
Apache Iceberg Partition Evolution: Change Your Partitioning Strategy Without Rewriting Data
Partition evolution is one of those features that seems minor until you need it. Then it's the difference between a two-minute metadata update and a two-day rewrite project. If you're building on Iceberg and haven't thought carefully about your partition strategy yet, the time to do that is before your table reaches 10 TB, not after.
-
What Is Agentic Analytics? How It Differs from BI and AI Assistants
The framing that matters here: agentic analytics is not a feature you add to your existing BI stack. It is a different approach to how analytical work gets done, who does it, and at what speed.
-
Agentic Lakehouse vs Data Lakehouse: What Actually Changes
The Agentic Lakehouse is not a different architecture from your existing lakehouse. It is four additional structural layers built on top of a foundation you have likely already built: an AI Semantic Layer, Autonomous Performance, active metadata, and agent-specific interfaces.