The Dremio Blog

Dremio Blog: Open Data Insights

Quick Start with Apache Iceberg and Apache Polaris on your Laptop (quick setup notebook environment)

By following the steps in this guide, you now have a fully functional Iceberg and Polaris environment running locally. You have seen how to spin up the services, initialize the catalog, configure Spark, and work with Iceberg tables. Most importantly, you have set up a pattern that closely mirrors what modern data platforms are doing in production today.

Alex Merced
Dremio Blog: Open Data Insights

Benchmarking Framework for the Apache Iceberg Catalog, Polaris

The Polaris benchmarking framework provides a robust mechanism to validate performance, scalability, and reliability of Polaris deployments. By simulating real-world workloads, it enables administrators to identify bottlenecks, verify configurations, and ensure compliance with service-level objectives (SLOs). The framework’s flexibility allows for the creation of arbitrarily complex datasets, making it an essential tool for both development and production environments.

Pierre Laporte
Dremio Blog: Open Data Insights

Why Dremio co-created Apache Polaris, and where it’s headed

Polaris is a next-generation metadata catalog, born from real-world needs, designed for interoperability, and open-sourced from day one. It’s built for the lakehouse era, and it’s rapidly gaining momentum as the new standard for how data is managed in open, multi-engine environments.
Dremio Blog: Open Data Insights

Understanding the Value of Dremio as the Open and Intelligent Lakehouse Platform

With Dremio, you’re not locked into a specific vendor’s ecosystem. You’re not waiting on data engineering teams to build yet another pipeline. You’re not struggling with inconsistent definitions across departments. Instead, you’re empowering your teams to move fast, explore freely, and build confidently, on a platform that was designed for interoperability from day one.

Alex Merced
Dremio Blog: Open Data Insights

Extending Apache Iceberg: Best Practices for Storing and Discovering Custom Metadata

By using properties, Puffin files, and REST catalog APIs wisely, you can build richer, more introspective data systems. Whether you're developing an internal data quality pipeline or a multi-tenant ML feature store, Iceberg offers clean integration points that let metadata travel with the data.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 10 – Sampling and Prompts in MCP — Making Agent Workflows Smarter and Safer

That’s where Sampling comes in. And what if you want to give the user (or the LLM) reusable, structured prompt templates for common workflows? That’s where Prompts come in. In this final post of the series, we’ll explore how sampling allows servers to request completions from LLMs, how prompts enable reusable, guided AI interactions, best practices for both features, and real-world use cases that combine everything we’ve covered so far.

Alex Merced
Dremio Blog: Open Data Insights

The Case for Apache Polaris as the Community Standard for Lakehouse Catalogs

The future of the lakehouse depends on collaboration. Apache Polaris embodies the principles of openness, vendor neutrality, and enterprise readiness that modern data platforms demand. By aligning around Polaris, the data community can reduce integration friction, encourage ecosystem growth, and give organizations the freedom to innovate without fear of vendor lock-in.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 9 – Tools in MCP — Giving LLMs the Power to Act

Tools are executable functions that an LLM (or the user) can call via the MCP client. Unlike resources — which are passive data — tools are active operations.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 8 – Resources in MCP — Serving Relevant Data Securely to LLMs

One of MCP’s most powerful capabilities is its ability to expose resources to language models in a structured, secure, and controllable way.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 7 – Under the Hood — The Architecture of MCP and Its Core Components

By the end, you’ll understand how MCP enables secure, modular communication between LLMs and the systems they need to work with.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 6 – Enter the Model Context Protocol (MCP) — The Interoperability Layer for AI Agents

What if we had a standard that let any agent talk to any data source or tool, regardless of where it lives or what it’s built with? That’s exactly what the Model Context Protocol (MCP) brings to the table.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 5 – AI Agent Frameworks — Benefits and Limitations

Enter agent frameworks — open-source libraries and developer toolkits that let you create goal-driven AI systems by wiring together models, memory, tools, and logic. These frameworks enable some of the most exciting innovations in the AI space… but they also come with trade-offs.

Alex Merced
Dremio Blog: Open Data Insights

What’s New in Apache Iceberg Format Version 3?

Now, with the introduction of format version 3, Iceberg pushes the boundaries even further. V3 is designed to support more diverse and complex data types, offer greater control over schema evolution, and deliver performance enhancements suited for large-scale, high-concurrency environments. This blog explores the key differences between V1, V2, and the new V3, highlighting what makes V3 a significant step forward in Iceberg's evolution.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 4 – What Are AI Agents — And Why They’re the Future of LLM Applications

We’ve explored how Large Language Models (LLMs) work, and how we can improve their performance with fine-tuning, prompt engineering, and retrieval-augmented generation (RAG). These enhancements are powerful — but they’re still fundamentally stateless and reactive.

Alex Merced
Dremio Blog: Open Data Insights

A Journey from AI to LLMs and MCP – 3 – Boosting LLM Performance — Fine-Tuning, Prompt Engineering, and RAG

In this post, we’ll walk through the three most popular and practical ways to boost the performance of Large Language Models (LLMs): fine-tuning, prompt engineering, and Retrieval-Augmented Generation (RAG). Each approach has its strengths, trade-offs, and ideal use cases. By the end, you’ll know when to use each and how they work under the hood.

Alex Merced
