Modern data platforms are no longer built around monolithic warehouses or tightly coupled ingestion pipelines. Instead, organizations are standardizing on open lakehouse architectures, where data is stored in open formats, governed by shared catalogs, and processed by multiple engines based on workload.
At the center of this shift is Apache Iceberg, which has emerged as the de facto table format for analytic and AI workloads on object storage. Iceberg brings transactional guarantees, schema evolution, time travel, and partition evolution to data lakes, capabilities that were once exclusive to proprietary systems.
However, adopting Iceberg is only part of the story. Teams still face a critical question:
How do you ingest data into Iceberg efficiently, reliably, and at scale, without rebuilding complex ETL infrastructure?
Dremio is a lakehouse query and processing engine that natively reads from and writes to Apache Iceberg tables. Rather than introducing a proprietary ingestion framework, Dremio enables ingestion through SQL, file-based loading, and programmatic APIs, allowing teams to use the same engine for exploration, transformation, and data delivery.
Importantly, Dremio is not an orchestration layer. Workflow orchestration remains the responsibility of tools such as Apache Airflow, dbt, or custom pipelines built with DremioFrame, which can schedule, coordinate, and trigger ingestion workloads that execute on Dremio. This separation keeps architectures modular and flexible while allowing Dremio to focus on what it does best: fast, scalable data processing on open data.
In this post, we’ll walk through the most common data ingestion patterns with Dremio, focusing on ingesting data into Apache Iceberg tables managed through an open catalog. We’ll cover when to use each approach, how they fit into real-world pipelines, and best practices for choosing the correct pattern based on your data sources and workloads.
Whether you’re loading ad hoc datasets, migrating existing tables, ingesting files from object storage, or pulling data from APIs and databases, Dremio provides multiple paths to bring data into Iceberg, without locking you into a single engine or ingestion tool.
What Is Dremio: The Agentic Lakehouse Platform
Dremio is the Agentic Lakehouse, a data platform built for AI agents and managed by agents. It unifies data, governance, and business context to enable fast, accurate analytics and AI workflows directly on open data, without pipelines, lock-in, or manual optimization.
At its foundation, Dremio is a high-performance data processing engine built on Apache Arrow and optimized for Apache Iceberg. Iceberg is Dremio’s first-class table format for both reads and writes, allowing users to create, ingest, and evolve analytical tables directly in object storage with full transactional guarantees. Tables written by Dremio are immediately interoperable with other Iceberg-compatible engines, preserving the openness of the lakehouse.
What distinguishes Dremio from traditional query engines is its agentic architecture, which combines AI-driven interaction, autonomous operations, and semantic understanding of data:
Integrated AI Agent and MCP Server
Dremio includes a built-in AI agent that can run queries, generate visualizations, explain SQL, and suggest optimizations using natural language. This capability extends beyond the Dremio UI through an MCP (Model Context Protocol) server, which exposes Dremio’s semantic understanding of data to external clients and tools. Together, these capabilities allow AI agents and users to interact with data more naturally and productively.
AI Functions for Unstructured Data
Dremio brings AI directly into SQL through AI Functions, enabling teams to transform unstructured content, such as PDFs, documents, and images, into structured, queryable data. These functions make it possible to ingest and analyze data that would traditionally require complex preprocessing pipelines, expanding what “data ingestion” means in a lakehouse context.
Autonomous Performance Management
Operating an Iceberg lakehouse at scale typically requires continuous tuning and maintenance. Dremio eliminates this burden through autonomous performance management, including automatic table optimization, results caching, query planning caches, and Autonomous Reflections. These capabilities continuously optimize performance and cost as data volumes and workloads evolve, without manual intervention.
Dremio Open Catalog: Apache Polaris–Based
Dremio includes a built-in, fully managed lakehouse catalog, Dremio Open Catalog, powered by Apache Polaris. The catalog tracks, governs, and secures Iceberg tables while enabling interoperable access across engines through standard Iceberg REST APIs. This ensures that data ingested into Iceberg remains discoverable, governed, and reusable across the broader ecosystem.
Integrated Semantic Layer
Dremio provides a first-class semantic layer that includes views, tags, wikis, and end-to-end lineage. This semantic context is not only consumed by users, but also leveraged by the AI agent and MCP server to deliver more accurate and meaningful results. The semantic layer spans both native Iceberg tables and virtualized data from databases, data warehouses, and data lakes, enabling agentic analytics across the entire data estate.
By combining first-class Iceberg support, autonomous lakehouse management, and AI-driven interaction, Dremio enables organizations to move from fragmented data silos to performant, agentic analytics on unified data, often overnight rather than through long, multi-year platform migrations.
What Is Dremio Open Catalog: An Apache Polaris–Based Lakehouse Catalog
An open lakehouse requires more than an open table format. It also needs a shared, interoperable catalog that tracks table metadata, enforces governance, and allows multiple engines to safely read and write the same data. This is the role of Dremio Open Catalog (DOC).
Dremio Open Catalog is a lakehouse catalog built directly into the Dremio platform, powered by Apache Polaris. It provides a fully managed catalog for Apache Iceberg tables, enabling organizations to govern, secure, and share data without introducing proprietary metadata layers or locking themselves into a single compute engine.
Built on Apache Polaris and Iceberg REST
At its core, DOC implements the Apache Iceberg REST catalog specification via Apache Polaris. This means Iceberg tables registered in Dremio Open Catalog can be accessed by any engine that supports the Iceberg REST API, including Spark, Flink, Trino, and others.
This architecture ensures:
Interoperability: Tables ingested through Dremio are immediately available to other Iceberg-compatible engines.
Consistency: All engines operate against the same catalog metadata and transactional state.
Openness: Metadata remains portable and standards-based, avoiding proprietary lock-in.
First-Class Governance for Iceberg Tables
Dremio Open Catalog is not just a metadata registry. It is a governance layer that provides fine-grained access control, auditing, and lineage for Iceberg tables. Permissions are enforced consistently whether data is queried interactively, ingested via SQL, or accessed programmatically.
Because the catalog is integrated into the Dremio platform, governance is applied automatically as data is created or ingested, without requiring separate systems to synchronize policies or metadata.
Designed for Ingestion and Evolution
Ingestion is one of the most demanding phases of the data lifecycle, especially in Iceberg-based lakehouses where tables continuously evolve. Dremio Open Catalog is designed to support:
Transactional table creation and writes
Schema evolution during ingestion
Partition evolution without rewrites
Safe concurrent access from multiple engines
This makes DOC a natural foundation for ingestion pipelines that start with raw data and mature into curated, shared Iceberg tables over time.
A Shared Foundation for Agentic Analytics
Dremio Open Catalog also plays a critical role in enabling agentic analytics. By centralizing metadata, permissions, and table definitions, the catalog provides the trusted foundation that Dremio’s AI agent and semantic layer rely on to understand data, apply context, and deliver accurate results.
In practice, this means that once data is ingested into Iceberg and registered in Dremio Open Catalog, it becomes immediately discoverable, governable, and usable by humans, AI agents, and external engines alike.
With an open catalog in place, the question shifts from where data lives to how it should be ingested. In the next sections, we’ll look at the different ingestion paths Dremio provides, starting with simple, ad hoc workflows and progressing toward fully automated, production-grade patterns.
The Ingestion Landscape: Iceberg REST and Dremio-Native Ingestion Paths
With Apache Iceberg and an open catalog in place, ingestion becomes far more flexible than in traditional data platforms. Instead of being tied to a single engine or proprietary pipeline framework, organizations can choose from multiple ingestion paths based on data volume, latency, and operational complexity.
Because Dremio Open Catalog implements the Iceberg REST specification, any engine that supports Iceberg REST can ingest data into tables registered in the catalog. This enables a broad ecosystem of tools (batch engines, streaming frameworks, and custom applications) to safely create and update Iceberg tables while sharing a single source of truth for metadata and governance.
At the same time, Dremio provides several native ingestion mechanisms that are tightly integrated with its engine, catalog, and semantic layer. These options are often simpler to operate, require fewer moving parts, and automatically benefit from Dremio’s autonomous performance management and governance capabilities.
Broadly, ingestion into Iceberg using Dremio falls into four categories:
1. Ad Hoc and Interactive Ingestion
For exploratory or one-time datasets, Dremio supports interactive ingestion workflows directly in the UI. These are designed for speed and accessibility, allowing analysts and engineers to quickly turn files into Iceberg tables without writing pipelines or provisioning infrastructure.
2. SQL-Based Ingestion and Transformation
Dremio’s SQL engine can create and incrementally update Iceberg tables using familiar patterns such as CREATE TABLE AS SELECT and INSERT INTO SELECT. This approach is well suited for:
Migrating data from existing sources
Building curated tables from raw datasets
Performing incremental ingestion based on timestamps or identifiers
Because these operations are transactional and Iceberg-native, they can be safely re-run and integrated into scheduled workflows.
3. File-Based Loading from Object Storage
For ingestion patterns centered around files landing in object storage, Dremio provides file-oriented loading mechanisms that are optimized for bulk and continuous ingestion. These patterns are ideal for external data feeds, event-driven pipelines, and landing-zone architectures.
4. Programmatic Ingestion with DremioFrame
Some ingestion scenarios go beyond what can be expressed purely in SQL or through file ingestion. DremioFrame, a Python library built on top of the Dremio engine, enables programmatic ingestion from APIs, custom file formats, and JDBC-accessible systems, while still pushing execution down to Dremio and writing data into Iceberg tables.
Each of these ingestion paths serves a different purpose, and they are often used together within the same lakehouse. The key advantage of Dremio’s approach is that all roads lead to the same destination: governed, interoperable Apache Iceberg tables managed through an open catalog.
In the following sections, we’ll explore each of these ingestion patterns in detail, starting with the simplest workflows and progressing toward more advanced, production-grade ingestion pipelines, along with best practices for choosing the right approach for your use case.
File Upload in the UI: One-Time and Ad Hoc Ingestion into Iceberg
Not every ingestion workflow needs to start with a pipeline. For exploratory analysis, rapid prototyping, or one-off datasets, Dremio provides a simple UI-based file upload experience that allows users to ingest data directly into the lakehouse.
Through the Dremio UI, users can upload files in common formats such as CSV, JSON, and Parquet, preview their contents, and immediately query them using SQL. These uploaded files can then be transformed and materialized as Apache Iceberg tables, making them first-class citizens in the lakehouse rather than isolated artifacts.
This workflow is particularly valuable in early-stage analysis, where the goal is to move quickly from raw data to insight without standing up infrastructure or writing ingestion code.
How UI-Based Ingestion Works
The typical flow for UI-based ingestion looks like this:
Upload a file through the Dremio UI.
Inspect the inferred schema and data types.
Materialize the data as an Apache Iceberg table in Dremio Open Catalog.
Once materialized, the resulting Iceberg table is managed by Dremio Open Catalog, governed like any other dataset, and immediately available for querying by other users, tools, and engines.
When This Pattern Makes Sense
UI-based file uploads are best suited for:
Ad hoc datasets shared by partners or internal teams
Exploratory analysis and proof-of-concept work
Small reference datasets that change infrequently
Analyst-driven workflows where speed matters more than automation
This approach lowers the barrier to entry for working with Iceberg by allowing users to focus on data and SQL rather than ingestion infrastructure.
Best Practices
To use UI-based ingestion effectively:
Convert uploaded files to Iceberg early: Treat uploaded files as a staging step, not a long-term storage solution. Persist curated results into Iceberg tables as soon as possible.
Validate schemas before materializing: Review inferred data types and column names to avoid propagating issues into downstream tables.
Avoid using UI uploads for recurring ingestion: If a dataset needs to be refreshed regularly or scaled over time, transition to SQL-based, file-based, or programmatic ingestion patterns.
Limitations to Keep in Mind
While convenient, UI-based ingestion is intentionally scoped. It is not designed for:
Large-scale or high-frequency ingestion
Automated or scheduled pipelines
Complex schema evolution scenarios
For those use cases, Dremio’s SQL-based and file-based ingestion mechanisms provide more control and scalability.
UI uploads excel as a starting point, a fast path from raw data to governed Iceberg tables, before evolving into more robust ingestion patterns as requirements grow.
CTAS and INSERT INTO SELECT: SQL-Driven Ingestion and Incremental Updates
For most production ingestion workflows, SQL-based ingestion provides the best balance of simplicity, scalability, and control. Dremio supports this natively through CREATE TABLE AS SELECT (CTAS) and INSERT INTO SELECT, allowing teams to ingest, transform, and incrementally update Apache Iceberg tables using standard SQL.
These patterns are especially powerful because they:
Write transactionally to Iceberg
Work across federated sources (databases, warehouses, files, lakes)
Integrate cleanly into automated workflows
Support repeatable and incremental ingestion
Initial Loads with CREATE TABLE AS SELECT (CTAS)
CTAS is the most common way to perform an initial ingestion into Iceberg. It allows you to create a new Iceberg table directly from an existing source while applying transformations, filtering, and schema normalization.
Example: Migrating Data from an External Source into Iceberg
CREATE TABLE analytics.orders_iceberg
PARTITION BY (order_date)
AS
SELECT
  order_id,
  customer_id,
  CAST(order_timestamp AS DATE) AS order_date,
  order_status,
  total_amount,
  CURRENT_TIMESTAMP AS ingestion_ts
FROM snowflake.sales.orders
WHERE order_timestamp >= '2024-01-01';
In this example:
Data is read directly from an external source
The table is written in Apache Iceberg format
Partitioning is applied at creation time
Ingestion metadata is added explicitly
Once created, this Iceberg table is immediately registered in Dremio Open Catalog and available to other engines.
Best Practices for CTAS
Normalize schemas during ingestion to avoid propagating inconsistencies
Add ingestion timestamps for auditability and incremental logic
Choose partitions carefully based on query patterns, not source layouts
Prefer CTAS for initial loads and backfills, not ongoing updates
Incremental Ingestion with INSERT INTO SELECT
After the initial load, most datasets need to be updated incrementally. Dremio supports this pattern using INSERT INTO SELECT, appending new data to existing Iceberg tables in a fully transactional manner.
Example: Incremental Inserts Based on a Timestamp
INSERT INTO analytics.orders_iceberg
SELECT
  order_id,
  customer_id,
  CAST(order_timestamp AS DATE) AS order_date,
  order_status,
  total_amount,
  CURRENT_TIMESTAMP AS ingestion_ts
FROM snowflake.sales.orders
WHERE CAST(order_timestamp AS DATE) > (
  SELECT MAX(order_date)
  FROM analytics.orders_iceberg
);

Note that the filter casts the source timestamp to a date before comparing it to MAX(order_date). Comparing a raw timestamp to a date watermark would re-read rows from the latest ingested day on every rerun, producing duplicates.
This pattern:
Reads only new records from the source
Appends data safely to the Iceberg table
Can be rerun without affecting existing data
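To make the watermark logic concrete, here is a minimal pure-Python sketch of the same filter the INSERT INTO SELECT expresses: derive the high watermark from the target table, then keep only strictly newer source rows. The record shapes and field names are illustrative, not part of any Dremio API.

```python
from datetime import date

def select_new_orders(source_rows, target_rows):
    """Return only source rows newer than the latest order_date already
    ingested -- the same filter the incremental INSERT expresses in SQL."""
    # High watermark: the most recent order_date present in the target table
    watermark = max((r["order_date"] for r in target_rows), default=date.min)
    return [r for r in source_rows if r["order_date"] > watermark]

target = [{"order_id": 1, "order_date": date(2024, 1, 10)}]
source = [
    {"order_id": 1, "order_date": date(2024, 1, 10)},  # already ingested
    {"order_id": 2, "order_date": date(2024, 1, 11)},  # new since last run
]
new_rows = select_new_orders(source, target)
```

Because the comparison is strict, appending `new_rows` and re-running the selection yields nothing, which is exactly the rerun-safety property described above.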
Example: Incremental Inserts Using a High-Watermark Table
For more control, many teams maintain a watermark table:
INSERT INTO analytics.orders_iceberg
SELECT
  o.order_id,
  o.customer_id,
  CAST(o.order_timestamp AS DATE) AS order_date,
  o.order_status,
  o.total_amount,
  CURRENT_TIMESTAMP AS ingestion_ts
FROM snowflake.sales.orders o
JOIN ingestion_metadata.watermarks w
  ON o.order_timestamp > w.last_processed_ts
WHERE w.dataset_name = 'orders';
This approach is especially useful when:
Sources don’t guarantee ordering
Multiple ingestion jobs share state
Late-arriving data is expected
Idempotency and Reliability Considerations
When using SQL-based ingestion in production, it’s important to design for reliability:
Avoid duplicates by filtering on immutable keys or timestamps
Prefer append-only ingestion when possible
Track ingestion state explicitly, not implicitly
Treat SQL as declarative ingestion logic, not procedural code
Because Iceberg guarantees atomic commits, failed ingestion jobs will not leave tables in a partially written state.
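The "track ingestion state explicitly" advice can be sketched in a few lines of Python. This is an illustrative model, not DremioFrame or Dremio API code: the watermark store stands in for a metadata table, and the list append stands in for a transactional Iceberg write.

```python
def run_ingestion(source_rows, table, watermarks, dataset):
    """Append rows newer than the stored watermark, then advance it.
    Explicit state tracking makes the job safe to rerun."""
    wm = watermarks.get(dataset)
    fresh = [r for r in source_rows if wm is None or r["ts"] > wm]
    table.extend(fresh)  # in practice, an atomic Iceberg append
    if fresh:
        # Advance the watermark only after a successful commit
        watermarks[dataset] = max(r["ts"] for r in fresh)
    return len(fresh)

orders, state = [], {}
rows = [{"order_id": 1, "ts": 100}, {"order_id": 2, "ts": 200}]
first = run_ingestion(rows, orders, state, "orders")   # ingests both rows
second = run_ingestion(rows, orders, state, "orders")  # rerun ingests nothing
```

Because the watermark advances only when rows are committed, a failed run leaves the state untouched and the next run simply retries the same window.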
When to Use CTAS and INSERT INTO SELECT
This pattern is ideal for:
Migrating data from existing systems into Iceberg
Building curated or analytics-ready tables
Incremental batch ingestion
Pipelines managed by schedulers or CI/CD systems
It is less suitable for:
Continuous file-based ingestion
Event-driven ingestion from object storage
Complex API-driven ingestion logic
For those scenarios, Dremio’s file-based ingestion and programmatic approaches are a better fit.
COPY INTO and CREATE PIPE: File-Based Ingestion from Object Storage
Many ingestion pipelines are built around files landing in object storage, whether produced by upstream systems, event-driven processes, or external partners. For these scenarios, Dremio provides file-native ingestion patterns that load data directly into Apache Iceberg tables without requiring intermediate processing engines.
Two SQL constructs are central to this approach: COPY INTO and CREATE PIPE. Together, they support both bulk ingestion and continuous file loading into Iceberg.
Bulk File Ingestion with COPY INTO
COPY INTO is designed for explicit, batch-oriented ingestion of files from object storage into an existing Iceberg table. It is well-suited for backfills, periodic loads, or controlled batch workflows.
Example: Loading Parquet Files from Object Storage
COPY INTO analytics.events_iceberg
FROM '@s3.raw_data/events/'
FILE_FORMAT 'PARQUET';
In this example:
Files are read directly from an object storage location
Data is appended transactionally to the Iceberg table
No staging tables or external engines are required
Example: Loading CSV Files with Explicit Options
COPY INTO analytics.customers_iceberg
FROM '@s3.raw_data/customers/'
FILE_FORMAT (
  TYPE 'CSV',
  FIELD_DELIMITER ',',
  SKIP_FIRST_LINE TRUE
);
Dremio handles schema mapping, file discovery, and commit semantics automatically, ensuring that each COPY INTO operation results in a consistent Iceberg snapshot.
Best Practices for COPY INTO
Use COPY INTO for controlled batch ingestion, not continuous streaming
Validate schemas early, especially for CSV and JSON inputs
Group files into reasonably sized batches to avoid excessively small files
Prefer immutable file drops to simplify ingestion logic
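The batching advice above can be sketched as a small planner: group files greedily until a target batch size is reached, so each COPY INTO commit writes reasonably sized data files instead of many tiny ones. The file names, sizes, and the 256 MB target are illustrative assumptions.

```python
def plan_batches(file_sizes_mb, target_mb=256):
    """Greedily group (name, size_mb) pairs into batches of roughly
    target_mb each, one batch per COPY INTO run."""
    batches, current, current_mb = [], [], 0
    for name, size in file_sizes_mb:
        if current and current_mb + size > target_mb:
            batches.append(current)
            current, current_mb = [], 0
        current.append(name)
        current_mb += size
    if current:
        batches.append(current)
    return batches

files = [("a.parquet", 100), ("b.parquet", 100),
         ("c.parquet", 100), ("d.parquet", 40)]
plan = plan_batches(files, target_mb=256)  # two batches of ~200 and ~140 MB
```

Each planned batch would then be loaded by a single COPY INTO statement, keeping the Iceberg table's file count under control.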
Continuous Ingestion with CREATE PIPE
For pipelines that receive files continuously, CREATE PIPE enables automated ingestion. A pipe defines a persistent ingestion rule that watches a location and loads new files as they appear.
Example: Creating a Pipe for Continuous Ingestion
CREATE PIPE analytics.events_pipe
AS
COPY INTO analytics.events_iceberg
FROM '@s3.raw_data/events/'
FILE_FORMAT 'PARQUET';
Once created, the pipe:
Tracks which files have already been processed
Automatically ingests new files
Ensures each file is loaded exactly once
Pipes are ideal for landing-zone architectures where upstream systems continuously write files to object storage.
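Conceptually, a pipe's exactly-once behavior comes down to a ledger of processed files. The sketch below models that idea in plain Python (it is not Dremio's internal implementation): new files in a listing are loaded once and recorded, so re-scanning the same location is a no-op.

```python
def load_new_files(listing, processed):
    """Mimic a pipe's file tracking: ingest each file exactly once by
    recording processed file names in a ledger."""
    to_load = [f for f in listing if f not in processed]
    processed.update(to_load)  # persisted durably in a real pipe
    return to_load

ledger = set()
first = load_new_files(["events-001.parquet", "events-002.parquet"], ledger)
# Later, a new file lands; only it is picked up on the next scan
second = load_new_files(
    ["events-001.parquet", "events-002.parquet", "events-003.parquet"], ledger
)
```

This is why immutable file drops pair so well with pipes: once a file name is recorded, rewriting its contents in place would go unnoticed.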
Starting and Managing Pipes
ALTER PIPE analytics.events_pipe SET PIPE_EXECUTION_PAUSED = FALSE;
Pipes can be paused, resumed, or monitored without redefining ingestion logic.
Handling Schema Evolution and File Layout
File-based ingestion often introduces schema drift over time. When ingesting into Iceberg:
Backward-compatible schema changes (adding columns) are handled gracefully
Incompatible changes should be normalized before ingestion
Partitioning decisions should be based on query access patterns, not file layout
Dremio’s autonomous optimization features help mitigate issues such as small files and suboptimal layouts after ingestion.
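The "backward-compatible changes only" rule can be expressed as a small schema-merge check. This is a hedged sketch of the policy, with schemas modeled as simple column-to-type dicts rather than Iceberg's actual schema objects: added columns pass, type changes on existing columns are rejected.

```python
def evolve_schema(table_schema, incoming_schema):
    """Accept only additive changes: new columns are appended; a type
    change on an existing column is flagged as incompatible."""
    evolved = dict(table_schema)
    for col, typ in incoming_schema.items():
        if col in evolved and evolved[col] != typ:
            raise ValueError(f"incompatible type change for column {col}")
        evolved.setdefault(col, typ)
    return evolved

base = {"event_id": "bigint", "ts": "timestamp"}
drifted = {"event_id": "bigint", "ts": "timestamp", "country": "varchar"}
evolved = evolve_schema(base, drifted)  # 'country' is added safely
```

Incompatible drift caught this way should be normalized upstream, before the file ever reaches the landing zone.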
When to Use COPY INTO vs CREATE PIPE
| Use Case | Recommended Pattern |
| --- | --- |
| One-time backfill | COPY INTO |
| Scheduled batch loads | COPY INTO |
| Continuous file drops | CREATE PIPE |
| Event-driven pipelines | CREATE PIPE |
| External data feeds | CREATE PIPE |
Both patterns integrate seamlessly with Iceberg’s transactional model and Dremio Open Catalog, ensuring ingested data is immediately governed and queryable.
Where File-Based Ingestion Fits Best
File-based ingestion is ideal when:
Upstream systems already produce files
Object storage acts as a landing zone
Low-latency streaming is not required
Ingestion must scale independently of source systems
For ingestion that involves APIs, custom formats, or database-driven extraction logic, SQL alone is often not enough. In the next section, we’ll look at DremioFrame, a Python library that enables programmatic ingestion while still leveraging Dremio’s engine and Iceberg-native writes.
DremioFrame: Programmatic Ingestion into Iceberg Using Python
While SQL- and file-based ingestion cover many common lakehouse workflows, some ingestion scenarios require programmatic control. Data arriving from REST APIs, local files, Python applications, or external databases often needs to be fetched, normalized, or enriched before it can be written to Apache Iceberg.
DremioFrame addresses these use cases by providing a Python-native ingestion layer that writes data through the Dremio engine and into Iceberg tables managed by Dremio Open Catalog. Rather than acting as a local processing engine, DremioFrame serves as a control plane for ingestion, pushing work into Dremio wherever possible and preserving governance, lineage, and transactional guarantees.
DremioFrame supports both ELT-style ingestion (moving data from Dremio-connected sources into Iceberg) and ETL-style ingestion (bringing external data into the lakehouse).
API Ingestion
DremioFrame includes built-in support for ingesting data directly from REST APIs using client.ingest_api. This method handles fetching data, batching, and writing results into Iceberg tables.
from dremioframe.client import DremioClient
client = DremioClient()
client.ingest_api(
    url="https://api.example.com/users",
    table_name="marketing.users",
    mode="merge",  # 'replace', 'append', or 'merge'
    pk="id"
)
This pattern is well suited for:
SaaS platforms and external services
APIs without bulk export capabilities
Incremental ingestion using merge semantics
Best practices
Use merge with a primary key for mutable API data
Use append for append-only event streams
Consider staging tables for complex transformations before merging
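To see what `mode="merge"` with a primary key implies, here is a minimal pure-Python model of upsert semantics (an illustration of the concept, not DremioFrame's implementation): incoming rows replace existing rows that share the key and are inserted otherwise.

```python
def merge_by_pk(existing, incoming, pk="id"):
    """Upsert: incoming rows overwrite existing rows with the same
    primary key; unmatched incoming rows are inserted."""
    merged = {row[pk]: row for row in existing}
    for row in incoming:
        merged[row[pk]] = row  # last writer wins per key
    return list(merged.values())

existing = [{"id": 1, "email": "old@example.com"}]
incoming = [{"id": 1, "email": "new@example.com"},
            {"id": 2, "email": "b@example.com"}]
result = merge_by_pk(existing, incoming)  # id 1 updated, id 2 inserted
```

This is why merge is the right mode for mutable API data: repeated pulls of the same records converge to the latest state instead of accumulating duplicates.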
File Upload from Local Systems
When working with local files that are not already available in object storage, DremioFrame allows you to upload them directly into Dremio as Iceberg tables using client.upload_file.
from dremioframe.client import DremioClient
client = DremioClient()
# Upload a CSV file
client.upload_file("data/sales.csv", "marketing.sales")
# Upload an Excel file
client.upload_file("data/financials.xlsx", "marketing.financials")
# Upload an Avro file
client.upload_file("data/users.avro", "marketing.users")
Supported formats include CSV, JSON, Parquet, Excel, HTML, Avro, ORC, Lance, and Arrow/Feather, with format-specific options passed through to the underlying readers.
This approach is ideal for:
Analyst-provided files
Small to medium batch ingestion
File types not directly ingested through the Dremio UI
Database Ingestion (JDBC / ODBC)
DremioFrame provides a standardized way to ingest data from relational databases into Iceberg using client.ingest.database. This integration supports a wide range of databases and can leverage high-performance backends such as connectorx.
from dremioframe.client import DremioClient
client = DremioClient()
client.ingest.database(
    connection_string="postgresql://user:password@localhost:5432/mydb",
    query="SELECT * FROM users WHERE active = true",
    table_name='"marketing"."users"',
    write_disposition="replace",
    backend="connectorx"
)
This pattern is commonly used for:
Migrating operational data into Iceberg
Isolating analytics workloads from source systems
Periodic batch ingestion from databases
Performance tips
Use connectorx whenever supported for faster ingestion
Use append for incremental loads
For very large datasets with SQLAlchemy, configure batch_size to stream results
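The `batch_size` tip amounts to streaming the result set in fixed-size chunks rather than materializing it all at once. A generic sketch of that pattern (not DremioFrame internals) looks like this:

```python
def stream_batches(rows, batch_size=10000):
    """Yield fixed-size chunks from any row iterable so large result
    sets never sit fully in memory."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial chunk

# Any iterable works: here a range stands in for a database cursor
chunks = list(stream_batches(range(25), batch_size=10))
```

Each yielded chunk can then be written as its own append, which also keeps individual commits to the Iceberg table bounded in size.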
File System Ingestion
For ingesting multiple local files at once, DremioFrame supports file system ingestion using glob patterns.
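A hedged sketch of how glob-based ingestion can be driven: expand the pattern, derive a target table name per file, and hand each pair to an upload call. Only the glob expansion below is real, runnable code; the `marketing.*` table names are illustrative, and in practice each `(path, table)` pair would be passed to `client.upload_file` as shown earlier.

```python
import glob
import os
import tempfile
from pathlib import Path

def discover_uploads(pattern):
    """Expand a glob pattern into (path, target_table) pairs; each pair
    would be passed to client.upload_file in a real pipeline."""
    return [(path, f"marketing.{Path(path).stem}")
            for path in sorted(glob.glob(pattern))]

# Simulate a local drop folder with a couple of CSV files
with tempfile.TemporaryDirectory() as d:
    for name in ("sales.csv", "users.csv"):
        Path(d, name).write_text("id\n1\n")
    plan = discover_uploads(os.path.join(d, "*.csv"))
```

Sorting the expansion keeps runs deterministic, which matters when the same folder is re-scanned on a schedule.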
Table Maintenance After Ingestion
After ingestion, DremioFrame exposes table maintenance operations that are especially important for Iceberg tables:
# Compact small files
client.table("marketing.users").optimize()
# Expire old snapshots
client.table("marketing.users").vacuum(retain_last=5)
Additional best practices:
Use batching when inserting large DataFrames
Ensure type consistency before ingestion
Use staging tables for complex transformations
Prefer Iceberg tables over direct source queries for analytics workloads
When to Use DremioFrame
DremioFrame is the right choice when:
Ingestion originates outside Dremio (APIs, local files, Python apps)
Programmatic control is required
You want Iceberg-native writes without Spark
You need to combine Python logic with lakehouse governance
It complements SQL-based ingestion (CTAS, INSERT, MERGE) and file-based ingestion (COPY INTO, CREATE PIPE) by covering scenarios that require custom application logic while still landing data in governed, open Iceberg tables.
Conclusion: Choosing the Right Ingestion Pattern with Dremio
Ingesting data into an Apache Iceberg lakehouse does not require a single tool or a one-size-fits-all approach. Instead, effective lakehouse architectures rely on multiple ingestion patterns, each optimized for different sources, data velocities, and operational requirements.
Dremio makes this possible by treating Apache Iceberg as a first-class write format and combining it with an open catalog, autonomous optimization, and both SQL- and programmatic ingestion paths. Whether data arrives as files, database tables, API responses, or local datasets, Dremio provides a clear and consistent path to governed, interoperable Iceberg tables.
Each ingestion method covered in this post serves a distinct purpose:
UI-based file uploads enable fast, ad hoc ingestion for exploration and prototyping.
CTAS and INSERT INTO SELECT provide a robust, SQL-first approach for migrations, transformations, and incremental batch ingestion.
COPY INTO and CREATE PIPE support scalable file-based ingestion from object storage, from one-time backfills to continuous file drops.
DremioFrame extends ingestion into the programmatic domain, enabling APIs, databases, local files, and Python-driven workflows to land cleanly in Iceberg.
The key advantage of Dremio’s approach is that all of these paths converge on the same outcome: Apache Iceberg tables managed by Dremio Open Catalog, optimized automatically, governed consistently, and immediately usable across engines and tools.
This convergence dramatically reduces the complexity typically associated with data ingestion. Teams no longer need separate systems for ingestion, transformation, optimization, and governance. Instead, they can focus on choosing the right pattern for the job, confident that the resulting data will be performant, reliable, and ready for analytics and AI.
By combining open standards, autonomous lakehouse operations, and agentic interaction, Dremio turns the journey from raw data to trusted insight into an incremental, flexible process, not a long, disruptive platform migration.
As your data sources and requirements evolve, the ingestion patterns can evolve with them, without re-architecting the lakehouse or sacrificing openness.