December 10, 2025

Ingestion into Dremio: Concepts and Best Practices

This session walks through how to design a clear and reliable ingestion plan for Dremio Cloud. It explains what ingestion means in a lakehouse, why predictable pipelines matter, and how to choose the right tool for each workload. You will learn two core ingestion paths. The first uses Dremio itself to land data with CTAS, INSERT SELECT, COPY INTO, CREATE PIPE, and file uploads. The second uses external engines and tools such as Spark, Flink, Kafka Connect, and Fivetran, along with other batch or streaming systems, that write Apache Iceberg tables through the Dremio Catalog or any Iceberg REST interface. The talk then covers the rules that keep ingestion stable: how to design clear namespaces, partition with intent, manage metadata growth, and separate raw and curated layers. You will leave with a simple checklist you can follow for any new pipeline.

Topics Covered

Apache Iceberg

Data Catalogue

Table Formats

Sign up to watch all Subsurface 2025 sessions

Speaker

Mark Hoerth

Principal Product Manager