December 10, 2025

Integrated Catalogs for On-Prem Iceberg Lakehouses

An Apache Iceberg catalog stores schemas, partition layouts, snapshots, and transaction history, all of which are critical for query performance and interoperability. In on-prem environments using high-performance S3-compatible object storage, integrating the catalog directly into the storage layer could deliver significant benefits.

This session considers how such an architecture could reduce operational overhead by enabling automated maintenance, improve query performance by keeping metadata close to the data, and strengthen governance with fine-grained access control and audit capabilities. Drawing inspiration from the goals of services like Amazon S3 Tables, but with an eye toward on-prem autonomy, we will explore how this design could work with Dremio and other query engines to provide low-latency planning, high concurrency, and open format compatibility. Attendees will leave with a clear view of the potential for integrated on-prem catalogs to enhance both performance and control.

Topics Covered

Apache Iceberg
Data Catalogue
Data Lake

Sign up to watch all Subsurface 2025 sessions