November 13, 2025
Bauplan Reloaded: Bringing Git-for-Data to Humans, Servers, and Agents
The Git-for-data workflow was originally built in Nessie to bring version control to human-driven analytics. But as soon as we extend data branching to support data and AI pipelines, the original design needs to evolve. In this talk, we show how Bauplan pushed Nessie to its limits, powering one of the largest Git-for-data deployments in the world.
To scale to millions of branches, we extended Nessie with well-defined semantics that link data to transformations, and with APIs that expose lakehouse management to any Python client. These changes ensure correctness, composability, and observability, even when pipelines run with no human in the loop.
The recent rise of AI agents has only confirmed those bets: Bauplan agents use branches for sandboxed, safe experimentation. Thanks to Apache Iceberg, with native integration into Dremio’s query engine, agents can run SQL over versioned data.
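To make the branch-as-sandbox idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Bauplan or Nessie API: `ToyCatalog`, `run_in_sandbox`, and the snapshot IDs are all hypothetical names invented for illustration. It assumes a branch is just a mapping from table names to snapshot pointers, so creating a branch copies pointers rather than data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog: each branch maps table names to snapshot IDs."""

    branches: dict[str, dict[str, str]] = field(
        default_factory=lambda: {"main": {}}
    )

    def create_branch(self, name: str, from_ref: str = "main") -> None:
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        # Copy only the table -> snapshot pointers, never the data itself.
        self.branches[name] = dict(self.branches[from_ref])

    def commit(self, branch: str, table: str, snapshot_id: str) -> None:
        # Point the table at a new snapshot on this branch only.
        self.branches[branch][table] = snapshot_id

    def merge(self, source: str, target: str = "main") -> None:
        # Naive merge: the source branch's pointers win. A real system
        # would detect conflicting changes instead of overwriting.
        self.branches[target].update(self.branches[source])

    def delete_branch(self, name: str) -> None:
        if name == "main":
            raise ValueError("refusing to delete main")
        del self.branches[name]


def run_in_sandbox(
    catalog: ToyCatalog,
    branch: str,
    table: str,
    snapshot_id: str,
    is_valid: Callable[[dict[str, str]], bool],
) -> bool:
    """Apply a change on a throwaway branch; merge to main only if it validates."""
    catalog.create_branch(branch)
    catalog.commit(branch, table, snapshot_id)
    ok = is_valid(catalog.branches[branch])
    if ok:
        catalog.merge(branch)
    # The sandbox branch is discarded whether or not the change was merged.
    catalog.delete_branch(branch)
    return ok
```

In this sketch a failed validation leaves `main` untouched, which is the property that lets an agent or an unattended pipeline experiment safely.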
Sign up to watch all Subsurface 2025 sessions