Gnarly Data Waves

Episode 38 | October 24, 2023

Building a Data Science Platform on Apache Iceberg and Nessie

Discover the future of data science and machine learning pipelines with Jacopo Tagliabue of Bauplan Labs in this webinar. Learn why modern data platforms are embracing Apache Iceberg and Nessie, and explore the transformative benefits of Nessie's git-like features for data management.

Join us for an insightful webinar featuring Jacopo Tagliabue of Bauplan Labs as he dives into the world of data science and machine learning pipelines. In this session, you’ll discover the rationale behind Bauplan Labs’ choice of open-source technologies, such as Apache Iceberg table format and Project Nessie transactional data catalog, for their cutting-edge platform. Gain valuable insights into why modern data platforms are increasingly adopting these technologies and how Nessie’s git-like features can revolutionize your data management. Don’t miss out on this opportunity to stay ahead in the world of data science and technology!

About Project Nessie – Introducing Nessie as a Dremio Source

Learn:

– Why Modern Data Platforms are being built on Apache Iceberg

– Why Modern Data Platforms are being built on Nessie

Speakers

Alex Merced

Alex Merced is a Senior Tech Evangelist for Dremio, a developer, and a seasoned instructor with a rich professional background. He has worked with companies such as GenEd Systems, Crossfield Digital, CampusGuard, and General Assembly.

Alex is a co-author of the O’Reilly book “Apache Iceberg: The Definitive Guide.” With a deep understanding of the subject matter, Alex has shared his insights as a speaker at events including Data Day Texas, OSA Con, P99Conf, and Data Council.

Driven by a profound passion for technology, Alex has been instrumental in disseminating his knowledge through various platforms. His tech content can be found in blogs, videos, and his podcasts, Datanation and Web Dev 101.

Moreover, Alex Merced has made contributions to the JavaScript and Python communities by developing a range of libraries. Notable examples include SencilloDB, CoquitoJS, and dremio-simple-query, among others.

Jacopo Tagliabue

Jacopo Tagliabue is the founder of Bauplan Labs. Educated in several acronyms across the globe (UNISR, SFI, MIT), he was co-founder and CTO of Tooso, which was proudly serving predictions to millions of shoppers before being acquired by Coveo (TSX:CVO).

He led Coveo’s A.I. and MLOps roadmap from scale-up to IPO, and built out Coveo Labs, an agile, applied R&D practice rooted in world-class collaborations (Stanford, Bocconi, Outerbounds, Uber, Microsoft, NVIDIA), open source, and open science.

He talks *a lot*, and is often invited to do so by folks in industry (BBC, Walmart, Pinterest, eBay, Meta, Farfetch) and academia (SIRIP, CiE, KDD, Stanford, Harvard). He is currently an Adj. Professor of ML at NYU, which is mostly notable because it is the only job he has ever had that his parents (sort of) understand.

His A.I. work has been featured several times in the general press and presented in business and academic venues (including WWW, RecSys, and NAACL), and it won best paper at NAACL21.

In previous lives, he managed to do scienc-y things for a professional basketball team, simulate a pre-Columbian civilization, and give an academic talk on videogames (among other improbable “achievements”).
