Gnarly Data Waves

Episode 38 | October 24, 2023

Building a Data Science Platform on Apache Iceberg and Nessie

Discover the future of data science and machine learning pipelines with Jacopo Tagliabue of Bauplan Labs in this webinar. Learn why modern data platforms are embracing Apache Iceberg and Nessie, and explore the transformative benefits of Nessie's git-like features for data management.

Join us for an insightful webinar featuring Jacopo Tagliabue of Bauplan Labs as he dives into the world of data science and machine learning pipelines. In this session, you’ll discover the rationale behind Bauplan Labs’ choice of open-source technologies, such as Apache Iceberg table format and Project Nessie transactional data catalog, for their cutting-edge platform. Gain valuable insights into why modern data platforms are increasingly adopting these technologies and how Nessie’s git-like features can revolutionize your data management. Don’t miss out on this opportunity to stay ahead in the world of data science and technology!

About Project Nessie – Introducing Nessie as a Dremio Source

Learn:

– Why Modern Data Platforms are being built on Apache Iceberg

– Why Modern Data Platforms are being built on Nessie


Speakers

Alex Merced

Alex Merced is Head of DevRel for Dremio, a developer, and a seasoned instructor with a rich professional background, having worked with companies like GenEd Systems, Crossfield Digital, CampusGuard, and General Assembly.

Alex is a co-author of the O’Reilly book “Apache Iceberg: The Definitive Guide.” With a deep understanding of the subject matter, Alex has shared his insights as a speaker at events including Data Day Texas, OSA Con, P99Conf and Data Council.

Driven by a profound passion for technology, Alex has been instrumental in disseminating his knowledge through various platforms. His tech content can be found in blogs, videos, and his podcasts, Datanation and Web Dev 101.

Moreover, Alex Merced has made contributions to the JavaScript and Python communities by developing a range of libraries. Notable examples include SencilloDB, CoquitoJS, and dremio-simple-query, among others.

Jacopo Tagliabue

Jacopo Tagliabue is the founder of Bauplan Labs. Educated in several acronyms across the globe (UNISR, SFI, MIT), he was co-founder and CTO of Tooso, which was proudly serving predictions to millions of shoppers before being acquired by Coveo (TSX:CVO).

He led Coveo’s A.I. and MLOps roadmap from scale-up to IPO, and built out Coveo Labs, an agile, applied R&D practice rooted in world-class collaborations (Stanford, Bocconi, Outerbounds, Uber, Microsoft, NVIDIA), open source and open science.

He talks *a lot*, and is often invited to do so by folks in industry (BBC, Walmart, Pinterest, eBay, Meta, Farfetch) and academia (SIRIP, CiE, KDD, Stanford, Harvard). He is currently an Adj. Professor of ML at NYU, which is mostly notable because it is the only job he has ever had that his parents (sort of) understand.

His A.I. work has been featured several times in the general press and presented at business and academic venues, including WWW, RecSys, and NAACL (winning best paper at NAACL21).

In previous lives, he managed to do scienc-y things for a professional basketball team, simulate a pre-Columbian civilization and give an academic talk on videogames (among other improbable “achievements”).
