The Cloud Data Lake Site

Featured Subsurface Community Content

Migrating a Hive Table to an Iceberg Table Hands-on Tutorial

Learn how to migrate your existing Hive tables into Apache Iceberg tables to take full advantage of features like Version Rollback, Partition Evolution and more.

Table Format Governance and Community Contributions: Apache Iceberg, Apache Hudi, and Delta Lake

Learn about the differences in the governance and communities behind open source table formats like Apache Iceberg, Apache Hudi, and Delta Lake.

Meetup: Subsurface Talks with Asurion and Dremio

Two great Subsurface talks. Asurion shares how it overcame the metadata culture hurdle to build DPS and innovated by using a graph-type data model without a graph database. And Dremio provides an introduction to Apache Iceberg views and how they can be useful to you.

Fewer Accidental Full Table Scans Brought to You by Apache Iceberg’s Hidden Partitioning

Learn about hidden partitioning and why it is such a valuable feature of Apache Iceberg tables.
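To make the idea concrete, here is a toy Python sketch, not Iceberg's API: `DataFile`, `day_transform`, `write`, `scan`, and the `event_ts` column are all hypothetical. It assumes a table partitioned by a day transform of a timestamp column. Writers never supply a separate partition value, and a query that filters only on `event_ts` still skips files from days that cannot match, which is the behavior that avoids accidental full table scans.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class DataFile:
    """Hypothetical data file: the partition value it belongs to plus its rows."""
    partition_day: date
    rows: list[dict] = field(default_factory=list)


def day_transform(ts: datetime) -> date:
    """Derive the partition value from the timestamp itself (a day transform)."""
    return ts.date()


def write(files: dict[date, DataFile], row: dict) -> None:
    """Writers supply only event_ts; the partition is computed, never hand-maintained."""
    day = day_transform(row["event_ts"])
    files.setdefault(day, DataFile(day)).rows.append(row)


def scan(files: dict[date, DataFile], start: datetime, end: datetime) -> tuple[list[dict], int]:
    """Filter on event_ts in [start, end) and prune files whose day cannot match.

    Returns the matching rows and the number of files that had to be opened.
    """
    lo, hi = day_transform(start), day_transform(end)
    candidates = [f for day, f in files.items() if lo <= day <= hi]
    rows = [r for f in candidates for r in f.rows if start <= r["event_ts"] < end]
    return rows, len(candidates)


# Ten days of events, one file per day.
files: dict[date, DataFile] = {}
for d in range(1, 11):
    write(files, {"id": d, "event_ts": datetime(2022, 3, d, 12)})

rows, opened = scan(files, datetime(2022, 3, 4), datetime(2022, 3, 6))
print([r["id"] for r in rows], opened)  # [4, 5] 3
```

The query never mentions a partition column, yet only three of the ten files are opened. A Hive-style table with a separate, user-maintained date column would scan every file unless the query also filtered on that column.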
Building a Historical Financial Data Lake at Bloomberg
Subsurface LIVE Sessions

How the Bloomberg Enterprise Data Lake engineering group ingested historical financial data into Apache Iceberg tables.

Watch the recordings from Subsurface LIVE

How to Migrate a Hive Table to an Iceberg Table

Learn how to architect a migration from your existing Hive tables into Apache Iceberg tables to take full advantage of features like Version Rollback, Partition Evolution and more.

Maintaining Iceberg Tables – Compaction, Expiring Snapshots, and More

Learn about the strategies and best practices for maintaining Apache Iceberg tables.
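Compaction is one of these maintenance tasks. The toy Python planner below illustrates the bin-packing idea behind it: small files are grouped so that each group's combined size stays under a target file size, and each group would then be rewritten as one larger file. It is a sketch under assumptions (the `plan_bin_pack` name, the first-fit-decreasing heuristic, and the 512 MB default target are illustrative choices), not Iceberg's implementation.

```python
def plan_bin_pack(file_sizes_mb: list[int], target_mb: int = 512) -> list[list[int]]:
    """Group small files into rewrite groups whose total size stays within target_mb.

    Uses a first-fit-decreasing heuristic. Files already at or above the target
    are left alone, and single-file groups are dropped because rewriting them
    would not reduce the file count.
    """
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    groups: list[list[int]] = []
    totals: list[int] = []
    for size in small:
        # Place the file in the first group that still has room for it.
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                groups[i].append(size)
                totals[i] += size
                break
        else:
            # No existing group fits: start a new one.
            groups.append([size])
            totals.append(size)
    return [g for g in groups if len(g) > 1]


print(plan_bin_pack([600, 100, 300, 200, 50, 400], target_mb=512))
# [[400, 100], [300, 200]]
```

Here five small files become two rewrite groups; the 600 MB file is already large enough, and the lone 50 MB file is left for a later run when more small files have accumulated.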
That’s a Wrap! Highlights from Subsurface LIVE Winter 2022

With Subsurface LIVE Winter 2022 over, we reflect on the event’s stats and highlights and look forward to what’s next for Subsurface.

Tracking & Triggering Pattern with Spark Stateful Streaming
Data Lake Engines Apache Spark Subsurface LIVE Sessions

Using Apache Spark Stateful Streaming to create services that minimize processing time while keeping everything under defined SLAs.

Reverse ETL: The Last Mile in Operationalizing the Data Lake
Subsurface LIVE Sessions

Reverse ETL syncs transformed data from your data lake back into operational systems. Learn more about what it is and why it’s taking off.
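A minimal sketch of the sync step, assuming a hypothetical CRM represented as a Python dict keyed by `customer_id` (the `sync` and `row_fingerprint` helpers are illustrative, not any vendor's API). Each run fingerprints the transformed lake rows and pushes only new or changed ones, so repeated syncs do not resend unchanged records. Real reverse ETL tools add batching, retries, and API rate limiting, all omitted here.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Stable hash of a row so unchanged records can be skipped on the next sync."""
    payload = json.dumps(row, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()


def sync(lake_rows: list[dict], crm: dict[str, dict], last_synced: dict[str, str],
         key: str = "customer_id") -> list[str]:
    """Push new or changed lake rows into the (hypothetical) CRM store.

    `last_synced` remembers each key's fingerprint between runs.
    Returns the keys written on this run.
    """
    written = []
    for row in lake_rows:
        k = str(row[key])
        fp = row_fingerprint(row)
        if last_synced.get(k) != fp:
            crm[k] = dict(row)  # stand-in for an upsert call to the operational system
            last_synced[k] = fp
            written.append(k)
    return written


crm: dict[str, dict] = {}
state: dict[str, str] = {}
lake_rows = [{"customer_id": 1, "ltv": 120.0}, {"customer_id": 2, "ltv": 80.0}]
print(sync(lake_rows, crm, state))  # ['1', '2']: first run writes both
lake_rows[1]["ltv"] = 95.0
print(sync(lake_rows, crm, state))  # ['2']: only the changed customer is re-sent
```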
The Age of Big Data is Over. Enter Data Activation
Subsurface LIVE Sessions

You have big data: now what? The real complexity is from the number of systems and stakeholders that interact with data, not the volume. Learn how to manage this complexity and activate your data using your data lakehouse or warehouse.

Leveraging DataOps to Build India’s National Data Platform
Subsurface LIVE Sessions

How an eight-member team built India’s National Data Platform, one of the largest public sector data lake projects, in under 12 months.

The Write-Audit-Publish Pattern via Apache Iceberg
Table Formats Apache Iceberg Subsurface LIVE Sessions

How the Write-Audit-Publish pattern, enabled through Apache Iceberg, works to ensure data is correct at massive scale.
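The pattern can be sketched with a toy table that keeps immutable snapshots and a pointer to the one readers see. This is an illustrative Python model (the `Table` class, `write_audit_publish`, and the `no_null_ids` check are hypothetical), not Iceberg's API: new data is written to a staged snapshot, audited, and published only if the audit passes, so readers never see data that failed its checks.

```python
from dataclasses import dataclass, field
from typing import Callable

Rows = tuple[dict, ...]


@dataclass
class Table:
    """Toy table: immutable snapshots plus a pointer to the one readers see."""
    snapshots: list[Rows] = field(default_factory=lambda: [()])
    current: int = 0

    def read(self) -> Rows:
        return self.snapshots[self.current]

    def write_staged(self, new_rows: list[dict]) -> int:
        """Write: commit a new snapshot without making it visible to readers."""
        self.snapshots.append(self.read() + tuple(new_rows))
        return len(self.snapshots) - 1

    def publish(self, snapshot_id: int) -> None:
        """Publish: move the reader-visible pointer in a single step."""
        self.current = snapshot_id


def write_audit_publish(table: Table, new_rows: list[dict],
                        audit: Callable[[Rows], bool]) -> bool:
    staged = table.write_staged(new_rows)
    if audit(table.snapshots[staged]):  # audit before anyone can read the data
        table.publish(staged)
        return True
    return False  # readers keep seeing the last good snapshot


def no_null_ids(rows: Rows) -> bool:
    return all(r.get("id") is not None for r in rows)


table = Table()
print(write_audit_publish(table, [{"id": 1}], no_null_ids))     # True: published
print(write_audit_publish(table, [{"id": None}], no_null_ids))  # False: stays staged
print(table.read())  # ({'id': 1},)
```

In Iceberg itself, the publish step corresponds to making a staged snapshot or branch current; the exact mechanism depends on the engine and Iceberg version in use.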
Predicting TV Tune-In Using PySpark, MLlib & Delta Lakehouse
Subsurface LIVE Sessions

How MIQ Digital India Pvt. Ltd. scales its high-volume, TV-viewing data product to market and optimizes data pipelines.

Super App Introduction
Subsurface LIVE Sessions

This session covers Super App use cases and why the Super App has become such a prevalent phenomenon in Asia.

Streaming from an Iceberg Data Lake
Subsurface LIVE Sessions

Learn how a Flink Iceberg source enables streaming reads from Iceberg tables.

Do We Still Need People to Write Database Systems?
Subsurface LIVE Sessions

A look at the trend toward replacing traditional, hand-optimized DBMS components with "learned" components that rely on machine learning.

Build analytics apps on lakes and streams with Apache Druid
Subsurface LIVE Sessions

Data lakes AND streams: learn how Apache Druid can extend your application to get interactive, high concurrency insights from both.

Boost Performance with Intel AVX-512 and Java
Subsurface LIVE Sessions

With Intel AVX-512, Java applications can process twice as much data at a time compared to AVX2. Learn how Dremio can benefit from AVX-512.

Auditing Your Data
Subsurface LIVE Sessions

The design process behind Nielsen’s Data Auditing system, Life Line, and answering the lifelong question, is it the end of the day yet?

Apache Arrow: Open Source Standard Becomes Enterprise Necessity
In-Memory Formats Apache Arrow Subsurface LIVE Sessions

Learn how the Apache Arrow ecosystem has evolved over the last year to become the de facto standard for bridging data science.

Founder’s Panel
Subsurface LIVE Sessions

Hear from founders and open source pioneers on how the data community is tackling some of the biggest challenges in data architecture.

Reclaiming Your Focus and Avoiding Burnout
Subsurface LIVE Sessions

In this closing keynote, “Deep Work” author Cal Newport shares actionable advice on how to achieve heightened focus and avoid burnout.

Achieve Proactive Data Observability for Your Lakehouse
Subsurface LIVE Sessions

Data observability is the foundational technology that enables data execs to align DataOps investments with business priorities.

From DBA & Open Source Contributor to CTO & Co-Founder
Subsurface LIVE Sessions

Juan Pan shares her empowering journey of going from a woman developer in the open source world to CTO at an open source commercial startup.

What Can Iceberg Do for You?
Table Formats Apache Iceberg Subsurface LIVE Sessions

The history, benefits, and possibilities of Apache Iceberg in the discipline of BI and analytics.

Douglas: When E-Commerce Explodes – The More Data the More Dremio
Subsurface LIVE Sessions

Learn how Douglas is using a cloud-based data lake and Dremio to accelerate the delivery of business-critical information for marketing.

Modernizing Finance Data Cloud Infrastructure at Fannie Mae
Subsurface LIVE Sessions

An overview of the largest AWS build at Fannie Mae, which centralizes finance data and delivers core calculations to improve analytics and more.

Why Your ETL Should Be Open-Source
Subsurface LIVE Sessions

In this talk, we describe the benefits of the open-source ETL approach.

What’s the Big Deal about Data Observability
Subsurface LIVE Sessions

Data observability is the foundational technology that enables data execs to align DataOps investments with business priorities.

How the Lakehouse Evolved & Why It’s the Future of Analytics
Subsurface LIVE Sessions

Bill Inmon, “The Father of the Data Warehouse,” discusses the evolution to the lakehouse and why it’s the future of analytics.

Day 1: Welcome to Subsurface
Subsurface LIVE Sessions

Mercedes Benz R&D – The Best or Nothing Data Platform
Subsurface LIVE Sessions

How the Data Platform team at Mercedes-Benz securely democratizes data access across teams and sources with a unified data platform.

Cross-Platform Lineage with OpenLineage
Subsurface LIVE Sessions

Data today is distributed and heterogeneous. Data lineage helps by tracing the relationship between datasets and placing them in context.

Women in Data Panel
Subsurface LIVE Sessions

Tuning Row-Level Operations in Apache Iceberg
Table Formats Apache Iceberg Subsurface LIVE Sessions

Deep dive into copy-on-write and merge-on-read approaches for executing row-level operations in Apache Iceberg.
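The difference between the two approaches can be sketched with a toy in-memory table (the `ToyTable` class and helper functions are hypothetical, not Iceberg's API). Copy-on-write rewrites the affected data file without the deleted rows; merge-on-read leaves the data file untouched, records the deleted row positions separately (loosely analogous to Iceberg's position delete files), and applies them at read time.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTable:
    """One data file's rows plus positional deletes (used only by merge-on-read)."""
    data_file: list[str]
    position_deletes: set[int] = field(default_factory=set)
    files_rewritten: int = 0


def delete_copy_on_write(t: ToyTable, predicate: Callable[[str], bool]) -> None:
    """Copy-on-write: rewrite the whole data file without the deleted rows."""
    t.data_file = [row for row in t.data_file if not predicate(row)]
    t.files_rewritten += 1


def delete_merge_on_read(t: ToyTable, predicate: Callable[[str], bool]) -> None:
    """Merge-on-read: keep the data file as-is and record deleted positions."""
    t.position_deletes |= {i for i, row in enumerate(t.data_file) if predicate(row)}


def read(t: ToyTable) -> list[str]:
    """Readers apply any recorded deletes while scanning."""
    return [row for i, row in enumerate(t.data_file) if i not in t.position_deletes]


cow = ToyTable(["a", "b", "c"])
mor = ToyTable(["a", "b", "c"])
delete_copy_on_write(cow, lambda row: row == "b")
delete_merge_on_read(mor, lambda row: row == "b")
print(read(cow), read(mor))  # ['a', 'c'] ['a', 'c']
```

Both produce the same query result. Copy-on-write pays the cost at write time by rewriting files, while merge-on-read makes writes cheaper and pushes the merge work onto readers until compaction folds the deletes back into the data files.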
Get Hands-On with a Dremio Cloud Workshop
Subsurface LIVE Sessions

Learn how Apache Iceberg’s RewriteDataFiles ensures any table can be kept at peak performance regardless of ingestion patterns or table size.

An Open Data Architecture in Action with Apache Iceberg
Subsurface LIVE Sessions

Learn how Apache Iceberg’s RewriteDataFiles ensures any table can be kept at peak performance regardless of ingestion patterns or table size.

Beyond Linear Notebooks: Implementing Reactivity with IPython
Subsurface LIVE Sessions

The pros and cons of the traditional IPython execution model versus a reactive model, plus a dive under the hood of our reactive notebook implementation.

Operational Analytics vs BI: A new world of data
Subsurface LIVE Sessions

BI generally means hours of extra analysis and frustration around data integration. There’s a better way: Operational Analytics.

How to Build an IoT Data Lake
Subsurface LIVE Sessions

This talk discusses both the challenges and solutions of using data lakes for long-term storage of IoT data.

Build Data Lake Pipelines at Scale – Using only SQL
Subsurface LIVE Sessions

Learn how to build and maintain pipelines that run on cloud data lakes, using SQL and automation of ALL pipeline ops (orchestration, etc.).

dbt Alerting for Real-Time Data Teams
Subsurface LIVE Sessions

How ZOE implemented real-time dbt alerting to notify data engineering teams of issues in their pipelines and resolve issues faster.

Real-Time Hybrid Cloud Data Streaming
Subsurface LIVE Sessions

How Wayfair built a highly scalable cloud-native streaming platform with Apache Beam, Google Cloud Platform, and Dataflow.

Headless BI Meets Data Source Managers
Business Intelligence Subsurface LIVE Sessions

GoodData & Dremio: real-time integration in action with multiple underlying data sources. Live demo included.

Arrow FlightSQL: A 20x Faster Alternative to JDBC and ODBC
In-Memory Formats Apache Arrow Subsurface LIVE Sessions

Learn the advantages of Apache Arrow FlightSQL over JDBC and ODBC and how to use FlightSQL from Python, C++, Java, and other languages.

Managing Data Files in Apache Iceberg
Table Formats Apache Iceberg Subsurface LIVE Sessions

Learn how Apache Iceberg’s RewriteDataFiles ensures any table can be kept at peak performance regardless of ingestion patterns or table size.