The Open Lakehouse Community

Featured Subsurface Community Content

Apache Iceberg and the Right to Be Forgotten

Time travel capabilities and privacy laws like GDPR and CCPA are at odds with each other. Here’s how to make sure you’re GDPR/CCPA compliant while using time travel in Apache Iceberg.

Streaming Data into Apache Iceberg Tables Using AWS Kinesis and AWS Glue

Learn how to ingest streaming data from AWS Kinesis into Apache Iceberg Tables using AWS Glue, and then query it with Dremio.

Ensuring High Performance at Any Scale with Apache Iceberg’s Object Store File Layout

Object storage can become a bottleneck when working with big data. Apache Iceberg’s architecture lends itself to overcoming these challenges, providing a scalable table format for object storage.

Introduction to Apache Iceberg Using Spark

Learn the basics of Iceberg’s many features and utilities by trying them out in a Spark sandbox.

Save the Date for Subsurface LIVE 2023!

A Hands-On Look at the Structure of an Apache Iceberg Table

This tutorial provides a practical deep dive into the internals of Apache Iceberg using Dremio Sonar as the engine.

Row-Level Changes on the Lakehouse: Copy-On-Write vs. Merge-On-Read in Apache Iceberg

How copy-on-write and merge-on-read work in Apache Iceberg.

Meetup: Comparison of Data Lakehouse Table Formats

This presentation covers the three major data lake table formats – Apache Iceberg, Apache Hudi, and Delta Lake – how they work, their features, and their limitations so you can make an informed decision when architecting your data lakehouse.

Migrating a Hive Table to an Iceberg Table Hands-on Tutorial

Learn how to migrate your existing Hive tables into Apache Iceberg tables to take full advantage of features like Version Rollback, Partition Evolution and more.

Table Format Governance and Community Contributions: Apache Iceberg, Apache Hudi, and Delta Lake

Learn about the differences in the governance and communities behind open source table formats like Apache Iceberg, Apache Hudi, and Delta Lake.

Meetup: Subsurface Talks with Asurion and Dremio

Two great Subsurface Talks. Asurion shares how they overcame the metadata culture hurdle to build DPS and innovated using a graph-type data model without a graph database. Dremio then provides an introduction to Apache Iceberg views and how they can be useful to you.

Fewer Accidental Full Table Scans Brought to You by Apache Iceberg’s Hidden Partitioning

Learn about hidden partitioning and why it is such a valuable feature of Apache Iceberg tables.

How to Migrate a Hive Table to an Iceberg Table

Learn how to architect a migration from your existing Hive tables into Apache Iceberg tables to take full advantage of features like Version Rollback, Partition Evolution and more.

Maintaining Iceberg Tables – Compaction, Expiring Snapshots, and More

Learn about the strategies and best practices for maintaining Apache Iceberg tables.

That’s a Wrap! Highlights from Subsurface LIVE Winter 2022

With Subsurface LIVE Winter 2022 over, we reflect on the event with stats and highlights, and look forward to what’s next for Subsurface.

Tracking & Triggering Pattern with Spark Stateful Streaming
Data Lake Engines Apache Spark Subsurface LIVE Sessions

Using Apache Spark Stateful Streaming to create services that minimize processing time while keeping everything under defined SLAs.

Reverse ETL: The Last Mile in Operationalizing the Data Lake
Subsurface LIVE Sessions

Reverse ETL syncs transformed data from your data lake back into operational systems. Learn more about what it is and why it’s taking off.

How HyreCar Maximizes the Power of Self-Serve Analytics
Subsurface LIVE Sessions

Learn how HyreCar successfully deployed self-service analytics by leveraging open data architecture.

The Age of Big Data is Over. Enter Data Activation
Subsurface LIVE Sessions

You have big data: now what? The real complexity comes from the number of systems and stakeholders that interact with data, not the volume. Learn how to manage this complexity and activate your data using your data lakehouse or warehouse.

Leveraging DataOps to Build India’s National Data Platform
Subsurface LIVE Sessions

How an eight-member team built India’s National Data Platform, one of the largest public sector data lake projects, in under 12 months.

The Write-Audit-Publish Pattern via Apache Iceberg
Table Formats Apache Iceberg Subsurface LIVE Sessions

How the Write-Audit-Publish pattern, enabled through Apache Iceberg, works to ensure data is correct at massive scale.

Predicting TV Tune-In Using PySpark, MLlib & Delta Lakehouse
Subsurface LIVE Sessions

How MIQ Digital India Pvt. Ltd. scales its high-volume, TV-viewing data product to market and optimizes data pipelines.

Super App Introduction
Subsurface LIVE Sessions

This session covers Super App use cases and why the Super App is a prevalent phenomenon in Asia.

Streaming from an Iceberg Data Lake
Subsurface LIVE Sessions

Learn how a Flink Iceberg source enables streaming reads from Iceberg tables.

Do We Still Need People to Write Database Systems?
Subsurface LIVE Sessions

A look at the trend toward replacing traditional, hand-optimized DBMS components with "learned" components that rely on machine learning.

Build analytics apps on lakes and streams with Apache Druid
Subsurface LIVE Sessions

Data lakes AND streams: learn how Apache Druid can extend your application to get interactive, high concurrency insights from both.

Boost Performance with Intel AVX-512 and Java
Subsurface LIVE Sessions

With Intel AVX-512, Java applications can process twice as much data at a time compared to AVX2. Learn how Dremio can benefit from AVX-512.

Auditing Your Data
Subsurface LIVE Sessions

The design process behind Nielsen’s Data Auditing system, Life Line, and an answer to the lifelong question: is it the end of the day yet?

Apache Arrow: Open Source Standard Becomes Enterprise Necessity
In-Memory Formats Apache Arrow Subsurface LIVE Sessions

Learn how the Apache Arrow ecosystem has evolved over the last year to become the de facto standard for bridging data science.

Founder’s Panel
Subsurface LIVE Sessions

Hear from founders and open source pioneers on how the data community is tackling some of the biggest challenges in data architecture.

Reclaiming Your Focus and Avoiding Burnout
Subsurface LIVE Sessions

In this closing keynote, “Deep Work” author Cal Newport shares actionable advice on how to achieve heightened focus and avoid burnout.

Achieve Proactive Data Observability for Your Lakehouse
Subsurface LIVE Sessions

Data observability is the foundational technology that enables data execs to align DataOps investments with business priorities.

From DBA & Open Source Contributor to CTO & Co-Founder
Subsurface LIVE Sessions

Juan Pan shares her empowering journey of going from a woman developer in the open source world to CTO at an open source commercial startup.

What Can Iceberg Do for You?
Table Formats Apache Iceberg Subsurface LIVE Sessions

The history, benefits, and possibilities of Apache Iceberg in the discipline of BI and analytics.

Douglas: When E-Commerce Explodes – The More Data the More Dremio
Subsurface LIVE Sessions

Learn how Douglas is using a cloud-based data lake and Dremio to accelerate the delivery of business-critical information for marketing.

Modernizing Finance Data Cloud Infrastructure at Fannie Mae
Subsurface LIVE Sessions

An overview of the largest AWS build at Fannie Mae, centralizing finance data and delivering core calculations to improve analytics and more.

Why Your ETL Should Be Open-Source
Subsurface LIVE Sessions

In this talk, we will describe the benefits of the open-source ETL approach.

What’s the Big Deal about Data Observability
Subsurface LIVE Sessions

Data observability is the foundational technology that enables data execs to align DataOps investments with business priorities.