On Thursday, April 28, the Subsurface Community held its first meetup, a hybrid virtual and in-person event. Two speakers, from Asurion and Dremio, presented virtually, and the audience could join online from anywhere in the world or in person at locations in New York and Chicago.

In-person attendees were greeted with a complimentary Apache Iceberg T-shirt, free food and drink, and ping pong at SPIN, an iconic ping pong social club with locations in Chicago and New York.

Below you’ll find descriptions of the two talks given at the event. To make sure you don’t miss the next Subsurface meetup, whether virtually or in person when it comes to your neighborhood, join the Subsurface Community meetup group.

DPS (Data Positioning System): The GPS for Your Data Lake 

Rajesh Gundugollu, Principal Data Architect, Asurion

This presentation is about our internal product called DPS. We intentionally do not call it a data catalog because it’s much more than that. It gives users and platform owners everything they need to know about the data in the data platform, all in one place, via a simple search-driven UI.

We brought together data assets, columns, data movement jobs, users, infrastructure, operational data, and even documentation into a single pane of glass. All of this is presented via a simple, interactive, and easy-to-understand interface. Information about data assets, such as lineage, impact analysis, operational metrics, quality metrics, and regulatory metrics, comes together in one place.

This presentation also shares how we overcame a metadata culture hurdle, how we built this ourselves, and how we innovated using a graph-type data model without a graph database.
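To make the idea of a graph-type data model without a graph database more concrete, here is a minimal sketch that stores nodes and edges in ordinary relational tables and walks lineage with a recursive query. It is not Asurion's implementation: the use of SQLite, the `node` and `edge` tables, the asset and job names, and the `downstream` helper are all assumptions chosen for illustration.

```python
import sqlite3


def build_catalog():
    """Create an in-memory relational store holding a small lineage graph.

    All table, asset, and job names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
        CREATE TABLE edge (
            src TEXT NOT NULL,
            dst TEXT NOT NULL,
            PRIMARY KEY (src, dst)
        );
    """)
    conn.executemany("INSERT INTO node VALUES (?, ?)", [
        ("raw.claims", "table"),
        ("job.clean_claims", "job"),
        ("curated.claims", "table"),
        ("job.build_report", "job"),
        ("report.claims_daily", "table"),
    ])
    # Each edge means "data flows from src to dst".
    conn.executemany("INSERT INTO edge VALUES (?, ?)", [
        ("raw.claims", "job.clean_claims"),
        ("job.clean_claims", "curated.claims"),
        ("curated.claims", "job.build_report"),
        ("job.build_report", "report.claims_daily"),
    ])
    return conn


def downstream(conn, start):
    """Impact analysis: return every node reachable from `start`, sorted by id."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id) AS (
            SELECT dst FROM edge WHERE src = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.dst FROM edge e JOIN reach r ON e.src = r.id
        )
        SELECT id FROM reach ORDER BY id
    """, (start,)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    catalog = build_catalog()
    print(downstream(catalog, "curated.claims"))
```

Swapping `src` and `dst` in the recursive query would give upstream lineage instead of downstream impact, which is one reason an edge table can stand in for a dedicated graph store in a sketch like this.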

Intro to Apache Iceberg Views 

Eduard Tudenhoefner, OSS Developer, Dremio

In open architectures, each engine is used for the workloads it was designed for and handles best. When multiple engines work on the same datasets, they all need to agree on what each dataset is. Apache Iceberg provides that capability, and it works well when you primarily have one engine doing the writing and one engine doing the downstream analytics. However, when multiple engines serve downstream user-facing analytics, each engine also needs to apply business logic to give end users the answers they're looking for.

When using multiple engines for downstream analytics, there are generally three options:

  1. Each engine has its own definition of the business logic on top of the shared tables.
  2. Route other engines’ access through a single engine, which technologies like Apache Arrow Flight make more feasible.
  3. Centrally define the business logic in a way all engines can make use of. Until now, this has generally not been possible for the vast majority of organizations. This is the approach Apache Iceberg views aim to enable.

This talk provides an introduction to Iceberg views and how they can be useful to you.

Watch the Talks

The Subsurface Community aims to bring you the latest on the technology and best practices around open lakehouse platforms. We hope you can join us for future events to learn from your peers’ experiences and have the opportunity to meet and spend quality time with others in the industry. Join the Subsurface Meetup community today!