On Thursday, April 28, the Subsurface Community held its first meetup, a hybrid virtual and in-person event. Two speakers, from Asurion and Dremio, presented virtually, and the audience could attend either online from anywhere in the world or in person at locations in New York and Chicago.
In-person attendees were greeted with a complimentary Apache Iceberg T-shirt, free food and drink, and ping pong at SPIN, an iconic ping pong social club with locations in Chicago and New York.
Below you’ll find descriptions of the two talks given at the event. Don’t miss the next Subsurface meetup, whether virtually or in person when it’s in your neighborhood: join the Subsurface Community meetup group.
This presentation is about our internal product called DPS. We intentionally do not call it a data catalog because it’s much more than that. It gives users and platform owners everything they need to know about the data in the data platform, all in one place, via a simple search-driven UI.
We brought together data assets, columns, data movement jobs, users, infrastructure, operational data, and even documentation into one pane of glass. All of this is presented via a simplified, interactive, and easy-to-understand interface. Information about data assets, such as lineage, impact analysis, operational metrics, quality metrics, and regulatory metrics, comes together in one place.
This presentation also shares how we overcame a metadata culture hurdle, how we built this ourselves, and how we innovated using a graph-type data model without a graph database.
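To make the idea of a graph-type data model without a graph database more concrete, here is a minimal sketch. It is not how DPS is built; the abstract does not describe its internals. It only assumes one common approach: store nodes and edges in ordinary relational tables and walk them with a recursive query. The table names, asset names, and the `build_catalog` and `downstream` helpers are all hypothetical, and SQLite stands in for whatever store a real platform would use.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory store with hypothetical nodes and edges tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
        CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, relation TEXT NOT NULL);
        """
    )
    # Illustrative metadata only: datasets, a job, and two consumers.
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [
            ("raw.orders", "dataset"),
            ("job.clean_orders", "job"),
            ("curated.orders", "dataset"),
            ("dashboard.revenue", "dashboard"),
            ("report.finance", "report"),
        ],
    )
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [
            ("raw.orders", "job.clean_orders", "read_by"),
            ("job.clean_orders", "curated.orders", "writes"),
            ("curated.orders", "dashboard.revenue", "feeds"),
            ("curated.orders", "report.finance", "feeds"),
        ],
    )
    return conn


def downstream(conn: sqlite3.Connection, start: str) -> list[str]:
    """Impact analysis: every node reachable from `start` by following edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
            SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.id
        )
        SELECT id FROM reach ORDER BY id
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    catalog = build_catalog()
    print(downstream(catalog, "raw.orders"))
```

Under these assumptions, lineage and impact analysis become plain SQL traversals over two tables, which is one way to get graph-style questions answered without running a dedicated graph database.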
In open architectures, different engines are used for the workloads they were designed for and handle best. When multiple engines work on the same datasets, they all need to agree on what the dataset is. Apache Iceberg provides that capability, and it works well when you primarily have one engine doing the writing and one engine doing the downstream analytics. However, when multiple engines serve downstream user-facing analytics, each engine also needs to apply business logic to give end users the answers they're looking for.
When using multiple engines for downstream analytics, there are generally three options:
This talk provides an introduction to Iceberg views and how they can be useful to you.
The Subsurface Community aims to bring you the latest on the technology and best practices around open lakehouse platforms. We hope you can join us for future events to learn from your peers’ experiences and have the opportunity to meet and spend quality time with others in the industry. Join the Subsurface Meetup community today!
Alex Merced is a Developer Advocate for Dremio with a history of creating content to enable developers of all types through his personal projects like DevNursery.com, The Web Dev 101 Podcast, and the DataNation podcast. Alex Merced has been a developer with companies like Crossfield Digital, CampusGuard, GenEd Systems and others along with being an Instructor for General Assembly Bootcamps.