Watch the Subsurface LIVE Winter 2021 Sessions On Demand
The New Data Tier
Tomer Shiran, Founder and CPO at Dremio, presents the evolution of cloud data lakes and the separation of compute and data.
Analytics for Everyone - Unlocking the Full Power of All Your Data
Francois Ajenstat, CPO at Tableau, joins a virtual fireside chat on key market learnings, the analytics market and analytics trends in the years to come.
When People Are the Problem - Obstacles to a Data-Driven Culture
Billy Bosworth, CEO at Dremio, shares a career’s worth of insights into why some companies get it right while others get it wrong because of common obstacles.
Data Lakes Drive Decisions: A Virtual Fireside Chat
Mai-Lan Tomsen Bukovec, Global Vice President at AWS, discusses emerging trends in data lakes and how they are powering the next generation of applications.
Apache Iceberg: What's New
Ryan Blue, Software Engineer, Data Platform at Netflix, presents the latest features and updates in Apache Iceberg.
Implementing a Data Mesh Architecture at JPMC
This session from JPMC discusses JPMC’s data-lake-via-data-mesh architecture and its wholesale credit risk use case.
Flexible Data Lake Architectures for Seamless Real-time Data and Machine Learning Integrations
This talk will walk through design problems that arise from integrations, the solutions used at GFT, and how those use cases fit into a data lake architecture.
High Frequency Small Files vs. Slow Moving Datasets
In this presentation, we will review the design of Flux, its place in AEP's data lake, the challenges we had in operationalizing it and the final results.
Iceberg at Adobe: Challenges, Lessons & Achievements
We were on our second iteration of Adobe Experience Platform's (AEP) data lake when Apache Iceberg first came up on our radar.
Introducing InfluxDB IOx, a Federated In-Memory Columnar Store Backed by Object Storage
Paul Dix, CTO at InfluxData, introduces InfluxDB IOx, the future open source core of the InfluxDB time series database.
Data Lineage with Apache Airflow
This talk demonstrates how metadata management with Marquez helps maintain inter-DAG dependencies, catalog historical runs, and minimize data quality issues.
A Git-Like Experience for Data Lakes
Project Nessie decouples transactions and makes distributed transactions real, using table format capabilities to provide Git-like semantics for data lakes.
Designing Performant, Scalable, and Secure Data Lakes
Rukmani Gopalan, PM at Microsoft, presents the do's and don'ts of building enterprise data lakes, including patterns, pipelines, organization and security.
Arrow Flight and Flight SQL: Accelerating Data Movement
This talk will look at Apache Arrow Flight and how it plays a key role in accelerating the movement of data in modern data architectures.
Power BI Best Practices for Working with Big Data
In this session you’ll learn how to choose between Import mode and DirectQuery mode for your dataset and find out how composite models and aggregations can help.
Analytics Engineering in Data Lakes with dbt
This talk presents how dbt, an analytics engineering toolset, is a natural fit for enabling the intuitive workflows of data warehousing in the cloud data lake.
Serverless Cloud Data Lake with Spark for Serving Weather Data
This session presents TWC’s architecture for a serverless cloud data lake on top of Apache Spark and how it enables highly elastic and economical data serving.
GOing Native with Arrow Flight and Dremio
This session describes how and why FactSet pursued native Golang connectors to Dremio and native Golang Apache Arrow Flight server and client implementations.
From Discovering Data to Trusting Data
Mark Grover, creator of Amundsen, presents an overview of Amundsen and details how both automated and curated metadata are used to show trusted and non-trusted data in Amundsen.
Effectively Cataloging Data Lakes with Amundsen and Dremio
This session will introduce the need for cataloging, give an overview of Amundsen, cover its integration with Dremio and conclude with a demo.
Scaling Data Access and Governance on Data Lakes: Challenges and Common Approaches
This talk discusses the challenges that teams face when trying to secure their data lake access, as well as common approaches and trade-offs.
Data Observability for Data Lakes: The Next Frontier of Data Engineering
This talk will introduce the concept of “data downtime” and how to eliminate it in your data lake, as well as the rest of your data ecosystem.
Deep Dive into Iceberg SQL Extensions
This talk will focus on the Iceberg SQL extensions, a recent development in the Iceberg community to efficiently manage tables through SQL.
Migrating to Parquet - The Veraset Story
Veraset is a data-as-a-service (DaaS) company that delivers PBs of geospatial data to customers across a variety of industries.
Plan, Design and Build a Successful Data Lake on AWS
This talk shares how to best plan, design and build a successful data lake that will scale and evolve as your business needs and demands increase.
The 2021 State of Data Operations: Emerging Challenges in Expanding Cloud Data Ecosystems
This session covers the current and future state of data access governance and sensitive data analytics, and how to avoid challenges when creating a data governance strategy.
High-Performance Big Data Analytics Processing Using Hardware Acceleration
This talk will give an introduction to FPGAs and discuss their advantages and challenges in the context of big data analytics.
5 Lessons Learnt from Building Context Aware Smart Data Lake
This session features five lessons learned from building a context-aware smart data lake.
Enabling Real-Time Analytics for Data Lakes with Apache Ignite
In this session, Denis Magda shares how to improve analytical operations and queries with in-memory computing systems such as Apache Ignite.
Centralized Security and Governance in the Cloud
This talk will highlight new capabilities of centralized access control and how they can be used to provide robust security and governance.