March 1, 2023
10:10 am - 10:40 am PST
How a Data Lakehouse Architecture and Dremio are Addressing Memorial Sloan Kettering’s Data Challenges in Medical Research
Memorial Sloan Kettering (MSK) is one of the nation’s top cancer care and research centers. As such, MSK faces many of the usual data challenges in the medical research space: maintaining patient privacy, breaking down data silos, democratizing data across users with diverse skill sets, and reconciling overlapping data requirements from different projects, which result in many copies of data (e.g., multiple versions of tables, data sharing via emailed spreadsheets). In the past, MSK’s data infrastructure posed many challenges, including delays in answering even simple questions such as what data was available in the system, difficulty tracking the versions of data copies, and difficulty adhering to data governance requirements.
Dremio and the lakehouse architecture have begun to help us address these challenges. The benefits include easy sharing with access controls, which eliminates the emailing of spreadsheets, as well as data provenance, high-performance analytics, and ultimately effective data democratization.
In this talk, we’ll dive into the challenges we’ve faced and how Dremio and the lakehouse architecture have helped us address them. We will also highlight a use case: obtaining patient and demographic counts across many ongoing studies for institutional review board (IRB) review. This task previously took days and now takes minutes.