Panel: Open Data Architecture

The Open Data Architecture panel closes Subsurface LIVE Summer 2021 with a lively discussion about the state of the cloud data lake with some of the most influential creators and contributors to key open source data lake software. Moderator Gartner analyst Sanjeev Mohan opens the panel with highlights of recent cloud data lake industry trends. Then he asks the open source panelists questions such as:

Why data lakes have not, until now, succeeded in having the fast turnaround of data that was expected;
What missteps have there been along the way and what lessons have we learned from them;
What each contributors journey has been, and why they have succeeded.

Topics Covered

Apache Arrow Flight

Dremio Subsurface for Apache Arrow

Dremio Subsurface for Apache Parquet

In-Memory Formats

Metastores

Subsurface: Nessie Project Insights

Table Formats

Unlocking Potential with Apache Iceberg

Speakers

Sanjeev Mohan

Mohan Sanjeev is an established thought leader in the areas of cloud, big data and analytics. He researches and advises on changing trends and technologies in the modern cloud data architectures. He started his data and analytics journey at Oracle where he worked on emerging technologies. Until recently, he was a Gartner vice president known for his prolific and detailed research, and for directing the data and analytics agenda. Now a Principal at SanjMo, he provides advisory and consulting services, covering modern data architectures, governance and operations. He regularly presents on topics pertaining to end-to-end data pipelines and is excited to help businesses discover what their data can do for them.

Ryan Blue

Ryan Blue is the co-creator of Apache Iceberg, and he works on open source data infrastructure. He is also an Avro, Parquet, and Spark committer.

Ryan Murray

Ryan Murray is an open source Engineering Lead at Dremio. He previously served in the financial services industry doing everything from bond trader to data engineering lead. Ryan holds a PhD in theoretical physics and is an active open source contributor who dislikes it when data isn’t accessible in an organisation. He is passionate about making customers successful and self-sufficient, and still one day dreams of winning the Stanley Cup.

Julien Le Dem

Julien Le Dem is the Chief Architect of Astronomer and Co-Founder of Datakin. He co-created Apache Parquet and is involved in several open source projects including OpenLineage, Marquez (LFAI&Data), Apache Arrow, Apache Iceberg, and others. Previously, he was a senior principal at WeWork, a principal architect at Dremio, a tech lead for Twitter’s data processing tools, where he obtained a two-character Twitter handle (@J_), and a principal engineer and tech lead working on content platforms at Yahoo, where he received his Hadoop initiation. His French accent makes his talks particularly attractive.

Wes McKinney

Wes McKinney is a software developer and entrepreneur focusing on analytical computing. He created the Python pandas project and is a co-creator of Apache Arrow. He authored two editions of the reference book, Python for Data Analysis. Wes is a member of The Apache Software Foundation and also a PMC member for Apache Parquet. He is now the CTO and co-founder of Voltron Data, a new startup working on accelerated computing technologies powered by Apache Arrow.

Panel: Open Data Architecture

Speakers

Ready to Get Started? Here Are Some Resources to Help

Whitepaper

Dremio Upgrade Testing Framework

Whitepaper

Operating Dremio Cloud Runbook

Webinars

Unlock the Power of a Data Lakehouse with Dremio Cloud

Get Started Free

See Dremio in Action

Talk to an Expert

Ready to Get Started?