
On August 11, 2022, the Subsurface Data Lakehouse community hosted a meetup featuring two talks on the open-source technology Apache Arrow. Join the meetup group so you don't miss upcoming events.

'Understanding Apache Arrow' with Voltron’s Matt Topol

This talk, given by the author of the first book written on Apache Arrow, covers what Apache Arrow is, why you should use it, and which use cases suit it best. Learn when it makes sense to use Arrow instead of message-passing formats like Protobuf or JSON, and when to use it instead of storage formats like Apache Parquet, Apache ORC, or CSV. Efficiently structuring and properly managing memory is key to performant processing, and Arrow provides various tools out of the box to get you there.

Subsurface-Meetup.pptx

'Apache Arrow Flight SQL: a universal standard for high-performance data transfers from databases' by Dremio’s Jason Hughes

Jason Hughes, who served as a technical reviewer of Matt’s book, covers why ODBC and JDBC don’t cut it in today’s data world and the problems solved by Arrow, Arrow Flight, and Arrow Flight SQL. The talk goes through how each of these building blocks works and gives an overview of universal ODBC and JDBC drivers built on Arrow Flight SQL, which let clients take advantage of this increased performance with zero application changes.

Apache-Arrow-Flight-SQL_-a-universal-standard-for-high-performance-data-transfers-from-databases