June 23, 2022

Announcing the First Book Dedicated to Apache Arrow

Matt Topol · Staff Software Engineer, Voltron Data

Jason Hughes · Director of Technical Advocacy, Dremio

As co-creators of Apache Arrow, we at Dremio have been excited over the past several years to watch the project's growth, which has brought more users, broader ecosystem adoption, new capabilities, and more contributors.

As with most new technologies, it can be hard to find material that starts with the fundamentals and then builds on them. Ideally, such material goes progressively deeper into the areas you need in order to understand and use the technology effectively.

That's why we're excited today to announce the launch of a book that does just that: In-Memory Analytics with Apache Arrow, by Matt Topol, principal software architect at FactSet and a committer on the Apache Arrow project.

In his architect role at FactSet, Matt works extensively with Arrow, combining it with other technologies to solve business problems. He also shares his experience and knowledge with the community by speaking at Subsurface and joining community podcasts. He recently gave an interview about the book.

The book opens with a quick overview of the Apache Arrow format. It then helps you understand Arrow's versatility and benefits by walking through a variety of real-world use cases.

The book addresses the needs of multiple audiences: data scientists, data engineers, developers of data-driven applications, and anyone who works extensively with large datasets.

Topics covered in the book include:

  • The Apache Arrow format
  • Arrow usage in C++, Python, and Go
  • The Arrow Compute and Dataset APIs
  • Integrations with pandas and Spark
  • Arrow Flight and Arrow Flight SQL with examples
  • Real-world use cases and applications, with plenty of sample code that is also available in a GitHub repository
  • How to get involved and make your first pull request (PR) to the Arrow project

Dremio has worked closely with Matt and Packt, the book's publisher, in the lead-up to today's launch. That gave us the chance to read the book and provide feedback, and we think it's fantastic. The content is technically solid and strikes a good balance among concepts, use cases, and technical detail. Others in the field have also responded positively. Its lighthearted tone, puns, and jokes engage readers and make it an approachable read.

We're excited that there's now a great resource for learning Arrow. It serves readers who are new to Arrow as well as those who already work with it and want to deepen and solidify their understanding. That includes people who use it directly through libraries like pyarrow and people who use it indirectly through Arrow-powered engines like Dremio.

You can order the book on Amazon today. 

Share this blog post on Twitter and mention @Dremio and @ApacheArrow in your tweet for a chance to be one of three winners of a free physical copy of the book! The contest ends July 8, 2022.

Give it a read and feel free to reach out with any feedback!

Legal disclaimer regarding the giveaway:

Should items of value (e.g., food, promotional items) be disbursed to participants, these items will be available at no charge, and you may refuse them. Please verify that you are not restricted from receiving items of value (such as meals, promotional items, etc.), including under your employer's policies. SPECIAL ATTENTION, ALL PUBLIC SECTOR REPRESENTATIVES (Federal, including Military; State; Local; and Public Education employees): check your ethics policies and relevant guidelines before accepting items of value.