Dremio Jekyll

Java Vector Enhancements for Apache Arrow 0.8.0

In a recent blog post I wrote about enhancement to vectorized processing in Apache Arrow 0.8.0 in the Java code base. In this post we will contrast latency figures for Arrow 0.7.0 and 0.8.0 to explore how these enhancements have helped to improve the overall performance of Arrow. Note that these changes are specific to the Java implementation.

For Arrow 0.8.0, we made significant enhancements to the Java vector implementation by improving the performance of critical code paths and reducing heap usage. In addition, these changes lay the foundation for additional efforts in the future to further reduce heap usage in vectors.

In the table below we compare PROJECT and FILTER operations of specific queries from the TPC-H benchmark using Dremio. Once these code changes were committed to Arrow, we changed our code in Dremio to move to the new implementation of Java vectors and ran a subset of the standard TPC-H performance tests using a 10 node cluster on GCP.

TPC-H Query Project (Arrow 0.8) Project (Arrow 0.7) Project (% improvement) Filter (Arrow 0.8) Filter (Arrow 0.7) Filter (% improvement)
Q1 101ms 120ms 15.8% N/A N/A N/A
Q3 0.2ms 0.5ms 60% N/A N/A N/A
Q4 3ms 3ms 0% 162ms 250ms 35.2%
Q5 23ms 38ms 39.4% N/A N/A N/A
Q6 14ms 18ms 22.2% 42ms 50ms 16.0%
Q7 16ms 25ms 36.0% 87ms 150ms 42.0%
Q8 14ms 22ms 36.4% N/A N/A N/A
Q9 22ms 34ms 35.3% 675ms 720ms 6.3%
Q10 15ms 20ms 25.0% N/A N/A N/A
Q11 19ms 20ms 5.0% 10ms 12ms 16.7%
Q12 30ms 50ms 40.0% 80ms 142ms 43.7%
Q13 9ms 15ms 40.0% 3400ms 3500ms 2.9%
Q14 11ms 17ms 35.3% N/A N/A N/A
Q17 12ms 19ms 36.8% 6ms 6ms 0.0%
Q18 12ms 20ms 40.0% 31ms 48ms 35.4%
Q20 3ms 4ms 25.0% 36ms 42ms 14.3%

Some notes on these results:

  • Project (Arrow 0.8) and Project (Arrow 0.7) represent the processing time (in milliseconds) consumed by the “Project” operator for a particular TPC-H query using the vector implementation in respective Arrow versions.
  • Filter (Arrow 0.8) and Filter (Arrow 0.7) represent the processing time (in milliseconds) consumed for predicate evaluation using the vector implementation in respective Arrow versions.
  • The presence of N/A for Filter (Arrow 0.8) and Filter (Arrow 0.7) indicates that the particular query didn’t require any WHERE clause evaluation and thus no “Filter” operator.

The reason for choosing specific operators for performance comparison rather than full elapsed time of query is that the performance improvements from the new vector implementation are likely to impact the non-vectorized operators since here we work with vector APIs extensively. Dremio’s vectorized query processing algorithms are very optimized and bypass most of the Arrow code stack to interact directly with the Arrow in-memory buffers to perform the meaningful work.

As Arrow vectors are used extensively in Dremio, this new implementation has provided overall performance improvements for our users as a result of fewer objects being created inside Arrow, and more efficient function calls associated with the vector APIs.

We are excited to be able to make these enhancements knowing there is much more we can do to further improve the efficiency of operations in Apache Arrow. Stay tuned for more in the near future!

Apr 25, 2018

Dremio 2.0 Press Release

Dremio 2.0 Advances State of the Art Self-Service Data Platform with Enhancements to Data Reflections, New Learning Engine, Support for...