Today we are excited to announce the release of Dremio 4.9!
This month’s release delivers performance enhancements in how Dremio caches reflections, a variety of query performance improvements, a preview of the new Apache Arrow Flight server endpoint, Dremio AWS Edition enhancements, and much more. This blog post highlights each of these updates.
To improve performance in traditional data warehouse schemas, last month’s Dremio 4.8 release introduced support for runtime filtering. Runtime filtering enables Dremio to take the filters generated from the smaller table in a join and apply them dynamically to the larger table, which reduces the amount of data the larger table contributes to the join and significantly increases performance.
When this feature was released, it only supported partitioned columns. Today we are excited to announce that Dremio 4.9 expands the types of workloads that runtime filtering supports by adding support for non-partitioned columns as well.
This capability can result in orders-of-magnitude performance improvements when working with traditional star schemas. The feature is enabled automatically and requires no tuning or administration: Dremio simply applies filters from a joined table as they are generated.
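To make the idea concrete, here is a minimal sketch of runtime filtering in a hash join, written in plain Python. It is an illustration of the general technique, not Dremio's implementation: the function names, the sample `stores` and `sales` tables, and the use of an exact key set (real engines often use a compact structure such as a Bloom filter) are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def filtered_scan(rows: Iterable[Row], key: str, allowed: set) -> Iterator[Row]:
    """Scan the large table, dropping rows whose join key cannot match."""
    for row in rows:
        if row[key] in allowed:
            yield row


def join_with_runtime_filter(
    fact_rows: Iterable[Row],
    dim_rows: Iterable[Row],
    key: str,
    dim_predicate: Callable[[Row], bool],
) -> List[Row]:
    # Build side: hash the small (dimension) table after its own filter.
    build = {row[key]: row for row in dim_rows if dim_predicate(row)}

    # The runtime filter is derived from the build side while the query runs.
    runtime_filter = set(build)

    # Probe side: only rows that pass the runtime filter reach the join.
    return [
        {**fact, **build[fact[key]]}
        for fact in filtered_scan(fact_rows, key, runtime_filter)
    ]


# Hypothetical star-schema data: a small dimension and a larger fact table.
stores = [
    {"store_id": 1, "region": "EU"},
    {"store_id": 2, "region": "US"},
    {"store_id": 3, "region": "EU"},
]
sales = [{"store_id": i % 3 + 1, "amount": i} for i in range(6)]

eu_sales = join_with_runtime_filter(
    sales, stores, "store_id", lambda s: s["region"] == "EU"
)
print(len(eu_sales), "of", len(sales), "fact rows reached the join")
```

In this sketch the filter on `region` never appears in the fact table, yet it still prunes fact rows, because the surviving `store_id` values are pushed to the fact-table scan before the join.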
In this release, Dremio enhances the caching of reflections. By caching reflections in Apache Arrow format, Dremio 4.9 reduces query processing time by roughly 80% in some cases compared with queries that use reflections stored in Apache Parquet files.
| Columns involved in query | Processing time, Parquet reflections (s) | Processing time, Arrow reflections (s) | Improvement |
| --- | --- | --- | --- |
| 2 (all int) | 0.150 | 0.073 | 51% |
| 4 (all int) | 0.290 | 0.136 | 53% |
| 8 (4 int, 4 varchar) | 1.620 | 0.265 | 84% |
| 12 (8 int, 4 varchar) | 1.839 | 0.371 | 80% |
The table above shows the impact of this enhancement. On this TPC-H benchmark, query processing time dropped by 51% to 84% when the reflections were cached using Apache Arrow.
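The percentage column can be reproduced from the two timing columns. The short sketch below assumes the improvement is computed as the relative reduction in processing time, (Parquet − Arrow) / Parquet, rounded to the nearest whole percent; the variable and function names are illustrative only.

```python
def improvement_pct(parquet_s: float, arrow_s: float) -> int:
    """Relative reduction in processing time, as a whole-number percentage."""
    return round((parquet_s - arrow_s) / parquet_s * 100)


# (columns in query, Parquet seconds, Arrow seconds), copied from the table.
timings = [
    ("2 (all int)", 0.150, 0.073),
    ("4 (all int)", 0.290, 0.136),
    ("8 (4 int, 4 varchar)", 1.620, 0.265),
    ("12 (8 int, 4 varchar)", 1.839, 0.371),
]

for label, parquet_s, arrow_s in timings:
    print(f"{label}: {improvement_pct(parquet_s, arrow_s)}% faster")
```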
In addition to accelerating query performance, Dremio 4.9 continues to make time to insight even faster by reducing query planning times by up to 30x for queries with a large number of reflections or complex multilevel joins.
Arrow Flight offers a dramatic performance improvement over ODBC and JDBC client connections by transferring data in the Arrow format itself. By avoiding expensive serialization and deserialization operations, Arrow Flight frequently offers 50-100x higher data transfer bandwidth than ODBC and JDBC connections, which are popular industry standards for databases but were not designed for transferring data-lake-scale datasets.
Arrow Flight builds on the Apache Arrow project, which was co-created by Dremio. Arrow is now one of the most successful Apache Software Foundation projects, with over 10 million downloads per month, and has become an industry standard for efficient in-memory data representation and data exchange between systems. Arrow Flight introduces a new and modern standard for transporting data between networked applications.
Dremio 4.9 includes an Arrow Flight server endpoint. This enables any Arrow-enabled client to connect directly to Dremio instead of relying on ODBC or JDBC. Additionally, Dremio is working with the Arrow community to introduce Arrow client tools. An example of a client connecting to the server endpoint can be found here.
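As a rough illustration of what an Arrow-enabled client looks like, the sketch below uses the `pyarrow.flight` module to send a SQL statement and read the result as an Arrow table. It is not the example linked above. The host, port, credentials, and query are hypothetical placeholders, and the basic-token handshake shown here is an assumption: the authentication mechanism expected by the 4.9 preview endpoint may differ, so check the documentation for your Dremio version.

```python
def flight_location(host: str, port: int, tls: bool = False) -> str:
    """Build a Flight URI such as grpc+tcp://host:port."""
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


def run_query(host: str, port: int, username: str, password: str, sql: str):
    """Run a SQL statement over Arrow Flight and return a pyarrow.Table."""
    # Imported lazily so the helper above works without pyarrow installed.
    from pyarrow import flight

    client = flight.FlightClient(flight_location(host, port))

    # Assumption: the server accepts a basic-auth handshake and returns a
    # bearer token header to attach to subsequent calls.
    token_pair = client.authenticate_basic_token(username, password)
    options = flight.FlightCallOptions(headers=[token_pair])

    # Ask the server how to fetch the results of the SQL command...
    descriptor = flight.FlightDescriptor.for_command(sql)
    info = client.get_flight_info(descriptor, options)

    # ...then stream the record batches from the first endpoint.
    reader = client.do_get(info.endpoints[0].ticket, options)
    return reader.read_all()


if __name__ == "__main__":
    # Hypothetical connection details; replace with values for your cluster.
    table = run_query("localhost", 12345, "user", "secret", "SELECT 1")
    print(table)
```

Because the result arrives as Arrow record batches, it can be handed to Arrow-aware tools (for example via `table.to_pandas()`) without a row-by-row conversion step.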
Queries coming through Arrow Flight are easily identified in the Jobs page in the Dremio UI. In addition, workload management rules can be set up to route these queries to the desired queue.
Dremio 4.9 also adds enhancements to Dremio AWS Edition. For example, starting in this release, users can launch a Dremio AWS Edition environment with smaller engine instance types, which gives them a lower-cost option for experimentation and evaluation.
For a complete list of new features, enhancements, changes and fixes, please review the release notes. As always, we look forward to your feedback. Please post any questions or comments on our community site.