Announcing Dremio November 2020

Nov 24, 2020
Lucio Daza

Today we are excited to announce the release of Dremio November 2020!

This month’s release delivers performance-oriented features such as vectorized readers for complex data types and improved filter pushdown for Parquet, along with more flexibility for Dremio AWS Edition projects. This blog post highlights the following updates:

  • Vectorized readers for complex data types
  • Pushdown IN-list readers for Parquet
  • Pushdown multiple filters for Parquet
  • Dremio AWS Edition custom projects

Vectorized Readers for Complex Data Types

In this release, Dremio significantly improves performance when handling deeply nested structures and lists. New vectorized readers for complex data types make Dremio more efficient at reading complex data structures from Parquet files.

Vectorized readers make processing more efficient throughout the Dremio stack by reading data from nested fields into Arrow vectors batch by batch. They also reduce the amount of data that Dremio has to read from Parquet files by skipping subfields that a query does not select, and they improve the processing of deeply nested data structures, such as multiple levels of structs or lists within structs or lists.
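The two ideas in this paragraph, batch-by-batch reading and skipping unselected subfields, can be illustrated with a small toy model. This is a minimal sketch in plain Python, not Dremio's reader: the function names `get_path` and `read_batches`, the dotted-path notation, and the use of Python lists in place of Arrow vectors are all assumptions made for illustration. A real Parquet reader skips unselected column chunks on disk, whereas this sketch only shows the shape of the output.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence


def get_path(record: Any, path: str) -> Any:
    """Follow a dotted path such as 'address.geo.lat' through nested dicts.

    Returns None when any level of the path is missing.
    """
    value = record
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def read_batches(
    records: Iterable[Dict[str, Any]],
    selected_paths: Sequence[str],
    batch_size: int,
) -> Iterator[Dict[str, List[Any]]]:
    """Yield column batches that hold only the selected subfields.

    Each batch maps a selected path to a list of values (a stand-in for
    an Arrow vector). Subfields that are not selected are never copied.
    """
    batch: Dict[str, List[Any]] = {path: [] for path in selected_paths}
    count = 0
    for record in records:
        for path in selected_paths:
            batch[path].append(get_path(record, path))
        count += 1
        if count == batch_size:
            yield batch
            batch = {path: [] for path in selected_paths}
            count = 0
    if count:
        # Emit the final, partially filled batch.
        yield batch
```

For a query that selects only `address.city`, the `id` field and the rest of the `address` struct never appear in the output batches, which is the effect the vectorized readers aim for at the file level.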

When enabled, vectorized readers are used for all sources and Parquet datasets (reflection datasets, Hive, Glue and Iceberg datasets, etc.). Vectorized readers are disabled by default; please check the release notes for details on how to enable them.

Pushdown IN-List Readers for Parquet

This release also improves pushdown support, specifically for IN-list and OR clauses, and adds pushdown for multiple predicates.

Examples of the IN-list and OR filters now supported include:

WHERE Employee_ID IN (100, 110, 120)

WHERE Employee_ID = 100 OR Employee_ID = 110 OR Employee_ID = 120*

* OR clauses are converted into an IN filter and pushed down.
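The OR-to-IN conversion mentioned in the footnote can be sketched as follows. This is an illustrative model only, assuming each OR branch is a simple equality represented as a hypothetical `(column, value)` pair; it does not reflect how Dremio's planner represents or rewrites expressions internally.

```python
from typing import Any, List, Optional, Sequence, Tuple


def or_to_in_list(
    equalities: Sequence[Tuple[str, Any]],
) -> Optional[Tuple[str, List[Any]]]:
    """Collapse equality branches joined by OR into one IN-list.

    `equalities` holds (column, value) pairs, one per OR branch.
    Returns (column, values) when every branch tests the same column,
    otherwise None because no single IN-list is equivalent.
    """
    if not equalities:
        return None
    columns = {column for column, _ in equalities}
    if len(columns) != 1:
        return None  # branches on different columns cannot share an IN-list
    values: List[Any] = []
    for _, value in equalities:
        if value not in values:  # drop duplicates, keep first-seen order
            values.append(value)
    return columns.pop(), values
```

Applied to the three `Employee_ID` equalities above, the sketch produces the column name together with the list `[100, 110, 120]`, which matches the IN form of the filter.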

This improvement supports row-group and page-level pruning. For example, if queries only process the most recent month, quarter, or year of data, pruning stays effective as datasets accumulate years of data, because Dremio can exclude unused row groups and pages from consideration. This benefits a wide variety of use cases. This feature is disabled by default; please check the release notes for details on how to enable it.
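To make row-group pruning concrete, here is a minimal sketch that assumes pruning relies on per-row-group minimum and maximum statistics, which Parquet files can store for each column. The `RowGroupStats` class and the function names are hypothetical and chosen for illustration; they are not Dremio or Parquet APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class RowGroupStats:
    """Min/max statistics for one column in one row group (hypothetical)."""
    name: str
    min_value: int
    max_value: int


def may_contain(stats: RowGroupStats, in_list: Sequence[int]) -> bool:
    """True if at least one IN-list value falls inside the [min, max] range."""
    return any(stats.min_value <= value <= stats.max_value for value in in_list)


def prune_row_groups(
    row_groups: Iterable[RowGroupStats], in_list: Sequence[int]
) -> List[str]:
    """Return the names of row groups that still have to be read."""
    return [rg.name for rg in row_groups if may_contain(rg, in_list)]
```

A row group whose range does not overlap any IN-list value can be skipped without reading its data. The same range check could in principle be applied to page-level statistics. Note that a surviving row group may still contain no matching rows; statistics only rule groups out.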

Pushdown Multiple Filters for Parquet

In addition to the introduction of pushdown IN-list readers for Parquet, the latest release of Dremio also introduces support for multiple filters for Parquet files. Pruning using multiple predicates is now supported. Some examples include:

WHERE state = 'CA' AND year = 2020

WHERE employee_id IN (100, 110, 120) AND salary >= 1000.0

This feature uses all predicates for row-group and page-level pruning, and all predicates except IN-list and OR clauses for row-level pruning. This performance enhancement benefits a variety of use cases. This feature is disabled by default; please check the release notes for details on how to enable it.
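The split between pruning levels described above can be sketched with a toy scan. This is a simplified model under stated assumptions: predicates are hypothetical `(column, op, operand)` tuples, row groups carry min/max statistics per column, and page-level pruning is omitted. It also assumes that IN-list predicates skipped at row level are evaluated later by a separate filter step, which the source does not spell out.

```python
from typing import Any, Dict, List, Sequence, Tuple

# A predicate is (column, op, operand); op is one of "=", ">=", "in".
Predicate = Tuple[str, str, Any]


def group_may_match(stats: Dict[str, Tuple[Any, Any]], pred: Predicate) -> bool:
    """Check a predicate against a row group's (min, max) statistics."""
    column, op, operand = pred
    lo, hi = stats[column]
    if op == "=":
        return lo <= operand <= hi
    if op == ">=":
        return hi >= operand
    if op == "in":
        return any(lo <= value <= hi for value in operand)
    raise ValueError(f"unsupported operator: {op}")


def row_matches(row: Dict[str, Any], pred: Predicate) -> bool:
    """Evaluate a non-IN predicate against a single row."""
    column, op, operand = pred
    if op == "=":
        return row[column] == operand
    if op == ">=":
        return row[column] >= operand
    raise ValueError(f"unsupported row-level operator: {op}")


def scan(row_groups: Sequence[Dict[str, Any]],
         predicates: Sequence[Predicate]) -> List[Dict[str, Any]]:
    """Prune row groups with all predicates, then rows with non-IN ones."""
    row_level = [p for p in predicates if p[1] != "in"]
    survivors: List[Dict[str, Any]] = []
    for group in row_groups:
        # Row-group pruning: every predicate must be satisfiable by the stats.
        if not all(group_may_match(group["stats"], p) for p in predicates):
            continue
        # Row-level pruning: IN-list predicates are deliberately left out.
        for row in group["rows"]:
            if all(row_matches(row, p) for p in row_level):
                survivors.append(row)
    return survivors
```

In this model, a row with `employee_id = 105` and a qualifying salary survives the scan even though 105 is not in the IN-list, so the IN-list still has to be applied afterwards to produce the final result.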

Dremio AWS Edition Custom Projects

Dremio November 2020 also adds improvements to Dremio AWS Edition. Starting in this release, users have the option to create Dremio projects using existing AWS resources that they created. This allows users to create objects that meet their encryption or other requirements and use them with their Dremio AWS Edition deployment, giving them more control over the underlying objects in their environment.

While Dremio AWS Edition will still allow users to create projects through the UI, this enhancement gives users the ability to create projects in a more programmatic and automated way.

To make sure existing objects are properly used by Dremio, Dremio will provide users with documentation on how to tag and name EBS and EFS volumes as well as S3 buckets. Both tagging and naming can be done through the AWS console or programmatically through AWS APIs. Once the objects are tagged and named according to the convention, they can be incorporated into a new Dremio project. Once created, this project behaves like any other Dremio project. To learn more about custom projects, check out the latest release notes.

Learn More

For a complete list of new features, enhancements, changes and fixes, please review the release notes. As always, we look forward to your feedback. Please post any questions or comments on our community site.
