Announcing Dremio 3.1

Jan 25, 2019
Kelly Stirman

Today we are very excited to announce the release of Dremio 3.1. Please join us on February 7th for a deep dive on Dremio 3.1.

Dremio 3.1 includes many key features that will enhance the way you access, analyze, and consume data. Building on the breakthrough capabilities included in Dremio 3.0, our latest release improves query planning and execution, increases Data Reflection matching, and handles demanding workloads more efficiently. And as always, we continue to enhance how Dremio connects to your data, wherever it is located. Learn more below!

New Reflection Matching Algorithm for Faster Analytics

This release introduces an entirely new Data Reflection matching algorithm that improves how Dremio identifies matching Reflections for virtual datasets. The new algorithm significantly increases Dremio’s ability to match Data Reflections with both simple and highly complex virtual datasets, so more end-user queries are accelerated, more often. To take advantage of these enhancements, users do not need to change their queries or recreate their Data Reflections.

Ensure the Best User Experience Through Multi-Tenant Workload Controls

Another key capability we’ve added in Dremio 3.1 is workload management controls for multi-tenant environments. Companies want to run mixed workloads with different SLAs on a common pool of resources.

With Dremio 3.1 you can now assign jobs to resource queues, with fine-grained control of CPU, memory, concurrency, queue depth, runtime limits, and enqueued time limits. Jobs are assigned to queues by rules based on query-time factors such as user identity, LDAP group membership, job type, query plan cost, or any combination of these factors.

This capability is particularly important in Dremio clusters that are deployed in multi-tenant environments with a variety of workloads ranging from exploratory queries to scheduled reporting queries. Now Dremio operators can:

  • Meet SLAs of diverse business needs
  • Prioritize, queue, or reject requests based on user or work type
  • Consolidate workloads onto a single cluster
  • Capacity plan for future growth
  • Manage for the unexpected

You can find the UI for this feature under the Admin menu of your Dremio user console. In just a few clicks you can, for example, ensure your interns are unable to run expensive jobs during business hours.
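To make the rule-based assignment concrete, here is a minimal Python sketch of how first-match routing of jobs to queues could work. It is an illustration only, not Dremio's implementation or API: the `Job`, `Queue`, and `Rule` types, the queue names, the limits, and the cost threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Job:
    user: str
    groups: FrozenSet[str]   # e.g. LDAP group membership
    job_type: str            # hypothetical labels such as "ui_run" or "jdbc"
    plan_cost: float         # planner's cost estimate
    submitted_hour: int      # 0-23, local time


@dataclass(frozen=True)
class Queue:
    name: str
    max_concurrency: int
    max_queue_depth: int
    max_runtime_s: int


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Job], bool]
    queue: Optional[Queue]   # None means "reject the job"


# Hypothetical queues and threshold, chosen only for this example.
HIGH_COST = Queue("high_cost", max_concurrency=2, max_queue_depth=10, max_runtime_s=3600)
LOW_COST = Queue("low_cost", max_concurrency=20, max_queue_depth=100, max_runtime_s=60)
EXPENSIVE = 1_000_000.0

RULES = (
    # Interns may not run expensive jobs during business hours (9:00-17:00).
    Rule(
        "reject_expensive_intern_jobs",
        lambda j: "interns" in j.groups
        and j.plan_cost > EXPENSIVE
        and 9 <= j.submitted_hour < 17,
        None,
    ),
    # Any other expensive job goes to the tightly limited queue.
    Rule("expensive_jobs", lambda j: j.plan_cost > EXPENSIVE, HIGH_COST),
)


def route(job: Job, rules: Sequence[Rule] = RULES,
          default: Optional[Queue] = LOW_COST) -> Optional[Queue]:
    """Return the queue of the first matching rule, or None if the job is rejected."""
    for rule in rules:
        if rule.matches(job):
            return rule.queue
    return default
```

Evaluating rules in order and stopping at the first match keeps the outcome easy to reason about: the most specific rules go first and a catch-all queue handles everything else.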

Work Asynchronously with Enhanced Dataset Previews

One of our main goals is to increase the productivity of our users. In Dremio 3.1, users can perform data access and data curation activities while Dremio generates previews asynchronously in the background.

As users apply transformations, joins, aggregations, and projections, Dremio generates a preview of each dataset asynchronously, so users can continue to iterate while Dremio works in the background.

New and Improved Relational Connectors

Starting in Dremio 3.0, we developed an all-new declarative framework (ARP) for developing relational connectors. This framework allows us to standardize all relational connectors on a single code base that is more efficient, provides better push-down capabilities, and is easier for us to maintain. With Dremio 3.1, the Redshift, Oracle, and MySQL connectors now benefit from the same robust ARP-based framework as the Teradata, Postgres, and SQL Server connectors.

Gandiva Enhancements

Since the introduction of the Gandiva Initiative for Apache Arrow, we have witnessed tremendous success and adoption of the technology by the community and enterprises around the globe. Performance is a key focus in every release, which is why Dremio 3.1 continues to make the benefits of Gandiva available to users. (We shipped 3.1 with this feature off by default so that users can opt into testing this new kernel. If you’re interested, please send a note to preview@dremio.com.)

The benefits of Gandiva can be quite striking in some contexts. For example, we worked with an early tester on a complex query that was improved by over 70x. We still have a lot of work to do to provide 100% coverage under this new engine, but for now many queries can be optimized with Gandiva, and those that cannot will automatically run in the JVM.
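The automatic fallback described above can be pictured as a simple coverage check. The Python sketch below is an assumption about the general pattern, not Dremio's planner logic: the `Expr` type, the `NATIVE_SUPPORTED` function set, and the engine labels are hypothetical, and a real engine may decide at a different granularity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator, Tuple, Union

# Hypothetical set of functions the native kernel can compile.
NATIVE_SUPPORTED: FrozenSet[str] = frozenset(
    {"add", "multiply", "less_than", "and", "substr"}
)


@dataclass(frozen=True)
class Expr:
    func: str
    # Arguments are nested expressions or plain column names.
    args: Tuple[Union["Expr", str], ...] = ()


def functions_used(expr: Expr) -> Iterator[str]:
    """Yield every function name in the expression tree."""
    yield expr.func
    for arg in expr.args:
        if isinstance(arg, Expr):
            yield from functions_used(arg)


def choose_engine(expr: Expr,
                  supported: FrozenSet[str] = NATIVE_SUPPORTED) -> str:
    """Pick the native path only if every function in the tree is covered."""
    if all(name in supported for name in functions_used(expr)):
        return "native"
    return "jvm"  # fall back to the existing execution path
```

With this shape, widening coverage only means adding names to the supported set. Expressions that use anything outside it keep running on the fallback path without any change to the query.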

In addition, we view Gandiva as the optimal way to create UDFs for Dremio and other systems built on Apache Arrow, and we have a post explaining how you can build your own. There are over a million downloads of Arrow each month, so the work to build a UDF can be far-reaching, not just for Dremio.

Aggregation Spilling

Dremio can now complete queries whose aggregation operations exceed available memory. When a memory-intensive query exceeds its limits, this feature automatically spills data to disk so that the query can run to completion instead of failing. (We shipped 3.1 with this feature off by default so that users can opt into testing it. If you’re interested, please send a note to preview@dremio.com.)

This is useful for the following:

  • Memory-intensive hash aggregation queries (GROUP BY queries) that process large datasets.
  • Low-memory environments.
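For readers unfamiliar with the technique, the following Python sketch shows the general idea of a hash aggregation that spills to disk, here a `GROUP BY key, SUM(value)`. It is a toy illustration under stated assumptions, not Dremio's implementation: the memory limit is expressed as a hypothetical cap on the number of groups held in memory, and each spill partition is assumed to fit in memory when it is merged.

```python
import pickle
import tempfile
from typing import Dict, Hashable, Iterable, Tuple


def _spill(table: Dict[Hashable, int], files) -> None:
    """Write partial sums to the partition file chosen by the key's hash."""
    for key, value in table.items():
        pickle.dump((key, value), files[hash(key) % len(files)])


def spilling_sum(rows: Iterable[Tuple[Hashable, int]],
                 max_groups_in_memory: int,
                 num_partitions: int = 4) -> Dict[Hashable, int]:
    """SUM(value) GROUP BY key, spilling partial results when memory is full."""
    files = [tempfile.TemporaryFile() for _ in range(num_partitions)]
    try:
        table: Dict[Hashable, int] = {}
        spilled = False
        for key, value in rows:
            # A new group would exceed the in-memory limit: spill and start over.
            if key not in table and len(table) >= max_groups_in_memory:
                _spill(table, files)
                table.clear()
                spilled = True
            table[key] = table.get(key, 0) + value
        if not spilled:
            return dict(table)

        # Flush the remainder, then merge one partition at a time. Every
        # partial sum for a given key lands in the same partition file.
        _spill(table, files)
        result: Dict[Hashable, int] = {}
        for f in files:
            f.seek(0)
            partial: Dict[Hashable, int] = {}
            while True:
                try:
                    key, value = pickle.load(f)
                except EOFError:
                    break
                partial[key] = partial.get(key, 0) + value
            result.update(partial)
        return result
    finally:
        for f in files:
            f.close()
```

Partitioning spilled data by key hash means the merge step only needs one partition in memory at a time. A production engine would also handle partitions that are themselves too large, for example by partitioning them again.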

Time To Get Started

We are very excited about this release and we hope you are too. As always, we look forward to your feedback. If you’d like to hear more about these features, please join us on February 7th for a deep dive on Dremio 3.1. Please post questions on our community site and we’ll do our best to answer them there, along with other members of the Dremio community.
