

Announcing Dremio 3.2

May 16, 2019
Lucio Daza

Today we are excited to announce the release of Dremio 3.2!

This release includes over 200 improvements, including many key features that will enhance the way you access, analyze, and consume data, whether it is stored in your private, public, or hybrid cloud. Building on the innovative features included in our 3.1 release, our latest release further improves performance and simplifies how you deploy and operate Dremio at any scale and in any environment.

Join us on May 21st at 12 pm PT for our deep dive webinar to learn more about all the great features and improvements included in this release.

Learn more below!

ADLS Gen2 and Azure Blob Storage Connector

Recently, Microsoft made Azure Data Lake Store (ADLS) Gen2 generally available. ADLS Gen2 includes the best of the previous version of ADLS (now known as ADLS Gen1) and Azure Blob Storage.

Dremio 3.2 expands the way that you can connect to your data by adding connectivity support for:

  • Azure Data Lake Storage Gen2
  • Azure Blob Storage (v1 and v2)

Our new connector allows Azure users to take advantage of Dremio with the latest advances in cloud services, with the same ease of use and robust security features.

Column-Aware Predictive Pipelining

To improve performance at scale, organizations utilize advanced file formats optimized for large analytical workloads, such as Apache Parquet and Apache ORC. These file formats support a number of optimizations, including columnar storage, multiple compression options, and storing data in a binary, machine-readable format. Today, columnar file formats are the standard for storing data at large scale.

Dremio 3.2 includes a new predictive pipelining technology that leverages our deep understanding of columnar file formats and analytic workload patterns to intelligently predict likely access patterns and to coalesce nearby columns. These optimizations are especially effective with low cardinality fields. Our enhanced readers reduce wait time, increase throughput, and improve resource utilization. With predictive pipelining technology Dremio offers 2-4x faster query response times.

These improvements are cumulative with the performance gains Azure provides in ADLS Gen2 storage over ADLS Gen1 storage. Combined with ADLS Gen2, predictive pipelining results in 4-6x faster query response times compared to Dremio 3.1 on ADLS Gen1.

The following graph illustrates the improvements provided by Dremio 3.2 and ADLS Gen2. Using TPC-H and a scale factor of 100 on a single node, you can see that Dremio 3.2 is ~3x faster than Dremio 3.1 on ADLS Gen1, and ~5x-7x faster on ADLS Gen2.

Scalability and Concurrency Enhancements

You won’t see a specific set of new features for scalability and concurrency enhancements, but with every release we continue to advance the state of the art for the world’s fastest SQL engine for the data lake. Each year we have a goal of improving performance by 10x, and some years we do much more. Here are a few highlights for Dremio 3.2:

  • Query planning and initialization. In large clusters (hundreds of nodes), queries start up to 100x faster. We have made a number of improvements to how we manage metadata for query planning and coordinate activities across the cluster. The larger the cluster, the more these enhancements improve on prior versions of Dremio.

  • Big metadata. For Hive-based datasets we have continued to push the envelope on scalability of partitions and splits. Dremio can now easily handle tables with hundreds of thousands of partitions and splits while providing interactive performance.

Many thanks to our customers who have some of the world’s largest deployments in terms of data volume and infrastructure size. By working closely with their operations teams we have been able to continue to enhance Dremio to support the most demanding workloads. Have your own scalability challenge? Reach out and we’d love to hear about it.

Kubernetes and Helm Deployment Support

Dremio 3.2 brings official Kubernetes and Helm deployment support, simplifying and hardening your Dremio deployments and operations.

Docker images for both Dremio Enterprise Edition and Dremio Community Edition are now available with every release. This simplifies and hardens how operators:

  • Deploy a production-ready Dremio cluster on Azure AKS, Amazon EKS, or standalone Kubernetes clusters.
  • Upgrade clusters with a single Helm command (see the sketch below).
  • Manage cluster lifecycle, including high availability, node failures, cluster scaling, configuration change management, and much more.
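
For illustration, the single-command upgrade called out above might look like the following. This is a minimal sketch, assuming an existing release named dremio installed from a local copy of Dremio's Helm chart; the chart path, release name, and image.tag value are assumptions, so consult the Dremio 3.2 deployment documentation for the actual chart and values.

    # Roll an existing Dremio release forward to the 3.2 images and wait for the rollout
    helm upgrade dremio ./charts/dremio \
      --set image.tag=3.2.0 \
      --wait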

Easy Cloud Provisioning for Evaluation

While Kubernetes provides a robust way to manage large clusters in production deployments, there can still be a fair amount of initial setup to get started. We wanted to provide the easiest possible way to provision a Dremio cluster on Azure and AWS for evaluation purposes. In Dremio 3.2, starter cloud templates are available to quickly get evaluation deployments started using:

  • AWS CloudFormation
  • Azure ARM

For evaluation cases where deploying Dremio 3.2 on the Azure cloud is key, go to Dremio’s deploy page and click “ARM Template”. You will be prompted to log in with your Azure portal account. With each of these templates, you simply provide a few pieces of information about the environment you wish to deploy.

Once the deployment is complete, use the URL provided in the output tab to connect to your new Dremio cluster!

For more details on how to complete the deployment configuration, check out the release notes for Dremio 3.2.

Data Reflections - Enhanced Incremental Refresh

Data Reflections, Dremio’s patented approach to data acceleration, are enhanced in this release. As physical datasets change, Data Reflections should be periodically refreshed. Up to Dremio 3.1, incremental updates were only possible for sources with a BigInt field to determine the delta.

Starting in Dremio 3.2, incremental refresh works with a wider range of data types for monotonically increasing fields, including BigInt, Int, Timestamp, Date, Varchar, Float, Double, and Decimal.
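
Conceptually, an incremental refresh only reads the rows added since the previous refresh, using the monotonically increasing field as a high-water mark. The sketch below shows that delta scan in SQL, assuming a hypothetical physical dataset orders with a Timestamp column updated_at; the names and literal value are illustrative only, not Dremio’s internal implementation.

    -- Only rows beyond the high-water mark recorded at the previous refresh
    -- need to be read into the Reflection.
    SELECT *
    FROM orders
    WHERE updated_at > TIMESTAMP '2019-05-01 00:00:00'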

New Dataset Rendering Engine

One of our main goals is to increase the productivity of our users. In Dremio 3.2, users can perform data access and data curation activities while Dremio generates previews asynchronously in the background.

Prior to Dremio 3.2, rendering would block other operations. Now, as users apply transformations, joins, aggregations, and projections, Dremio will generate previews of datasets asynchronously, allowing users to continue to iterate as Dremio works seamlessly in the background.

Support for Complex ORC Data Types

Dremio 3.2 adds support for reading complex data types in non-transactional and compacted transactional Hive ORC files. These data types include:

  • LIST, e.g. SELECT list[0] FROM HiveOrcTable
  • STRUCT, e.g. SELECT struct['field'] FROM HiveOrcTable
  • MAP, e.g. SELECT flatten(map_field)['key'], flatten(map_field)['value'] FROM HiveOrcTable
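
Putting the accessors above together, a query against a hypothetical Hive ORC table events with a LIST column tags, a STRUCT column device, and a MAP column attributes might look like the sketch below; all table and column names are assumptions for illustration.

    -- Access complex ORC types directly from Dremio SQL (hypothetical table "events")
    SELECT
      tags[0]                      AS first_tag,       -- LIST: element by index
      device['model']              AS device_model,    -- STRUCT: field by name
      flatten(attributes)['key']   AS attribute_key,   -- MAP: key of each entry
      flatten(attributes)['value'] AS attribute_value  -- MAP: value of each entry
    FROM events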

LDAP Support for Group Lists

One of the key components of data security is the ability to source users from enterprise directory services such as AD/LDAP and to authenticate them against those services. Dremio 3.2 enhances the way LDAP groups are configured by allowing users to list the members of each group while configuring security and sharing options in the UI.

In Summary: With Dremio 3.2, You Can:

  • Easily deploy and configure Dremio on the cloud to start gaining insights from your data in minutes.
  • Accelerate time to production by taking full advantage of Kubernetes and Helm deployment support.
  • Leverage the world’s fastest SQL engine for the Azure Data Lake.
  • Increase the productivity of your data consumers.
  • And much more!

We are very excited about this release and we hope you are too. As always, we look forward to your feedback. If you’d like to hear more about these features, join us on May 21st at 12 pm PT for our deep dive webinar covering all the great features and improvements in this release.

Ready to get started?