

Introducing Dremio AWS Edition, Delivering Data Lake Insights On-Demand

May 5, 2020
Jason Nadeau



The growth of data lake storage is exploding. Cloud data lake storage services such as AWS S3 and Microsoft ADLS have become the first place where data lands, and while we still see industries using on-premises data lake solutions like HDFS, it is estimated that within the next five years over 50% of all data will live in cloud storage.

While cloud technologies have become a fundamental pillar for modern data architectures, you’re probably spending a lot more on them than you expected - perhaps shockingly so. Why is that? Much of the blame lies with cloud software and services that aren’t architected to take advantage of the cloud’s elasticity and shrink resources when they’re not in use. In 2019 alone, 7 out of every 10 dollars invested in the cloud were wasted due to a lack of optimization and right-sizing of cloud resources. Just as important, for analytics use cases you’re probably using an expensive data warehouse for far more than you need to. The rise of next-generation cloud data lakes, powered by Dremio, the data lake engine, means you can now query the vast majority of your data in place at a much lower overall cost.

Dremio’s lightning-fast queries in the cloud translate into extreme cost efficiency. With Dremio in the cloud you get the lowest cost per query at any speed, and you have the power to adjust your performance and cost dial to meet your business objectives. Our 4-100X faster query speeds mean that your EC2 compute costs can be reduced by over 75% compared to traditional SQL engines, because you can provision (at least) 75% less EC2 capacity and achieve the same performance. But we’re not stopping there!

Today we are excited to announce the all-new Dremio AWS Edition: a production-grade, high-scale data lake engine optimized for AWS that eliminates the cost of idle compute and thereby reduces infrastructure compute costs by a further 60% or more. And it’s free! Combined with the savings enabled by Dremio’s market-leading speed, you can save upwards of 90% of your EC2 compute costs. The Dremio AWS Edition also delivers a service-like experience right in your own AWS account, ensuring you retain full control of your data, with everything living in your own AWS VPC.


Learn more in the technical deep dive webinar on Thursday, May 21st.


With this release, Dremio is providing a free, streamlined, production-grade data lake engine to all AWS users. It makes it easy to deploy Dremio, query data at petabyte scale directly from S3, and stay current with our latest features, while addressing critical issues like the cost of cloud compute, performance management, and right-sizing of infrastructure.


The AWS Edition supports unlimited users and unlimited queries, includes Dremio’s industry-leading query acceleration features and self-service semantic layer, and is community-supported. Dremio AWS Edition is ideal for organizations of any size, particularly enterprise departments and teams. And if and when you need them, you can seamlessly enable enterprise security features and Dremio support with a paid annual subscription and subscription key purchased separately from Dremio.


Dremio AWS Edition also introduces two new technologies to support data lake insights on demand as well as to reduce cloud infrastructure costs: elastic engines and parallel projects.


Elastic Engines: Independent and Resource-Efficient Compute Engines


Faster performance per compute node translates into lower costs, since a smaller node (with lower associated infrastructure costs) can be used to deliver a given level of performance. Dremio is exceptionally fast - 4X faster for ad-hoc queries, and 100X faster for BI dashboards and reports. That means Dremio already helps you lower cloud infrastructure costs by more than 75%. And now we are taking things even further with our new elastic engines capabilities.

Elastic engines address two additional, closely related challenges for data teams and IT leaders: performance and escalating cloud infrastructure costs. The faster a given compute process can complete, the sooner it stops consuming resources and generating costs.

Unfortunately, cloud software often isn’t architected to take advantage of the inherent elasticity of the cloud and release resources dynamically. At the same time, traditional scale-out query engines are built around a single execution cluster that supports multiple, dynamic query workloads, making the cluster very difficult to test and optimize. As a result, the cluster is either under-provisioned, leading to workload contention and inconsistent, degraded performance, or, more commonly, heavily over-provisioned to cover peak demand, leading to low efficiency and increased infrastructure costs.


Check out our elastic engines tutorial to learn more


Elastic engines address these challenges in multiple ways. First, elastic engines introduce a multi-engine execution architecture, enabling data teams to configure any number of query engines within a given Dremio cluster. (Note that with the Dremio AWS Edition we have optimized our cluster implementation for AWS and now refer to clusters as parallel projects - more on those below.) Each engine shares a common metadata catalog, operates independently, and can be sized and tailored to the workload it supports.
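To make the idea concrete, here is a minimal sketch of what per-workload engine sizing might look like. It is purely illustrative: the class, engine names, instance types, and node counts are hypothetical examples, not Dremio’s actual configuration interface.

```python
from dataclasses import dataclass

@dataclass
class EngineConfig:
    """Hypothetical per-workload engine definition (illustration only, not Dremio's API)."""
    name: str       # the workload this engine is dedicated to
    node_type: str  # EC2 instance type used for the engine's executor nodes
    max_nodes: int  # the full, tailored size the engine can scale up to

# One Dremio project with several independent engines sharing a common metadata catalog.
# The engine names, instance types, and sizes below are made-up examples.
engines = [
    EngineConfig(name="bi-dashboards",  node_type="m5d.2xlarge", max_nodes=8),
    EngineConfig(name="adhoc-analysts", node_type="m5d.4xlarge", max_nodes=4),
    EngineConfig(name="etl-batch",      node_type="m5d.8xlarge", max_nodes=16),
]

for engine in engines:
    print(f"{engine.name}: up to {engine.max_nodes} x {engine.node_type}, isolated from other workloads")
```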


Elastic engines thus eliminate both under- and over-provisioning of compute resources, maximizing concurrency and performance while minimizing the required compute infrastructure. And since engines run inside projects that live in your own AWS account, these resource efficiency gains directly reduce your cloud costs. Building on this workload isolation, we have also layered on elastic, on-demand scaling for each engine.


When there is no query activity, the engine remains shut down and consumes no compute resources. Incoming query demand triggers the engine to automatically start and elastically scale up to its full, tailored size. When the queries stop, the engine again automatically and elastically scales back down and stops. In other words, the Dremio AWS Edition is taking full advantage of the underlying elasticity of AWS to give you more value for every query.
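As a sketch of this behavior (and only a sketch - the class and method names below are invented for illustration, not Dremio code), the lifecycle can be thought of as a simple state machine:

```python
class ElasticEngine:
    """Illustrative state machine for the start/scale/stop lifecycle (not Dremio source code)."""

    def __init__(self, max_nodes: int):
        self.max_nodes = max_nodes
        self.running_nodes = 0  # an idle engine is fully stopped and consumes no compute

    def on_query_arrives(self) -> None:
        # Incoming demand starts the engine and scales it to its full, tailored size.
        if self.running_nodes == 0:
            print("Starting engine...")
        self.running_nodes = self.max_nodes

    def on_queries_stop(self) -> None:
        # No more queries: scale back down and stop, so no compute is billed while idle.
        self.running_nodes = 0
        print("Engine stopped; no EC2 compute running.")

engine = ElasticEngine(max_nodes=8)
engine.on_query_arrives()  # engine wakes up and scales to 8 nodes
engine.on_queries_stop()   # engine scales back to 0 and stops
```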


With the introduction of elastic engines we move from having a single, static, and over-provisioned cluster where you pay for the entire area inside the blue box, to a highly resource-efficient approach where you only pay for the compute being used.


In this way elastic engines automatically eliminate resource consumption and cost for idle workloads. Elastic engines deliver over 60% cost savings for over-provisioned environments (i.e., the majority), while providing over 10X faster performance at the same cost for under-provisioned environments.

A particularly exciting use case for elastic engines is long-running queries, say of 100 minutes or more. Since the performance of each engine also scales linearly with additional execution nodes, this new on-demand start and stop capability can dramatically accelerate long-running query workloads.


So instead of running a single long workload for 100 minutes on a single node, you can now flip the curve and run the same workload for 1 minute on 100 nodes; as soon as the workload completes, the nodes shut down, freeing the resources and shrinking your cloud bill. This is an innovative way to use our performance and linear-scale capabilities to radically accelerate your workloads, so you get results from your data faster at 1/10th of what you would spend on single-cluster SQL engines.
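The arithmetic behind flipping the curve is simple: with linear scaling, the total node-minutes stay the same while the wall-clock time collapses. A quick back-of-the-envelope check, assuming the perfectly linear scaling described above:

```python
# Back-of-the-envelope check of the "flip the curve" example above,
# assuming perfectly linear scaling of the engine.
node_minutes_sequential = 100 * 1  # 100 minutes on 1 node
node_minutes_parallel = 1 * 100    # 1 minute on 100 nodes

assert node_minutes_sequential == node_minutes_parallel  # same total compute consumed
wall_clock_speedup = 100 / 1
print(f"Same node-minutes ({node_minutes_parallel}), results delivered {wall_clock_speedup:.0f}x sooner")
```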

So now let’s put it all together. We start with the AWS EC2 compute cost of a traditional, static-cluster SQL query engine delivering an arbitrary level of performance. Note that this could be at any scale; it doesn’t matter for the analysis. Dremio’s core Arrow-based engine is (conservatively) 2X faster on average at any scale, meaning you can size Dremio with half the EC2 resources. Then we layer on our built-in acceleration features to deliver another (conservative) 2X performance gain, which once again lets you shrink your EC2 resources by 50%. At this point you’ve saved 75% - already a huge amount. Finally, we layer on elastic engines, with their additional 60% savings on cloud infrastructure. The end result is a dramatic 90% reduction in cloud infrastructure costs relative to traditional SQL engine approaches.
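Here is that walkthrough expressed as a quick calculation, using the (conservative) factors stated above:

```python
# Compounding the savings described above; all factors are the post's stated estimates.
cost = 1.00          # baseline EC2 cost of a traditional, static-cluster SQL engine

cost /= 2            # core Arrow-based engine: ~2x faster, so half the EC2 resources
cost /= 2            # built-in acceleration: another ~2x, so half again (75% saved so far)
cost *= (1 - 0.60)   # elastic engines: ~60% further savings by eliminating idle compute

print(f"Remaining cost: {cost:.2f} of baseline, i.e., about {(1 - cost) * 100:.0f}% total savings")
# Remaining cost: 0.10 of baseline, i.e., about 90% total savings
```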



Parallel Projects: Multi-tenant Dremio Instances with Deep Lifecycle Automation


Now let’s zoom out from a given Dremio cluster with its multiple elastic engines and talk about our new parallel projects capabilities. Parallel projects address two sets of challenges for data operations teams. First, cloud software and services often require complex, manual deployment, configuration, and upgrade processes that delay time to value and create a fragile, error-prone environment. With parallel projects you get a service-like cluster experience with end-to-end lifecycle automation across deployment, best-practice configuration, and upgrades, all running in your own AWS account. It’s the best of both worlds: the service experience of SaaS, but completely in your own VPC - data, compute, everything.


Parallel projects always start with the latest version of Dremio, and updates and upgrades are seamless and require minimal interruption. This deep automation delivers a service-like experience where you can deploy an optimized Dremio instance from scratch, start querying your data in minutes, and effortlessly stay current with the latest Dremio features.


Check out our parallel projects tutorial to learn more


Second, cloud software often lacks tenant isolation, co-mingling resources in ways that make compliance difficult, limit business unit independence, and constrain experimentation and innovation. Parallel projects make Dremio fully multi-tenant: each project is a self-contained Dremio instance with all of its associated configuration, metadata, Data Reflections, and resources. This complete isolation facilitates compliance and enables business units to operate fully independently.



Wrapping Up


Dremio AWS Edition delivers massive performance gains, deep infrastructure cost savings, and a service-like experience. It lets you get up and running in a matter of minutes and enjoy production-grade Dremio features from the get-go, while keeping full control of your data and your compute in your own AWS account. And it’s free!

Our new elastic engines capabilities ensure that your workloads get maximum query performance, without contention, at the lowest infrastructure cost. Say goodbye to paying for idle compute! And our new parallel projects capabilities provide full multi-tenant instance isolation, helping ensure compliance.

We are very excited about this new edition and its capabilities and look forward to your feedback. Join the deep dive webinar on Thursday, May 21st for more details. For a complete list of additional new features, enhancements and changes, please review the release notes. As always, please post questions on our community site.

Ready to get started?