

Dremio AWS Edition Deployment Architecture - What’s under the hood

Sep 15, 2020
Serge Leontiev

When building your modern cloud data lake, you should consider a next-generation query engine that supports your analytical workloads, delivers lightning speed directly on cloud data lake storage, provides a secure, self-service semantic layer and maintains a flexible and open architecture.

In May 2020, Dremio introduced Dremio AWS Edition, a data lake engine with a service-like experience and a strong focus on resource efficiency. It is available for free via the AWS Marketplace as a community or enterprise edition. Dremio AWS Edition eliminates workload contention to maximize query performance, and reduces cloud infrastructure costs by up to 90% compared to traditional query engines.




Let’s take a look at how Dremio AWS Edition is provisioned.

The diagram below outlines the Dremio AWS Edition deployment architecture. Provisioning is driven by an AWS CloudFormation stack template (CFT) that is launched when Dremio is selected from AWS Marketplace. The template uses an existing VPC and subnet in the user’s own AWS account, for the selected region and availability zone, to provision all of the Dremio resources that the CFT defines.

The CloudFormation template creates:

  • A new security group for the Dremio deployment, which defines the rules for inbound and outbound network traffic
  • A new IAM role and policy to manage access to EC2 and S3 resources
  • A new compute instance for the Dremio coordinator, which runs the core Dremio application that manages the cluster
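
To make the three resources above concrete, here is a minimal sketch that builds a CloudFormation template as a Python dictionary. It is not the template Dremio ships. The logical resource names, the instance type, the CIDR range, the IAM actions and the inbound port (9047, assumed here to be the Dremio web UI port) are all illustrative assumptions. Only the CloudFormation resource types are standard.

```python
import json


def build_template(vpc_id: str, subnet_id: str, ami_id: str) -> dict:
    """Return an illustrative CloudFormation template as a plain dict.

    All logical names and property values are hypothetical placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative sketch only, not the actual Dremio template",
        "Resources": {
            # Security group: rules for inbound and outbound traffic.
            "DremioSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Network rules for the deployment",
                    "VpcId": vpc_id,
                    "SecurityGroupIngress": [
                        # Assumed UI port and CIDR, for illustration only.
                        {"IpProtocol": "tcp", "FromPort": 9047,
                         "ToPort": 9047, "CidrIp": "10.0.0.0/16"}
                    ],
                },
            },
            # IAM role and inline policy for EC2 and S3 access.
            "DremioRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "ec2.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                    "Policies": [{
                        "PolicyName": "DremioAccess",
                        "PolicyDocument": {
                            "Version": "2012-10-17",
                            "Statement": [{
                                "Effect": "Allow",
                                # Hypothetical action list; a real policy
                                # would be scoped much more tightly.
                                "Action": ["ec2:Describe*", "s3:GetObject",
                                           "s3:ListBucket"],
                                "Resource": "*",
                            }],
                        },
                    }],
                },
            },
            # Instance profile that lets the EC2 instance assume the role.
            "DremioInstanceProfile": {
                "Type": "AWS::IAM::InstanceProfile",
                "Properties": {"Roles": [{"Ref": "DremioRole"}]},
            },
            # Compute instance for the coordinator.
            "DremioCoordinator": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": "m5.2xlarge",  # placeholder size
                    "SubnetId": subnet_id,
                    "SecurityGroupIds": [
                        {"Fn::GetAtt": ["DremioSecurityGroup", "GroupId"]}
                    ],
                    "IamInstanceProfile": {"Ref": "DremioInstanceProfile"},
                },
            },
        },
    }


if __name__ == "__main__":
    template = build_template("vpc-0example", "subnet-0example", "ami-0example")
    print(json.dumps(template, indent=2))
```

The instance profile is an extra resource that the list above does not mention. It is included because CloudFormation attaches an IAM role to an EC2 instance through a profile.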

The Dremio coordinator is responsible for creating and launching the Dremio project. The project contains all deployment-specific definitions and metadata, logs and admin settings, which are stored in a dedicated S3 bucket. The coordinator also manages and controls the Dremio elastic engines, which execute and process queries. Engines can be dynamically provisioned, started or stopped on demand, and scaled up or down, so that compute resources are used and paid for only when they are needed.
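
The engine lifecycle described above can be pictured with a small toy model. This is a sketch under stated assumptions, not Dremio’s API: the `Coordinator` and `Engine` classes, their method names and the idea of counting “billable nodes” are all hypothetical, chosen only to show why stopping or shrinking an engine lowers compute usage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class EngineState(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


@dataclass
class Engine:
    """Hypothetical elastic engine: a named group of executor nodes."""
    name: str
    node_count: int
    state: EngineState = EngineState.STOPPED

    def start(self) -> None:
        self.state = EngineState.RUNNING

    def stop(self) -> None:
        self.state = EngineState.STOPPED

    def resize(self, node_count: int) -> None:
        # Scale the engine up or down; at least one node is required.
        if node_count < 1:
            raise ValueError("an engine needs at least one node")
        self.node_count = node_count

    def billable_nodes(self) -> int:
        # Only running engines consume compute in this toy model.
        return self.node_count if self.state is EngineState.RUNNING else 0


class Coordinator:
    """Hypothetical coordinator that provisions and tracks engines."""

    def __init__(self) -> None:
        self.engines: Dict[str, Engine] = {}

    def provision(self, name: str, node_count: int) -> Engine:
        if name in self.engines:
            raise ValueError(f"engine {name!r} already exists")
        if node_count < 1:
            raise ValueError("an engine needs at least one node")
        engine = Engine(name=name, node_count=node_count)
        self.engines[name] = engine
        return engine

    def total_billable_nodes(self) -> int:
        return sum(e.billable_nodes() for e in self.engines.values())


if __name__ == "__main__":
    coordinator = Coordinator()
    reporting = coordinator.provision("reporting", node_count=4)
    adhoc = coordinator.provision("adhoc", node_count=2)
    reporting.start()
    adhoc.start()
    print(coordinator.total_billable_nodes())
    adhoc.stop()          # stop an idle engine on demand
    reporting.resize(2)   # scale a running engine down
    print(coordinator.total_billable_nodes())
```

Giving each workload its own engine is also what keeps workloads from contending with each other: a heavy reporting job and an ad hoc query run on separate nodes.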

The Dremio coordinator instance and the Dremio engines use directly attached Amazon Elastic Block Store (EBS) volumes to store configuration, logs and operational data. Dremio engines also use EBS to create and store data in the columnar cloud cache (C3), which accelerates query performance.

Both the Dremio coordinator and the engines discover and access datasets either directly in Amazon S3 storage or through the AWS Glue Data Catalog.
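
To give a feel for what catalog-based discovery involves, the sketch below lists the tables in a Glue database together with their S3 locations. It shows how any client could read the Glue Data Catalog through the AWS SDK for Python, not how Dremio does it internally. The helper name `iter_glue_tables` and the database name in the usage note are hypothetical.

```python
from typing import Any, Iterator, Tuple


def iter_glue_tables(glue_client: Any, database: str) -> Iterator[Tuple[str, str]]:
    """Yield (table name, storage location) for each table in a Glue database.

    `glue_client` is expected to behave like a boto3 Glue client.
    Tables without a storage descriptor yield an empty location.
    """
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page.get("TableList", []):
            location = table.get("StorageDescriptor", {}).get("Location", "")
            yield table["Name"], location
```

With boto3 installed and AWS credentials configured, it could be called as `iter_glue_tables(boto3.client("glue"), "my_database")`. Each yielded location is typically an `s3://` path, which is where the data itself lives.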

[Figure: Dremio AWS Edition deployment architecture diagram]

