
11 minute read · March 30, 2022

Dremio Cloud Under the Hood

Nithin Krishna Reghunathan · Staff Product Manager, Dremio

We recently announced the general availability of the Dremio Cloud platform, the world’s first forever-free, fully managed lakehouse platform. Dremio Cloud is a frictionless, infinitely scalable platform that helps organizations run all their SQL workloads and automate data management operations. Two key services are available as part of the Dremio Cloud platform: Dremio Sonar, a lakehouse engine built for SQL, and Dremio Arctic, a metadata and data management service for Apache Iceberg that provides a unique Git-like experience for the lakehouse. 

Dremio Sonar provides the SQL lakehouse engine for open data platforms. It offers all of the performance and functionality of a data warehouse without the need for data copies and complex ETL pipelines. Data engineering teams don't have to manage an additional layer for BI extracts, cubes, or physical data marts. Dremio Arctic (currently in public preview) provides an intelligent metastore for Apache Iceberg. Arctic simplifies data engineering by making data workflows as powerful and intuitive as working with source code.

Dremio Cloud is built with a unique architecture that provides the benefits of SaaS, such as a fully managed service, while ensuring that data storage and processing stay within the customer’s cloud account. It is designed from the ground up with the security and privacy of customers' data as top priorities. Dremio Cloud provides enterprise-grade security and governance so that data can be safely accessed from data sources across the enterprise, and it encrypts data both at rest and in transit.

In this blog, we will provide an under-the-hood view of the Dremio Cloud architecture, and review the pillars it provides to help organizations of all sizes successfully build an enterprise lakehouse architecture, including its frictionless experience, elasticity, and enterprise security.

Dremio Cloud Architecture

Dremio Cloud consists of two major architectural components: (i) an always-on global control plane that receives queries from clients and is responsible for query planning and engine management, and (ii) an execution plane composed of compute engines that are responsible for query execution.

Unlike other platforms, Dremio Cloud places these components in different virtual private clouds (VPCs): (i) the Dremio VPC and (ii) the customer VPC. The control plane resides in Dremio’s VPC, and the execution plane resides in the customer's VPC. If you use multiple cloud accounts with Dremio Cloud, each VPC acts as an execution plane. With this architecture, you avoid the risk of loading data into a different company’s cloud account and maintain complete visibility and control over your data, since it all stays in your account.

Figure 1. Architecture deep dive

Dremio's Global Control Plane

The Dremio Cloud control plane serves as the single pane of glass for you to manage your Dremio Cloud deployment, including users, security, and integrations. A control plane makes data management easier, especially for global companies with large data footprints.

The Dremio Cloud control plane is a multi-tenant, always-on service that is responsible for query planning and management. This control plane is hosted and monitored by Dremio, in a Dremio-managed cloud account. The multi-tenant control plane is central to the Dremio Cloud organization experience, hosting all client-facing interactions, including the user interface, REST API, and data query endpoints. The control plane securely delegates query execution to compute engines, which are automatically provisioned within the organization's VPC, so all data access and processing remains within the organization’s cloud account. 

Organizations can manage deployments across multiple cloud regions within a single cloud today, and multiple clouds in the future, from a single control plane. In addition, Dremio Cloud offers separate control planes by geography (US and EU) to meet organizations' needs for regional data locality. Dremio’s control plane uses a microservices-based architecture that scales automatically to support very high query concurrency. With a single control plane, end users can log on to one portal rather than having to remember an endpoint or IP address, and can authenticate seamlessly through social or enterprise identity providers (IdPs).

Dremio Cloud supports OpenID Connect to authenticate users with any social or corporate identity provider. In addition, OAuth 2.0 and Personal Access Tokens are supported on all interfaces, including programmatic ones (ODBC, JDBC, REST API, and Arrow Flight). With native connectors and SSO in popular BI tools such as Tableau and Power BI, users are automatically authenticated when running queries.
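As a rough illustration of token-based access on a programmatic interface, the sketch below builds (but does not send) an HTTP request that carries a Personal Access Token. The URL, the helper name, and the token value are placeholders invented for this example, and presenting the token as a bearer credential is an assumption; check the REST API reference for the actual endpoint and header format.

```python
import urllib.request

# Hypothetical placeholder URL; this is not a real Dremio Cloud endpoint.
EXAMPLE_URL = "https://api.example.invalid/v0/projects"


def build_authenticated_request(url: str, personal_access_token: str) -> urllib.request.Request:
    """Return an unsent GET request that carries the token in its headers."""
    if not personal_access_token:
        raise ValueError("a personal access token is required")
    return urllib.request.Request(
        url,
        headers={
            # Assumption: the token is presented as an OAuth 2.0-style bearer token.
            "Authorization": f"Bearer {personal_access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# The request is only constructed here; nothing is sent over the network.
request = build_authenticated_request(EXAMPLE_URL, "example-token")
print(request.get_method(), request.full_url)
```

In practice the token would come from a secret store or environment variable rather than a string literal.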

Execution Plane

The execution plane in Dremio Cloud resides in the organization's cloud account (VPC), and consists of one or more compute engines, which are automatically provisioned as needed by the control plane. For instance, in the case of Dremio execution resources running in AWS, compute engines are deployed as AWS EC2 instances within the organization's VPC. Compute engines are available in a range of predetermined sizes, with each size corresponding to a certain number of EC2 virtual machines, each running on a particular instance type. 

Autoscaling Engines for Infinite Concurrency

The Dremio Cloud platform lets engines scale dynamically based on workload size, helping companies handle high levels of concurrency while maintaining consistent performance. Its multi-engine execution plane consists of one or more right-sized engines (as shown in Figure 2) that support different workloads within an organization. For example, an organization can have a medium-sized engine for executive dashboards, a large engine for batch jobs, and an extra-large engine for data science queries. This approach isolates workloads from one another and solves the “noisy neighbor” problem when simultaneous workloads have very different resource requirements and performance characteristics.
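To make the idea of workload isolation concrete, here is a minimal sketch of rule-based routing that sends each kind of workload to its own engine. The rule structure, workload tags, and engine names are hypothetical and chosen to mirror the example above; they do not represent Dremio Cloud's actual routing configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoutingRule:
    workload_tag: str  # hypothetical label attached to a query
    engine_name: str   # engine that runs queries carrying that label


# Hypothetical rules mirroring the dashboard / batch / data science example.
RULES: List[RoutingRule] = [
    RoutingRule("dashboard", "medium-dashboards"),
    RoutingRule("batch", "large-batch"),
    RoutingRule("data-science", "xlarge-data-science"),
]


def route(workload_tag: str, rules: List[RoutingRule] = RULES,
          default_engine: str = "default") -> str:
    """Return the engine for a workload, using the first matching rule."""
    for rule in rules:
        if rule.workload_tag == workload_tag:
            return rule.engine_name
    # Queries that match no rule fall back to a shared default engine.
    return default_engine


for tag in ("dashboard", "batch", "ad-hoc"):
    print(tag, "->", route(tag))
```

Because each tag maps to a separate engine, a heavy batch job never competes for resources with dashboard queries.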

Figure 2. Multi-engine architecture

While engines are initially set to one of several predefined sizes, they can automatically scale out on demand to process a nearly unlimited number of concurrent queries. An engine auto-scales by adding engine "replicas," which are the unit of scale. For example, a "Large" engine scales out by adding one identical "Large" replica at a time, up to the maximum number of replicas allowed for the engine. This concept is illustrated below in Figure 3. Replicas are added as more queries are routed to the engine and removed as query volume decreases.
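The replica-based scaling described above can be sketched as follows. This is a simplified model, not Dremio Cloud's scaling policy: the per-replica query capacity, the function names, and the rule of moving one replica per step are all assumptions made for illustration.

```python
import math


def target_replicas(concurrent_queries: int, queries_per_replica: int,
                    max_replicas: int, min_replicas: int = 0) -> int:
    """Return how many replicas the current load calls for, within the limits."""
    if queries_per_replica <= 0:
        raise ValueError("queries_per_replica must be positive")
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")
    wanted = math.ceil(concurrent_queries / queries_per_replica)
    # Clamp to the administrator-defined scaling limits.
    return max(min_replicas, min(wanted, max_replicas))


def step_toward(current: int, target: int) -> int:
    """Add or remove a single replica per step, moving toward the target."""
    if current < target:
        return current + 1
    if current > target:
        return current - 1
    return current


# Example: a load spike on an engine that starts with one replica.
replicas = 1
goal = target_replicas(concurrent_queries=25, queries_per_replica=10, max_replicas=4)
while replicas != goal:
    replicas = step_toward(replicas, goal)
    print("replicas:", replicas)
```

With a minimum of zero replicas, the same model also covers shutting an engine down entirely once no queries remain.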

Figure 3. Auto-scale engines for infinite concurrency

As a result, organizations do not have to worry about capacity or perform complex sizing exercises; they use only the compute capacity required to service queries at a given time. Additionally, Dremio Cloud terminates idle engines, so no ongoing costs are incurred when queries are not being processed. And while engines can scale out to handle very high concurrency, administrators can easily set scaling limits to govern costs. Because all data is stored and processed within the organization's cloud account, the security mechanisms and data encryption (at rest and in transit) provided by the cloud data lake platform are maintained. This approach also ensures that organizations have full control of their data.

Query Workflow in Dremio Cloud

The diagram below shows how the two planes interact when a user logs in to Dremio Cloud and runs a query.

Figure 4. Query workflow in Dremio Cloud

The steps involved in the query workflow are as follows:

  • The user authenticates to Dremio Cloud through a BI client application. Note: Users can also connect and authenticate to Dremio Cloud via the web UI, ODBC/JDBC, Arrow Flight, or the REST API.
  • The SQL proxy passes the credentials to the authentication manager, which validates the credentials and approves the authentication request.
  • The authenticated user issues a query to Dremio Cloud.
  • The SQL proxy forwards the query to the query planner.
  • The query planner notifies the engine manager of the request.
  • The engine manager service starts or scales the engine as needed to execute the query and submits the query directly to the compute engines (cloud instances) based on predefined rules. The compute engines within the execution plane are deployed in your organization's cloud account (VPC).
  • The query planner passes the plan for the query to the compute engine that the engine manager has designated.
  • The compute engine passes the results of the query back to the query planner, which passes them to the SQL proxy, which then passes them to the BI client application.
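The sequence in the list above can be sketched as a toy simulation. Every class, method, and value below is hypothetical: the real services run as separate components across the two planes, and this model only records the order in which responsibilities are handed off.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class ControlPlaneSketch:
    """Toy stand-in for the control plane; all names here are hypothetical."""
    valid_tokens: Set[str]
    running_engines: Set[str] = field(default_factory=set)
    trace: List[str] = field(default_factory=list)

    def run_query(self, token: str, sql: str, engine: str,
                  execute: Callable[[str], list]) -> list:
        # Bullets 1-2: the SQL proxy hands credentials to the authentication manager.
        self.trace.append("authenticate")
        if token not in self.valid_tokens:
            raise PermissionError("authentication failed")
        # Bullets 3-5: the proxy forwards the query and the planner builds a plan.
        self.trace.append("plan")
        plan = f"PLAN({sql})"
        # Bullet 6: the engine manager starts the engine if it is not running.
        if engine not in self.running_engines:
            self.trace.append(f"start {engine}")
            self.running_engines.add(engine)
        # Bullet 7: the plan goes to the designated engine in the customer's VPC.
        self.trace.append(f"execute on {engine}")
        rows = execute(plan)
        # Bullet 8: results flow back through the planner and proxy to the client.
        self.trace.append("return results")
        return rows


def fake_engine(plan: str) -> list:
    """Stand-in for a compute engine; returns one fixed row per plan."""
    return [(plan, 1)]


plane = ControlPlaneSketch(valid_tokens={"example-token"})
rows = plane.run_query("example-token", "SELECT 1", "engine-a", fake_engine)
print(plane.trace)
```

Note that the engine is started only on the first query; later queries to the same engine skip that step, which is the behavior the engine manager bullet describes.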

Signing Up for Dremio Cloud

You can sign up for Dremio Cloud and get started within a few minutes. The easiest way to get started is to use the automatic, self-service onboarding based on an AWS CloudFormation template. If you have additional security rules, you can use our cloud connect page, which allows you to manually create AWS resources and grant Dremio Cloud access to them. Our platform also provides sample data sets so you can get started quickly.

Dremio Cloud offers a standard (forever-free) edition that provides everything you need to successfully build, automate, and query your lakehouse in production. The standard edition is ideal for startups and enterprises with small data teams, whereas the enterprise edition is ideal for organizations with more robust security and support requirements.

Get started with Dremio Cloud with this self-paced guide that walks you through the first steps to connecting to cloud data lake storage and running queries directly on the data there. 

You may post any questions about Dremio Cloud in our community forum.

Ready to Get Started?

Bring your users closer to the data with organization-wide self-service analytics and lakehouse flexibility, scalability, and performance at a fraction of the cost. Run Dremio anywhere with self-managed software or Dremio Cloud.