From Signup to Subsecond Dashboards in Minutes with Dremio Cloud

   
  • Alex Merced, Developer Advocate

Introducing Dremio Cloud

Have you heard the good news? Dremio has released Dremio Cloud, the forever-free open lakehouse platform. You incur no software or licensing costs and get the power of Dremio’s lakehouse query engine, Sonar, and its intelligent metastore for Iceberg tables, Arctic. Let me show you how easy it is to get started.

Signing up for Dremio Cloud

To sign up for your free account, go to the Get Started page. 

Next:

  1. Choose whether you want to create an account using the North American or European Dremio control plane.
  2. Enter your email address
  3. Click “Sign Up for a Dremio Organization”
  4. Create a new account with your email, or use Google, Microsoft, or GitHub single sign-on.

Configuring Your First Project

Now that your account is created, you’ll want to configure your initial Dremio Sonar project with your AWS account:

  1. Select an organization name
  2. Select a project name
  3. Select which AWS region to create the project in
  4. Click Next
  5. On the next page, configure your cloud connection. The easiest way to do this is via the provided CloudFormation template.

Adding Users and Roles

Once the CloudFormation setup is finished, you can optionally add more users and roles to your account. When you are done, click Next to land on the dashboard of the Dremio interface.

Add Sample Data

From the user interface, we want to add sample data to our Sonar project. Click the Add Sample Data button and a new sample source will be added. Next:

  1. Click on the “Samples” data source
  2. Click on the “samples.dremio.com” folder
  3. Next to the “NYC-taxi-trips” folder, click the button to the right to promote the folder to a dataset
  4. Click Save

Creating a New Column and Saving a Virtual Dataset

It is easy to curate data for many purposes in Dremio Sonar. Let’s take the field that represents the distance in miles and use it to create a new column to show us the distance in kilometers. To do this:

  1. Click on the right corner of the distance_in_mi field to reveal a dropdown and select “calculated field”
  2. On the next screen, let’s multiply the miles field by 1.6 to get the values for our kilometer field (a mile is roughly 1.6 kilometers)
  3. Uncheck the box to keep the existing miles field
  4. Click Apply
  5. Click the dropdown in the upper right and select “Save View As” and name it NYC-taxi-data
  6. Go back to the main screen and select the new view (virtual dataset)
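
To make the calculated field concrete, here is a minimal Python sketch of the same transformation applied to a couple of made-up rows. It is only an illustration of the arithmetic, not how Dremio evaluates the field. The column name distance_in_km, the helper add_distance_km, and the sample values are assumptions; the sketch also assumes the original miles field is kept next to the new one.

```python
# Illustrative only: mimics the calculated field from the steps above.
MILES_TO_KM_APPROX = 1.6       # rough factor used in the walkthrough
MILES_TO_KM_EXACT = 1.609344   # exact factor for the international mile


def add_distance_km(rows, factor=MILES_TO_KM_APPROX):
    """Return copies of the rows with a derived distance_in_km column.

    The input rows are not modified, much like a view leaves the
    underlying dataset untouched.
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the source rows stay as they are
        new_row["distance_in_km"] = row["distance_in_mi"] * factor
        converted.append(new_row)
    return converted


# Hypothetical sample rows, not taken from the NYC taxi dataset.
sample_trips = [
    {"passenger_count": 1, "distance_in_mi": 2.5},
    {"passenger_count": 3, "distance_in_mi": 10.0},
]

km_trips = add_distance_km(sample_trips)
for trip in km_trips:
    print(trip)
```

Passing factor=MILES_TO_KM_EXACT gives slightly larger values; the 1.6 approximation understates distances by a little over half a percent.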

Creating Data Reflections

Dremio Sonar has many performance capabilities to bring you blazing-fast queries at scale. Here, we want our queries to be subsecond on our 340 million-record dataset, especially when we’re creating BI dashboards in Tableau. Dremio uses Data Reflections for this: Reflections automate the optimization of queries on high-priority datasets. To use Reflections:

  1. Click Reflections in the top nav
  2. Turn on the toggle next to “Aggregate Reflections”
  3. Remove passenger_count from the dimension list, leaving pickup_datetime as the only dimension
  4. Add passenger_count to the list of measures
  5. Click Save
  6. Right-click on history to see the clock. When the clock stops, that indicates that the Reflection has been generated.
  7. Return to the main screen for the dataset
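
Conceptually, an aggregate Reflection with one dimension and one measure behaves like a precomputed summary that queries can read instead of scanning every row. The Python sketch below illustrates that idea only; it is not Dremio's implementation, and the function build_aggregate and the sample rows are made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; the real dataset has about 340 million of them.
trips = [
    {"pickup_datetime": datetime(2014, 1, 1, 8, 0), "passenger_count": 2},
    {"pickup_datetime": datetime(2014, 1, 1, 8, 0), "passenger_count": 1},
    {"pickup_datetime": datetime(2014, 1, 1, 9, 30), "passenger_count": 4},
    {"pickup_datetime": datetime(2014, 1, 2, 7, 15), "passenger_count": 1},
]


def build_aggregate(rows, dimension, measure):
    """Pre-aggregate rows: one entry per distinct dimension value,
    holding the row count and the sum of the measure."""
    summary = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        entry = summary[row[dimension]]
        entry["count"] += 1
        entry["sum"] += row[measure]
    return dict(summary)


# Built once, ahead of time (the "Reflection" in this analogy).
summary = build_aggregate(trips, "pickup_datetime", "passenger_count")

# Later queries read the smaller summary instead of the raw rows.
total_rides = sum(entry["count"] for entry in summary.values())
total_passengers = sum(entry["sum"] for entry in summary.values())
print(total_rides, total_passengers)
```

The summary has at most one entry per distinct pickup_datetime, so it is never larger than the raw data and is often much smaller, which is why counts and sums over it can come back quickly.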

Connecting Tableau

With a few clicks, we can connect tools like Tableau to our Dremio account and work directly and performantly with our data lake data, with no need for cubes or marts. To connect our sample dataset to Tableau:

  1. Click on the Tableau logo to download the .tds file
  2. Open the .tds file to open Tableau
  3. Once Tableau opens, it will open a new tab in your browser so you can sign on and establish a connection with your Dremio account
  4. Return to Tableau and you’ll see all your columns from your dataset
  5. Move the “migrated_data” field into rows to see a full count of the dataset (this should be about 340 million records)
  6. Remove the field from rows and replace it with passenger_count to see a count of all passengers from every ride. Note that each time we update the rows and columns, Tableau sends a new query against our 340 million-record dataset, and each one completes in subsecond time thanks to the Reflection we created earlier.
  7. Move pickup_datetime to be a column
  8. Right-click on pickup_datetime and change it to measure by day
  9. Go back to the Dremio UI in your browser and click on the Jobs page to see all the queries Tableau sent and how quickly they have completed
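
The day-level chart from steps 7 and 8 boils down to grouping rides by day and summing passengers. Here is a small Python sketch of that grouping, assuming "by days" means by calendar date; the helper passengers_per_day and the sample rows are hypothetical and only show the shape of the result, not the query Tableau actually sends.

```python
from collections import Counter
from datetime import datetime


def passengers_per_day(rows):
    """Sum passenger_count per calendar date of pickup_datetime."""
    totals = Counter()
    for row in rows:
        day = row["pickup_datetime"].date()  # drop the time of day
        totals[day] += row["passenger_count"]
    return dict(totals)


# Hypothetical sample rows, not taken from the NYC taxi dataset.
rides = [
    {"pickup_datetime": datetime(2014, 1, 1, 8, 0), "passenger_count": 2},
    {"pickup_datetime": datetime(2014, 1, 1, 21, 45), "passenger_count": 3},
    {"pickup_datetime": datetime(2014, 1, 2, 7, 15), "passenger_count": 1},
]

daily = passengers_per_day(rides)
for day in sorted(daily):
    print(day.isoformat(), daily[day])
```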

Conclusion

Now you’ve experienced how easy Dremio Cloud makes data lakehouse architectures. Within a few moments of signup, you can be performantly querying and discovering insights from the data in your data lake. 

Your Dremio Cloud account comes with additional sample datasets that you can use. Also, since you’re running Dremio Cloud in your AWS account, you can connect Sonar to any dataset that’s in your AWS account in Amazon S3 or other sources.

From your Dremio Cloud account you can also try out Dremio Arctic, an intelligent metastore for Apache Iceberg. Just open up your organization settings and create a new project. Look for upcoming articles on getting started with Dremio Arctic.
