Getting Oriented to Dremio

Intro

Welcome to Dremio!

In this tutorial we’ll orient you to the basic concepts of Dremio so you know your way around. We’ll point you to resources that will help you now and in the future, and we’ll end by suggesting the next tutorial, Working With Your First Dataset.

We encourage you to work through this tutorial. Here’s a video in case you’d rather sit back and watch.

 

Assumptions

You can follow this tutorial without installing Dremio. However, we think it will be easier to follow along if you have an installation you can access. Installing Dremio is easy - see the Quickstart for instructions. There are options for Windows, Mac, and Linux. This tutorial assumes you’re on version 1.1.0_20170812… or later.

We also suggest you ask questions on the Dremio Community site - we love to help.

About Dremio

Companies create data in a wide range of technologies, including relational databases, SaaS applications, NoSQL, Amazon S3, Hadoop, and other systems. In order to make sense of all this data, companies use BI tools like Tableau, Power BI, and Qlik, or data science products like Python and R. To make data from all the different sources available to all these analytical technologies, companies build data warehouses, data marts, and data lakes, and they move data with ETL, custom scripts, or data prep tools. With this approach, companies build enormous complexity and cost around their data. They also create an environment where business users are entirely dependent on IT.

We built Dremio to help analysts, data scientists, and data engineers be more effective with data. Dremio is different from traditional approaches to data analytics because it combines:

  • Self-Service Experience. We designed Dremio so that consumers of data can be independent and self-sufficient rather than dependent on IT. With Dremio it’s easy to discover, curate, share, and accelerate data.
  • Integrations to Popular Tools. Analysts, data scientists, and data engineers are most effective using tools they already know and love, so we built Dremio to integrate with products like Tableau, Power BI, and Qlik, as well as Python and R. Dremio makes these tools even better.
  • Native Query Push-Downs. We think the world would be a better place without ETL and data warehouses, so we designed Dremio to push queries down into any source, using highly optimized integrations, even when the source doesn’t support SQL.
  • Data Acceleration. Making data fast is essential for analytics and data science, so we built an innovative data acceleration capability that automatically optimizes your data and your queries using Apache Arrow.
  • Data Lineage. We understand that data flows through pipelines and analytical processes, and this is very difficult to track across different users and different technologies. In Dremio, we’ve made understanding and tracking data lineage a key feature that’s easy for anyone to use.

If you’d like to know more about Dremio’s design, we suggest the Dremio Architecture Guide. Or, if you want to know more about how to deploy Dremio in your infrastructure, we suggest the Dremio Deployment Considerations guide.

Accessing Dremio

If you are the first person accessing Dremio, you’ll be asked to create an administrator account:

Dremio Admin User Creation

Once the admin user is set up, users will be asked to log into Dremio:

Dremio Login

Once you’re authenticated, you will see the home screen. If this is a new installation, it will look pretty empty:

Dremio Datasets - Empty

Sources

Let’s take a look around and start with the bottom left, Sources:

Dremio Datasets - Closeup

Sources are systems where data is managed in your company. Dremio connects to these sources to access datasets. If you already have a source in mind, that’s great! You can click on the New Source button at any time. You’ll see a dialog that will let you set up different kinds of sources, including relational databases, NoSQL, and distributed file systems:

Create New Source

For this tutorial we’re going to use a sample dataset stored on Amazon S3 that’s easy to connect to. Simply click the Add Sample Source button. You should now see a Samples source listed under your sources, and a samples.dremio.com bucket listed as a folder on the right side of the screen:

View S3 Samples bucket called samples.dremio.com

If you double-click on samples.dremio.com, you’ll see the public files we’ll use in our next tutorial:

Contents of S3 Folder

In our next tutorial we’ll take a look at how to work with these files. For now, let’s finish getting oriented to Dremio.

Spaces

Next, let’s create a space. Spaces allow users to collaborate around virtual datasets. You might create a space for a project, or for a team or department. Let’s create a space called “Stargate.” Click on the “New Space” button on the left side of the screen:

Create New Space

By default, new spaces are public and visible to everyone. You can also configure which users have access to a space, either now or after the space is created. Let’s leave this space public.

Click Save to finish creating the “Stargate” space. The screen should now show Stargate as a space:

Create space called Stargate

At the moment there’s nothing in this space. Users can save their virtual datasets in spaces they have “Can Edit” access to. We’ll take a closer look at these abilities in our next tutorial.
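
If it helps to picture the access rules above, here’s a small Python sketch. It’s our own toy model, not Dremio’s implementation or API: the Space class, its fields, the “Atlantis” space, and the user names are all hypothetical and exist only for illustration. It shows a public space that anyone can see, and a save that only succeeds for a user with “Can Edit” access.

```python
from dataclasses import dataclass, field


@dataclass
class Space:
    """Toy model of a space (hypothetical, not Dremio's data model)."""
    name: str
    public: bool = True                           # new spaces are public by default
    viewers: set = field(default_factory=set)     # users allowed to see a non-public space
    editors: set = field(default_factory=set)     # users with "Can Edit" access
    datasets: dict = field(default_factory=dict)  # dataset name -> free-text description

    def can_view(self, user: str) -> bool:
        # Public spaces are visible to everyone; otherwise access must be granted.
        return self.public or user in self.viewers or user in self.editors

    def can_edit(self, user: str) -> bool:
        return user in self.editors

    def save_dataset(self, user: str, name: str, description: str) -> None:
        # Saving requires "Can Edit" access to the space.
        if not self.can_edit(user):
            raise PermissionError(f"{user} does not have Can Edit access to {self.name}")
        self.datasets[name] = description


# A public space, like the one we just created, with one (made-up) editor.
stargate = Space("Stargate", editors={"daniel"})
stargate.save_dataset("daniel", "missions", "gate travel log")

# A non-public space that only one (made-up) user can see.
atlantis = Space("Atlantis", public=False, viewers={"daniel"})

print(stargate.can_view("samantha"))   # True: the space is public
print(stargate.can_edit("samantha"))   # False: no Can Edit access
print(atlantis.can_view("samantha"))   # False: the space is not public
```

In this sketch, viewing and editing are kept as separate checks because the text above treats them separately: everyone can see a public space, but saving into it depends on “Can Edit” access.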

Home Space

Now, just above our spaces, we can see our user name next to a home icon.

Dremio Home Space

This is your home space. If you want to upload a file from your desktop, such as an Excel spreadsheet, it will be stored in your home space and will not be visible to other users. Datasets in your home space are like any other dataset in Dremio - they can be queried, joined to other datasets, analyzed with BI tools or Python, and so on.

Other Controls

Now let’s take a closer look at the bar on the top left of the screen:

Dremio Navigation - closeup of upper left

  • Datasets include Sources, Spaces, and your home space, as we’ve just reviewed.
  • Jobs are units of work processed by Dremio. For example, when a user issues a query to a dataset, this request is processed as a job. For each job Dremio tracks details such as who issued the request, metadata about the results, any errors, and other useful information. We haven’t set up any datasets, so there are no jobs in the system yet.
  • Search allows you to quickly find datasets. As you connect data sources, Dremio indexes metadata like the names of tables, columns, and fields. This makes it easy to find a dataset across all of your different sources and spaces.
  • New Query opens Dremio’s query editor. You can write full SQL against any of the datasets in Dremio and see results.
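
To give a feel for what indexing metadata means for Search, here’s a minimal Python sketch. It’s an illustration under our own assumptions, not how Dremio actually builds or queries its index: the MetadataIndex class and the dataset and column names are hypothetical. It records dataset and column names as they’re added, then finds datasets whose metadata contains a search term.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy metadata index (hypothetical, not Dremio's search implementation)."""

    def __init__(self):
        # Maps a lowercased metadata term to the datasets that mention it.
        self._index = defaultdict(set)

    def add_dataset(self, container, name, columns):
        """Index a dataset's name and column names under its source or space."""
        dataset = (container, name)
        for term in [name, *columns]:
            self._index[term.lower()].add(dataset)

    def search(self, term):
        """Return datasets whose name or any column name contains the term."""
        term = term.lower()
        hits = set()
        for key, datasets in self._index.items():
            if term in key:
                hits |= datasets
        return sorted(hits)


index = MetadataIndex()
index.add_dataset("Stargate", "missions", ["mission_id", "planet"])
index.add_dataset("Samples", "trips", ["pickup_time", "fare"])

print(index.search("planet"))  # [('Stargate', 'missions')]
print(index.search("time"))    # [('Samples', 'trips')]
```

Because the index covers column names as well as dataset names, a search for a column finds its dataset no matter which source or space it lives in.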

To wrap things up, let’s take a look at the buttons on the upper right:

Dremio Navigation - closeup of upper right

  • The Chat Icon gives you access to our engineers directly. If you have a question about Dremio, let’s talk! (Note: this feature is limited to users of Dremio Enterprise Edition.)
  • Help gives you access to Dremio’s Community, documentation, and other resources.
  • Admin gives you access to Dremio’s administration menus.
  • Your name gives you access to your profile.

Next Steps

Now that you have a sense of how Dremio works, it’s time to get started working with some data. In the next tutorial we’ll look at Working With Your First Dataset.