Intro
Welcome to Dremio! This tutorial introduces the basic concepts of Dremio and points you to resources that will help you now and in the future. We have also created a video if you would rather sit back and watch.
Preparation
You can participate in this tutorial without installing Dremio. However, we think it will be easier to follow along if you have an installation you can access. Deploying Dremio is easy; check out the deployment page for more detail. If you have any questions during the tutorial, feel free to post them on the Dremio Community site.
About Dremio
Companies create data in a wide range of technologies, including relational databases, SaaS applications, NoSQL, Amazon S3, Apache Hadoop, and other systems. To make sense of all this data, companies use business intelligence (BI) tools like Tableau, Power BI, and Qlik, or data science tools like Python and R. To make data from a variety of sources available to all these analytical technologies, companies build data warehouses, data marts, and data lakes, and move data with ETL, custom scripts, or data prep tools. With this approach, companies build enormous complexity and cost around their data, and create an environment where business users are entirely dependent on IT.
We built Dremio to help analysts, data scientists, and data engineers be more effective with data. Dremio provides an integrated, self-service interface for data lakes. Designed for BI users and data scientists, the data lake query engine incorporates capabilities for data acceleration, data curation, and data lineage, all on any source and delivered as a self-service platform. Standout features include:
Accessing Dremio
The first time you access Dremio, you’ll be asked to create an administrator account:
Once the admin account is set up, users will be asked to log in to Dremio:
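If you later want to script against Dremio instead of using the browser, the same login step can be sketched in Python. This is a minimal sketch under assumptions not stated in this tutorial: it assumes the coordinator is reachable at http://localhost:9047, that the login endpoint is /apiv2/login, and that the returned token is sent back in an Authorization header prefixed with _dremio. The user name and password are placeholders. The code only builds the request; it does not send it.

```python
import json
import urllib.request


def build_login_request(base_url: str, user_name: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a login request for a Dremio coordinator.

    Assumption: the login endpoint is POST {base_url}/apiv2/login and
    expects a JSON body with "userName" and "password" keys.
    """
    body = json.dumps({"userName": user_name, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/apiv2/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_header(token: str) -> dict:
    """Return the header to attach to later requests.

    Assumption: the token from the login response is prefixed with "_dremio".
    """
    return {"Authorization": f"_dremio{token}"}


if __name__ == "__main__":
    # Placeholder credentials; replace with the account you created above.
    request = build_login_request("http://localhost:9047", "admin", "change-me")
    print(request.get_method(), request.full_url)
```

To actually log in, you would pass the request to urllib.request.urlopen and read the token field from the JSON response; check the REST API documentation for your Dremio version before relying on these paths.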
Once you’re authenticated, you will see the home screen. If this is a new installation, it will look fairly sparse:
Data Sources
Let’s explore the home screen, starting with the bottom left, “Data Lakes” and “External Sources”:
Dremio connects to these sources to access datasets. If you already have a source in mind, you can click on the + sign to connect to a new data lake source or to an external source at any time. You’ll see a prompt that will allow you to set up different kinds of sources from your data lake, including table stores and file stores:
This tutorial uses a sample dataset stored on Amazon S3 that’s easy to connect to. Simply click the “Sample Source” button. You should now see a Samples source listed under your sources, and a samples.dremio.com bucket listed as a folder on the right side of the screen:
If you double-click samples.dremio.com, you’ll see the public files used in our next tutorial, which looks at how to work with them:
Spaces
Continuing our Dremio orientation, let’s create a “space.” Spaces allow users to collaborate around virtual datasets. For example, you might create a space for a project, a team, or a department. Let’s create a new space called “Dremio101.” Click on the “New Space” button on the left side of the screen (alternatively, you can click on the + sign on the Spaces panel):
By default, new spaces are public and visible to everyone. You can configure which users have access to a space, either now or after the space is created. For now, let’s leave this space public.
Click “Save” to finish creating the “Dremio101” space. The screen should now show “Dremio101” as a space:
At the moment the space is empty, as shown by the 0 next to the name of the space. Users can save their virtual datasets in spaces they have “Can Edit” access to. We’ll take a closer look at these capabilities in our next tutorial.
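The “New Space” step above can also be scripted. The sketch below builds the request for creating the “Dremio101” space through the REST catalog API. It assumes a coordinator at http://localhost:9047, a POST /api/v3/catalog endpoint that accepts an entity of type "space", and a token obtained from a prior login; all three are assumptions rather than details from this tutorial, and the token value shown is a placeholder.

```python
import json
import urllib.request


def build_create_space_request(base_url: str, token: str, space_name: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a new space.

    Assumptions: the catalog endpoint is POST {base_url}/api/v3/catalog and
    the token is sent as "Authorization: _dremio<token>".
    """
    payload = {"entityType": "space", "name": space_name}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/catalog",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"_dremio{token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # "example-token" is a placeholder, not a real credential.
    request = build_create_space_request("http://localhost:9047", "example-token", "Dremio101")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request with urllib.request.urlopen should produce the same result as clicking “Save” in the dialog, provided the account has permission to create spaces.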
Home Space
Now, just above the spaces, you can see your user name next to a home icon.
This is your home space. Every Dremio user has a private home space where they can upload files or store virtual datasets that they are working on, which are not visible to other users. Datasets in your home space are like any other dataset in Dremio — they can be queried, joined to other datasets, analyzed with BI tools or data science tools, and more.
Other Controls
Now let’s take a closer look at the toolbar at the top left of the screen:
- Datasets include sources, spaces, and your home space.
- Jobs are units of work processed by Dremio. For example, when a user issues a query to a dataset, this request is processed as a job. For each job, Dremio tracks details such as who issued the request, metadata about the results, any errors, and other useful information. At this point, we haven’t set up any datasets, so there are no jobs in the system yet.
- Search allows you to quickly find datasets. As you connect data sources, Dremio indexes metadata like the names of tables, columns, and fields. This makes it easy to find a dataset across all of your different sources and spaces.
- New Query opens Dremio’s query editor. You can write full SQL against any of the datasets in Dremio and see results.
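To make the relationship between queries and jobs concrete, here is a small Python sketch of what a scripted query might look like. It assumes that SQL is submitted as a JSON body to POST /api/v3/sql, that the response carries a job id, and that a job is finished once its state is COMPLETED, FAILED, or CANCELED. These are assumptions to verify against your version's REST API documentation. The file name SF_incidents2016.json is used only as an illustrative dataset in the Samples source and does not appear elsewhere in this tutorial.

```python
import json

# Assumed terminal job states; any other state means the job is still in progress.
TERMINAL_JOB_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def quote_path(parts: list) -> str:
    """Join dataset path components into a double-quoted SQL identifier.

    Quoting matters because names such as samples.dremio.com contain dots,
    which would otherwise be read as path separators.
    """
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


def build_sql_payload(dataset_path: list, limit: int = 10) -> bytes:
    """Build the JSON body for a query submission (assumed shape: {"sql": ...})."""
    sql = f"SELECT * FROM {quote_path(dataset_path)} LIMIT {int(limit)}"
    return json.dumps({"sql": sql}).encode("utf-8")


def is_finished(job_state: str) -> bool:
    """Return True once a job has reached one of the assumed terminal states."""
    return job_state in TERMINAL_JOB_STATES


if __name__ == "__main__":
    # Illustrative path: source, bucket folder, and an assumed sample file.
    path = ["Samples", "samples.dremio.com", "SF_incidents2016.json"]
    print(build_sql_payload(path, limit=5).decode("utf-8"))
```

Each submitted query would then appear on the Jobs page, where you can see who ran it, its state, and any errors.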
Next, let’s take a look at the buttons on the upper right:
- Help gives you access to the Dremio Community site, documentation, and other resources.
- Admin gives you access to Dremio’s administration menus.
- Your name gives you access to your personal profile.
Next Steps
Now that you have a sense of how Dremio works, it’s time to start working with some data. Check out the next tutorial, Working with Your First Dataset.