Dremio Jekyll

Connecting Your Elasticsearch Cluster To Dremio

Intro

Elasticsearch clusters are a great and easy way to store data for fast retrieval. It is efficient, distributed and creates a search index for very powerful text based analytics. This also makes it quite a popular no-Sql data storage option and uses a JSON - type format for storing its data.

The down-side to ES is that building visualizations in it is quite limited through the Kibana plugin. It functions differently from more universally used platforms like Tableau and thus proves to be a hassle for BI developers and data scientists who need to get some Exploratory data analysis done on large volumes of data indexed in Elasticsearch. It also limits the users to people with experience on building Lucene Queries. Dremio aims to provide a solution to this problem.It uses the power of ES with the ease of Structured Query Language.

Dremio functions as an interface between Elasticsearch and Tableau. Dremio pulls the JSON formatted data from Elasticsearch and processes it into a tabular format which can then be recognized by Tableau. The efficiency of Dremio comes into play when querying. It converts the SQL query into Query DSL in majority of situations and allows the Elasticsearch cluster to efficiently perform as much of the query as possible. What Elasticsearch cannot perform Dremio takes care of automatically (eg, SQL joins).

The transformation of the structure of the data in ES to the well know relational model follows this mapping:

Elasticsearch SQL
Index Schema
Document Type Table
Document Tuple

Our ES cluster consists of network usage data that is used for detecting anomalous users. It contains different attributes regarding user activity. This image provides a list of all fields and sample values:

Schema of the elasticsearch anomaly detection application

Anomaly detection is a key field of research as it can be used to effectively identify threats to the system or spammers who try to bring it down. Visualizations in this case can be crucial to designing a system based on detecting specific user patterns/behaviors and it would be quite advantageous if the expertise of a Tableau BI developer can be applied to this context without needing them to know to build visualizations in Kibana or any equivalent platform.

We looked into how we can create visualizations to perform some Exploratory Analysis on this dataset. While Kibana does provide visualizations, Tableau would be a powerful tool in this context and Dremio provides the link between the Technologies.

Installing Dremio

There are two major steps to installing dremio:

1 Installing the Dremio server that will connect to the ES cluster and query the data.

2 Installing the Dremio Connector on the Tableau machine in order for it to connect to Dremio.

Dremio Server

Note: Java JDK 8 will have to be installed on the machine before installing dremio.

1 Download the Dremio installer from the official website.

download the Dremio installer

2 Once downloaded, run the installer and follow the instructions in order to have dremio server installed.

3 Dremio should be installed. Open the application and click start Dremio.

4 The server should start running. Click on open Dremio, a web page should open. Fill in new admin user details to access the application and the Dremio UI should now be accessible.

Dremio Connector

1 Go to the Dremio Website, download and install the Dremio ODBC Connector.

2 Open Windows ODBC data source Wizard. In System DSN click on Add. A pop window should open with Driver options. Click on Dremio Connector and click Finish.

3 A configuration wizard should open up. Give the admin details that were created for dremio and the connection details for the Dremio Server. Click Ok.

4 The connector is now ready as well. We can now go ahead with connecting to our ES cluster.

Connecting Dremio Server

1 Open the Dremio UI and click on Sources. Select Elasticsearch.

2 Give the address details of every ES node/host in the dialog box. Provide authentication if enabled.

3 If any of the nodes are currently unavailable but will come up later on. Tick ‘Query Whitelisted Hosts only in advanced options. Other specific options are also available.

4 Click on save.

5 The new source should be visible on the side. Click on it to display the available data. Each of the available indexes will be displayed. Clicking on an index will display the different Doc types within it as individual tables.

6 Clicking on a table opens up the table and a query editor window. Here any normal SQL query can be written to pull the data.

7 The user can directly access the table or can create a virtual table to be accessed by Tableau. In order to do this simply write a SQL query on the query editor. Once the results are satisfactory simply save the dataset. It will now be available to be accessed by Tableau.

8 Dremio also allows us to build queries using user friendly interfaces. Suppose there is the need for a group by query, Click on the GroupBy icon at the bottom left of the SQL Editor. The query builder interface will open up.

Group by query builder interface in dremio

9 Let’s say we need to count the number of incidents with respect to different categories, we can drag and drop Category in Dimensions and Drop Incidents in Measures. Then select Count from the menu to carry out the counting operation. Click on preview to check the results. Once confirmed click on Save to save the virtual dataset.

Click save to save the virtual dataset

10 Another example would be when creating a join. Click on the join icon at the bottom left corner of the SQL Editor. The interface will open up showing available datasets to join with. Select the dataset necessary and click next. If you’ve been using Dremio for a while, it will recommend joins to you based on the dataset you’re looking at.

11 A new UI should open up to select the columns to join by and the type of join.

building the join sql expression in dremio

12 Drag and drop fields from both tables to create the join conditions. Clicking preview will give a sample of the join. Clicking apply will create this as virtual dataset and be ready for tableau to query.

![Saving the virutal dataset in Dremio(/img/tutorials/connecting-elasticsearch-cluster-to-dremio/image_5.png)

13 Other SQL functions are also available to be used readily. In the SQL Editor UI Click on functions in the top right corner. This will give a list of SQL functions that can be used in the query.

Adding a sql function to our query

14 This makes Dremio a full fledged powerful SQL editor with a user friendly interface.

15 Dremio also adds more functionality to address Elasticsearch Index patterns. In order to access data of an index following a particular pattern, regex queries can also be written.For example, in order to get consolidated client table data from multiple indexes starting with the word ‘access_data’ the following query can be written.

1
SELECT * FROM CDX_dist."access_dist*”.client

16 This dataset can now be saved and accessed from Tableau as a table

Connecting Tableau

1 One simple and direct approach would be to click on the Tableau button, which will launch Tableau connected to this dataset.

clicking the Tableau button in Dremio

2 This will create the .tds file with all the necessary config. Clicking on this will open up Tableau and prompt datasource authorization. Once done Dremio and Tableau would be connected, with Tableau sending live SQL queries to Dremio, and these queries being pushed down into Elasticsearch automatically.

Tableau connected to Dremio

3 A more manual approach would be to open Tableau and in Data Sources click on Other Sources.

Connecting to Dremio from Tableau through the ODBC connections dialog

4 In the Other Sources Dialog Box click on Dremio Connector that was created through ODBC and click connect. Provide the Dremio Login details and click Ok.

5 The Tableau Client will connect to the Dremio Server and show the available schemas.

6 Select the schema to work with and the table.The ES cluster is now connected to Dremio and ready for visualizations.

7 For example finding the number of records in every category.

8 Or if we need to see the distribution of categories, a pie chart would be useful and quickly created.

Pie chart in Tableau based on Elasticsearch data accessed via Dremio

Conclusion

As we can see Dremio helps us leverage the advantages of a powerful NoSql database like Elasticsearch and the dominant visualisation tool like Tableau.This is a great relief to Business Intelligence and Data Science alike as practitioners can concentrate less on building visualizations and more on the data exploration itself in today’s world where Data is the new Oil.

For further directions and use cases on working with Dremio kindly follow the below mentioned pages:

For interacting with the Dremio Team and post queries you can register at the dremio community: https://community.dremio.com