Dremio Jekyll

Analyzing MongoDB with Qlik Sense

Intro

Qlik Sense is a business intelligence tool for sophisticated visualizations and data exploration. It is one of the leaders in the business analytics because it can connect to different data sources at the same time, work from any device, and has convenient navigation and an easy-to-use visualization interface.

Today we will show you how to make Qlik work with one of the most popular data sources, MongoDB, a fast document-oriented NoSQL database. It has flexible data schema, an understandable structure of each object, and excellent scalability. Use of Qlik with MongoDB is limited because data stored in MongoDB lacks structure and requires additional data manipulation. Luckily, Dremio is able to connect Qlik to your MongoDB clusters and accelerates your data and queries easily.

Assumptions

We assume that you have Dremio and ODBC driver installed; if not, go to Dremio’s download page, pick the installations for your operating system, and visit Dremio Docs to read about installations and deployments. Also, we will work with MongoDB and Qlik Sense Desktop, so you need to have them installed and configured.

Importing data into MongoDB

In this tutorial, we will visualize and analyze earthquake data from all around the world. Note that the data is in JSON format which is not preferable for Qlik Sense, but Dremio reads JSON and can can convert it into a format that Qlik can understand, simplifying the data connection process and and allowing further analysis.

First, we need to import the dataset into MongoDB. If you are on Windows, from the command line navigate to bin directory of MongoDB and start it with the following command:

1
mongod.exe

Otherwise if you are using Mac OSX, run the following command:

1
mongod

Then open another tab of your terminal and from the bin directory run the built-in function mongoimport:

1
mongoimport -d dremio -c earthquakes --type json --file <PATH TO FILE>/earthquakes.json

Note: the database dremio with collection earthquakes will be created automatically. Now we are ready to connect Dremio to MongoDB.

Data curation in Dremio

After you have started Dremio and navigated to the Dremio UI at http://localhost:9047, create a new space by clicking on the respective button on the left side of the page and name it MongoData. Then click Add Source and choose MongoDB as a type.

Now, select name, specify the host (if you are running MongoDB on your local machine, enter localhost) and address, pick No Authentication, and click Save.

create new mongo source

After that, you will be able to see the databases and collections that you have in MongoDB in the Sources menu.

data sources

Next, navigate to the Dremio folder and click on earthquakes to see the data in that file. It consists of 13 columns and over 38,000 rows that describe different earthquakes throughout the whole year. Now we need to prepare our data for further analysis with Qlik Sense, and Dremio Editor is a perfect place to make all necessary arrangements.

dremio editor

To begin, let’s drop the _id and eventID columns that represent unique cases and NBstations column as most of the values it contains are null values. To make this change, simply click the small arrow near the column name and select Drop in the drop-down menu.

drop column

After that, change the data types of the rest of the values so that they are suitable for Qlik Sense.

  • For the DateTime column click on the Abc near the column name and select Date&Time format. Then enter the custom pattern YYYY/MM/DD” “HH24:MI:SS.FFF, toggle Drop Source Field option and click Apply.

datetime

  • Columns Depth, Magnitude, RMS, Longitude, and Latitude need to be converted to a float type. Again, click an arrow near the column name and select Convert to a single data type. Then pick Float as a desired type and toggle Cast when possible.

convert float

After those preparations, the data is ready. Notice, that we were using user-friendly interface of SQL that Dremio provides; however, SQL editor automatically writes the query and, in our case, it looks like following:

1
2
3
SELECT TO_TIMESTAMP(TO_CHAR(TO_TIMESTAMP(DateTime, 'YYYY/MM/DD" "HH24:MI:SS.FFF', 1), 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD" "HH24:MI:SS', 1) AS DateTime, clean_data_to_FLOAT(Longitude, 1, 0, 0) AS Longitude, clean_data_to_FLOAT(Depth, 1, 0, 0) AS Depth, clean_data_to_FLOAT(Magnitude, 1, 0, 0) AS Magnitude, MagType, Gap, Distance, Source, clean_data_to_FLOAT(RMS, 1, 0, 0) AS RMS, clean_data_to_FLOAT(Latitude, 1, 0, 0) AS Latitude
FROM Earth.dremio.earthquakes
WHERE (is_clean_data(Depth, 1, 'FLOAT')) AND (is_clean_data(Magnitude, 1, 'FLOAT')) AND (is_clean_data(RMS, 1, 'FLOAT')) AND (is_clean_data(Latitude, 1, 'FLOAT')) AND (is_clean_data(Longitude, 1, 'FLOAT'))

Now click Save as and pick the previously created MongoData location. save

Loading data into Qlik Sense

First, Dremio requires Qlik to be running, so start Qlik Sense Desktop from your computer. Then click on Analyze button at the top right corner of the toolbar and select Qlik Sense.

run qlik

At this point, you will need to enter your Dremio password and wait for establishing the connection to Qlik Sense. We will work with Qlik Sense in the browser, so follow the link provided.When you open Qlik Sense, you will see the window below.

data load editor

Click Open to go to the data load editor; however, the automatically provided code should be changed. To insert the right script and load the data correctly, follow the next steps.

  • On the right side of the page, you will find Edit connection icon. Make sure that the connection is set for SystemDSN and right version of bit for your machine.

create new connection

  • Click on the Select data icon left to the Edit Connection. Select MongoData in Owner menu. Preview if the data is visible and displays correctly and click on Insert Script. The selection of Include LOAD statement is necessary.

edit connection

After that the Data load Editor looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SET DirectIdentifierQuoteChar='"';

LIB CONNECT TO 'Dremio';

LOAD DateTime,
    Longitude,
    Depth,
    Magnitude,
    MagType,
    Gap,
    Distance,
    Source,
    RMS,
    Latitude;

SQL SELECT DateTime,
    Longitude,
    Depth,
    Magnitude,
    MagType,
    Gap,
    Distance,
    Source,
    RMS,
    Latitude
FROM Dremio."MongoData"."earthquakes";
  • Click on Load data in the top right corner and wait until the process is complete and finished successfully. loaddata

Building visualizations

From the data load editor window, navigate to App overview at the top left corner. app overview

Then select Sheet and click Edit at the top of the page. You will now see the primary place for your work in Qlik Sense. On the left side of the screen you can see following menus:

  • plenty of available charts,
  • custom objects that include extensions that you can additionally upload,
  • master items, in particular, new dimensions and measures that you can create from the existing fields in your data, and your saved visualization,
  • fields that you have on your tables.

Now it’s time to build some basic visualizations. Let’s start with a histogram. Drag the corresponding icon from the chart list to the editor area and add Magnitude as a field. On the right side, there is a Properties panel where you can play with settings of the particular chart and its appearance. For example, we changed the bar color and set the number of bars to 14.

histogram

Next we will make a scatterplot. Again, drag the icon and make the following selections:

  • Dimension - Source
  • Measure - Avg(Distance)
  • Measure - Avg(Gap)

In the Properties->Data we also added the third measure - Count(Source) that will adjust the size of points - the more earthquakes were in the particular source, the bigger the point.

In Appearance, we can find many tools for making plots better looking and more informative.

For instance, we will choose diverging gradient for a color scheme which basically visualizes the average magnitude for every source.

scatterplot

And, of course, for this dataset, we should create a map. First, we need to create geographic points to display our data on the map. Go to Master items on the left and create a new dimension called GeoPoint. There is a GeoMakePoint function that will create a two-dimensional point from Latitude and Longitude fields. Click Add dimension and then click Done.

add dimensions

However, there are over 38,000 observations, and it will be better to select a smaller amount for informative map visualization. Let’s choose the earthquakes with magnitudes greater than or equal to 6. Go back to Dremio and for the Magnitude column select Keep Only. Set the lower limit to 6 and click Apply.

filter data

Then refresh the data in Qlik Sense in the Fields menu. refresh data

When the app is saved, drag the Map chart to the editor and select GeoPoint as a dimension. Again, there are many additional properties available. We added Distance as a measure in Data menu and changed the limits of the bubble size. There are also some specific arrangements like map service that allow you to choose any map you want.

map

Conclusion

In this tutorial, we showed you how you can use Dremio to make Qlik work with MongoDB easily. In addition to avoiding connection limitations, Dremio accelerates the analytics of Qlik and transforms raw JSON data into a structured format that we can run SQL queries on Moreover, data manipulations made in Dremio don’t require any programming skills or knowledge and can be easily accessed with Qlik running at the same time.