Dremio Jekyll

Enabling Data-as-a-Service for Postgres and SQL Server

Dremio

Intro

A relational database, as the name implies, is a database based on a relational data model, perceived by the user as a set of normalized relationships of varying degrees. It is a collection of data elements organized in the form of a set of formally described tables from which data can be accessed or reassembled in many different ways without the need to reorganize the database tables.

In other words, the data is written into one or more tables (or “relations”). Each table consists of columns (attributes) and rows (records/tuples). Each table/relation demonstrates one entity. Each row we can identify by a unique key (primary key). To create a relation between tables we should create the foreign key.

In this tutorial, we will explain how to build the relational database and describe its elements. We will use two different sources, SQL-Server and Postgre, and show how to connect them to Dremio. Also, we will provide data curation using Dremio. Finally, we will visualize our data in Tableau.

Assumptions

We assume that you have Dremio and ODBC driver installed; if not, go to Dremio’s deployments page, pick the installations for your operating system, and visit Dremio Docs to read about installations and deployments. Also, we will be working with SQL-Server, Postgre, and Tableau, so you need to have them installed and configured as well.

Loading data into the sources

In this tutorial, we will use a sample dataset that contains supply chain data from a fictitious manufacturing plant. Let’s take a closer look. We have five entities (tables): Customer, Supplier, Product, Order, and OrderItem. At a first glance, it might seem odd to have two “Order” tables. Here’s why, suppose we combine these two tables into one. Then, when we go to a shop we can buy only one element in one record. In another case, we need to repeat order detail for each product. Consequently, it is bad practice and our database will not be normalized. To deal with this problem, we create two tables. Table Order contains purchase information (date, customer, etc.), and OrderItem contains a list of products which we purchase. Finally, by building a relationship between these two tables we can get accurate information for the purchase receipt.

Setting up SQL-Server

Let’s create a database. We will call it “factory”. After that, we need to create a query exactly in a database.

Next, we need to build a model. For this, we run the sample-model file from here. In the first part of this query, we check whether we have such tables or indexes. If yes, then we delete them. Next, we create tables and indexes. The last step, we create our foreign keys for building a relationship. As a result, we receive this schema:

image alt text

Now, when our model is built we can insert our data using the sample-data file.

image alt text

Let’s check whether our data is in. For example, check data in Order table:

image alt text

The last step is optional, but it is important to check that you can work with the data. For example, by using the relation you can find out customer and his or her order number.

image alt text

Also, you can check which products still haven’t been sold:

image alt text

Postgre

In Postgre you should change the identity to SERIAL and nvarchar to char. You should also separate the queries by “;”.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
create table Customer (
   Id                   SERIAL,
   FirstName            char(40)         not null,
   LastName             char(40)         not null,
   City                 char(40)         null,
   Country              char(40)         null,
   Phone                char(20)         null,
   constraint PK_CUSTOMER primary key (Id)
);
create index IndexCustomerName on Customer (
LastName ASC,
FirstName ASC
);
create table "Order" (
   Id                   SERIAL,
   OrderDate            date             not null default current_date,
   OrderNumber          char(10)         null,
   CustomerId           int                  not null,
   TotalAmount          decimal(12,2)        null default 0,
   constraint PK_ORDER primary key (Id)
);
create index IndexOrderCustomerId on "Order" (
CustomerId ASC
);
create index IndexOrderOrderDate on "Order" (
OrderDate ASC
);
create table OrderItem (
   Id                   SERIAL,
   OrderId              int                  not null,
   ProductId            int                  not null,
   UnitPrice            decimal(12,2)        not null default 0,
   Quantity             int                  not null default 1,
   constraint PK_ORDERITEM primary key (Id)
);
create index IndexOrderItemOrderId on OrderItem (
OrderId ASC
);
create index IndexOrderItemProductId on OrderItem (
ProductId ASC
);
create table Product (
   Id                   SERIAL,
   ProductName          char(50)         not null,
   SupplierId           int                  not null,
   UnitPrice            decimal(12,2)        null default 0,
   Package              char(30)         null,
   IsDiscontinued       int                  not null default 0,
   constraint PK_PRODUCT primary key (Id)
);
create index IndexProductSupplierId on Product (
SupplierId ASC
);
create index IndexProductName on Product (
ProductName ASC
);
create table Supplier (
   Id                   SERIAL,
   CompanyName          char(40)         not null,
   ContactName          char(50)         null,
   ContactTitle         char(40)         null,
   City                 char(40)         null,
   Country              char(40)         null,
   Phone                char(30)         null,
   Fax                  char(30)         null,
   constraint PK_SUPPLIER primary key (Id)
);
create index IndexSupplierName on Supplier (
CompanyName ASC
);
create index IndexSupplierCountry on Supplier (
Country ASC
);
alter table "Order"
   add constraint FK_ORDER_REFERENCE_CUSTOMER foreign key (CustomerId)
      references Customer (Id);
alter table OrderItem
   add constraint FK_ORDERITE_REFERENCE_ORDER foreign key (OrderId)
      references "Order" (Id);
alter table OrderItem
   add constraint FK_ORDERITE_REFERENCE_PRODUCT foreign key (ProductId)
      references Product (Id);
alter table Product
   add constraint FK_PRODUCT_REFERENCE_SUPPLIER foreign key (SupplierId)
      references Supplier (Id);

To add, we need to transform our insert query to the following:

1
INSERT INTO Customer (Id, FirstName, LastName, City, Country, Phone ) VALUES (1, 'Maria', 'Anders', 'Berlin', 'Germany', '030-0074321’);

Also, date format differs in Postgre, that is why we should convert query with Order like shown below:

1
INSERT INTO  `Order`  ( Id , OrderDate, CustomerId, TotalAmount, OrderNumber) VALUES(1,'2012-07-04 12:00:00.000',85,440.00,'542378');

Finally, let’s check the data:

image alt text

Connecting to Dremio

When data is imported we can connect our sources to Dremio. For that you should:

1. Log in to Dremio

2. Click “Add source”

3. Choose one of our sources

4. Fill in the necessary information.

Here you can see the sources that we need:

image alt text

For both sources, you will need to type in a Data Source name, host (in our case we are using “localhost”), the database - “factory” and credential to your source. Take a look at our example of connecting Postgre to Dremio.

image alt text

Once the connection is done, we can see our data sources:

image alt text

Data Curation in Dremio

First, let’s create a space, where we will save the dataset that we are working with. We will name it “relational_databases”

image alt text

Now we can do some data preparation. We want to combine the data which is very easy to do using Dremio. For that, choose one of the datasets, then click “Join”. Here, you will have two options: you can pick from a “Recommended Join” or create a “Custom Join”. In this tutorial, we will use a “Custom Join”, for that, we will choose another dataset.

image alt text

Now you need to pick the type of join and condition. In our case, we need “Inner Join”, because our “Customer Id” in Order matches with “Id”. Customer Id is a foreign key, it means that we can get all information about the customer through it. So, Order table can only store those Ids which are in Customer table. But not all Ids from Customer must be in Order. That’s why we’ll do an “Inner Join”.

image alt text

Also, using Dremio we can change data type as shown in the image below.

image alt text image alt text

Dremio will automagically generate our final SQL script.

1
2
3
4
5
6
SELECT nested_0.Id AS Id, nested_0.OrderDate AS OrderDate, nested_0.OrderNumber AS OrderNumber, CONVERT_TO_INTEGER(nested_0.OrderNumber, 1, 1, 0) AS OrderNumber_1, nested_0.TotalAmount AS TotalAmount, nested_0.CustomerId AS CustomerId, join_customer.id AS id, join_customer.firstname AS firstname, join_customer.lastname AS lastname, join_customer.city AS city, join_customer.country AS country, join_customer.phone AS phone
FROM (
 SELECT Id, OrderDate, OrderNumber, CustomerId, TotalAmount
 FROM MSSS.factory.dbo."Order"
) nested_0
INNER JOIN Postgre.public.customer AS join_customer ON nested_0.CustomerId = join_customer.id

Let’s save the dataset to our previously created space.

image alt text

Data visualization in Tableau

Now, let’s open dataset “data” that is inside the relational_databases space, click on Tableau tab and open the downloaded file. Sign in with your user and password to Dremio account.

Imagine, that you want to expand the selling spots of your products. So, it seems like a good idea to create a client map.

For that let’s move country, first name, last name, and phone on marks blank.

image alt text

As a result, we can see the spots on a map.

image alt text

Now, if you go to Dremio and click on the Job section, you can see the query that Dremio used to provide the data to Tableau to create the map.

image alt text

Conclusion

Relational data sources help us save data properly, and using Dremio we can combine data from several data sources, regardless of where are they located, perform the necessary data preprocessing and then use any BI client to, analyze, discover patterns and other insights in our data at the speed of thought.

We hope you enjoyed this tutorial, stay tuned for more tutorials and resources to learn how you can use Dremio to start gaining insights from your data, faster.