Enabling Data-as-a-Service for Postgres and SQL Server
A relational database, as the name implies, is a database based on a relational data model, perceived by the user as a set of normalized relationships of varying degrees. It is a collection of data elements organized in the form of a set of formally described tables from which data can be accessed or reassembled in many different ways without the need to reorganize the database tables.
In other words, the data is written into one or more tables (or “relations”). Each table consists of columns (attributes) and rows (records/tuples). Each table/relation demonstrates one entity. Each row we can identify by a unique key (primary key). To create a relation between tables we should create the foreign key.
In this tutorial, we will explain how to build the relational database and describe its elements. We will use two different sources, SQL-Server and Postgre, and show how to connect them to Dremio. Also, we will provide data curation using Dremio. Finally, we will visualize our data in Tableau.
We assume that you have Dremio and ODBC driver installed; if not, go to Dremio’s deployments page, pick the installations for your operating system, and visit Dremio Docs to read about installations and deployments. Also, we will be working with SQL-Server, Postgre, and Tableau, so you need to have them installed and configured as well.
Loading data into the sources
In this tutorial, we will use a sample dataset that contains supply chain data from a fictitious manufacturing plant. Let’s take a closer look. We have five entities (tables): Customer, Supplier, Product, Order, and OrderItem. At a first glance, it might seem odd to have two “Order” tables. Here’s why, suppose we combine these two tables into one. Then, when we go to a shop we can buy only one element in one record. In another case, we need to repeat order detail for each product. Consequently, it is bad practice and our database will not be normalized. To deal with this problem, we create two tables. Table Order contains purchase information (date, customer, etc.), and OrderItem contains a list of products which we purchase. Finally, by building a relationship between these two tables we can get accurate information for the purchase receipt.
Setting up SQL-Server
Let’s create a database. We will call it “factory”. After that, we need to create a query exactly in a database.
Next, we need to build a model. For this, we run the sample-model file from here. In the first part of this query, we check whether we have such tables or indexes. If yes, then we delete them. Next, we create tables and indexes. The last step, we create our foreign keys for building a relationship. As a result, we receive this schema:
Now, when our model is built we can insert our data using the sample-data file.
Let’s check whether our data is in. For example, check data in Order table:
The last step is optional, but it is important to check that you can work with the data. For example, by using the relation you can find out customer and his or her order number.
Also, you can check which products still haven’t been sold:
In Postgre you should change the identity to SERIAL and nvarchar to char. You should also separate the queries by “;”.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 create table Customer ( Id SERIAL, FirstName char(40) not null, LastName char(40) not null, City char(40) null, Country char(40) null, Phone char(20) null, constraint PK_CUSTOMER primary key (Id) ); create index IndexCustomerName on Customer ( LastName ASC, FirstName ASC ); create table "Order" ( Id SERIAL, OrderDate date not null default current_date, OrderNumber char(10) null, CustomerId int not null, TotalAmount decimal(12,2) null default 0, constraint PK_ORDER primary key (Id) ); create index IndexOrderCustomerId on "Order" ( CustomerId ASC ); create index IndexOrderOrderDate on "Order" ( OrderDate ASC ); create table OrderItem ( Id SERIAL, OrderId int not null, ProductId int not null, UnitPrice decimal(12,2) not null default 0, Quantity int not null default 1, constraint PK_ORDERITEM primary key (Id) ); create index IndexOrderItemOrderId on OrderItem ( OrderId ASC ); create index IndexOrderItemProductId on OrderItem ( ProductId ASC ); create table Product ( Id SERIAL, ProductName char(50) not null, SupplierId int not null, UnitPrice decimal(12,2) null default 0, Package char(30) null, IsDiscontinued int not null default 0, constraint PK_PRODUCT primary key (Id) ); create index IndexProductSupplierId on Product ( SupplierId ASC ); create index IndexProductName on Product ( ProductName ASC ); create table Supplier ( Id SERIAL, CompanyName char(40) not null, ContactName char(50) null, ContactTitle char(40) null, City char(40) null, Country char(40) null, Phone char(30) null, Fax char(30) null, constraint PK_SUPPLIER primary key (Id) ); create index IndexSupplierName on Supplier ( CompanyName ASC ); create index IndexSupplierCountry on Supplier ( Country ASC ); alter table "Order" add constraint FK_ORDER_REFERENCE_CUSTOMER foreign key (CustomerId) references Customer (Id); alter table OrderItem add constraint FK_ORDERITE_REFERENCE_ORDER foreign key (OrderId) references "Order" (Id); alter table OrderItem add constraint FK_ORDERITE_REFERENCE_PRODUCT foreign key (ProductId) references Product (Id); alter table Product add constraint FK_PRODUCT_REFERENCE_SUPPLIER foreign key (SupplierId) references Supplier (Id);
To add, we need to transform our insert query to the following:
1 INSERT INTO Customer (Id, FirstName, LastName, City, Country, Phone ) VALUES (1, 'Maria', 'Anders', 'Berlin', 'Germany', '030-0074321’);
Also, date format differs in Postgre, that is why we should convert query with Order like shown below:
1 INSERT INTO `Order` ( Id , OrderDate, CustomerId, TotalAmount, OrderNumber) VALUES(1,'2012-07-04 12:00:00.000',85,440.00,'542378');
Finally, let’s check the data:
Connecting to Dremio
When data is imported we can connect our sources to Dremio. For that you should:
1. Log in to Dremio
2. Click “Add source”
3. Choose one of our sources
4. Fill in the necessary information.
Here you can see the sources that we need:
For both sources, you will need to type in a Data Source name, host (in our case we are using “localhost”), the database - “factory” and credential to your source. Take a look at our example of connecting Postgre to Dremio.
Once the connection is done, we can see our data sources:
Data Curation in Dremio
First, let’s create a space, where we will save the dataset that we are working with. We will name it “relational_databases”
Now we can do some data preparation. We want to combine the data which is very easy to do using Dremio. For that, choose one of the datasets, then click “Join”. Here, you will have two options: you can pick from a “Recommended Join” or create a “Custom Join”. In this tutorial, we will use a “Custom Join”, for that, we will choose another dataset.
Now you need to pick the type of join and condition. In our case, we need “Inner Join”, because our “Customer Id” in Order matches with “Id”. Customer Id is a foreign key, it means that we can get all information about the customer through it. So, Order table can only store those Ids which are in Customer table. But not all Ids from Customer must be in Order. That’s why we’ll do an “Inner Join”.
Also, using Dremio we can change data type as shown in the image below.
Dremio will automagically generate our final SQL script.
1 2 3 4 5 6 SELECT nested_0.Id AS Id, nested_0.OrderDate AS OrderDate, nested_0.OrderNumber AS OrderNumber, CONVERT_TO_INTEGER(nested_0.OrderNumber, 1, 1, 0) AS OrderNumber_1, nested_0.TotalAmount AS TotalAmount, nested_0.CustomerId AS CustomerId, join_customer.id AS id, join_customer.firstname AS firstname, join_customer.lastname AS lastname, join_customer.city AS city, join_customer.country AS country, join_customer.phone AS phone FROM ( SELECT Id, OrderDate, OrderNumber, CustomerId, TotalAmount FROM MSSS.factory.dbo."Order" ) nested_0 INNER JOIN Postgre.public.customer AS join_customer ON nested_0.CustomerId = join_customer.id
Let’s save the dataset to our previously created space.
Data visualization in Tableau
Now, let’s open dataset “data” that is inside the relational_databases space, click on Tableau tab and open the downloaded file. Sign in with your user and password to Dremio account.
Imagine, that you want to expand the selling spots of your products. So, it seems like a good idea to create a client map.
For that let’s move country, first name, last name, and phone on marks blank.
As a result, we can see the spots on a map.
Now, if you go to Dremio and click on the Job section, you can see the query that Dremio used to provide the data to Tableau to create the map.
Relational data sources help us save data properly, and using Dremio we can combine data from several data sources, regardless of where are they located, perform the necessary data preprocessing and then use any BI client to, analyze, discover patterns and other insights in our data at the speed of thought.
We hope you enjoyed this tutorial, stay tuned for more tutorials and resources to learn how you can use Dremio to start gaining insights from your data, faster.