Dremio Jekyll

Prototyping Financial Data Warehouse and Reporting Systems using Dremio

Introduction

I’d like to review two scenarios of implementing a financial data warehouse and reporting system for the hypothetical “Sample Bank Inc.” banking group. The main idea is to compare two approaches - the first one uses a relational database as a data warehouse platform and the second one has Dremio in the same role. General architecture assumes that we have:

Production accounting OLTP system as a source:

  • Middle-tier data warehouse
  • Endpoint reporting system

The following schemas explain both scenarios:

Prototyping Financial Database Systems with Dremio

  • A Classic approach uses a relational data warehouse to process data from the enterprise application, and a BI system to produce reports and ad-hoc analysis.

Prototyping Financial Database Systems with Dremio

  • The Second scenario uses Dremio for the data processing and a BI system to produce reports and ad-hoc analysis.

Technical specifications

Both scenarios were tested within one virtual instance running on Google Cloud:

  • Machine type: n1-standard-4 (4 vCPUs, 15 GB memory)
  • OS: Red Hat Enterprise Linux 7
  • Disc size: 50GB

Installed software:

  • Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production.
  • WebLogic Server 12.2.1.3.0
  • Oracle Business Intelligence 12c (12.2.1.3.0)
  • Dremio 1.3.0 community edition
  • Oracle Java SE 8u152

Ready-to-use virtual machine image

The virtual machine image with installed software can be imported into the Images and then launched from the following location:

http://storage.googleapis.com/oracle-dremio-obiee/oracle-dremio-obiee.tar.gz

Itemized instruction for how to do this can be found here:

Importing Boot Disk Images to Compute Engine

Virtual machine name must be ‘oracle-obiee-dremio’!

This image was exported from the Google VM and has a complete and fully working environment with the sample data. Software is running on the following paths:

Dremio

1
http://<External IP>:9047

OBIEE

1
http://<External IP>:9502/analytics

WebLogic Server Enterprise manager

1
http://<External IP>:9500/em

WebLogic Server Administration console

1
http://<External IP>:9500/console

Also it is required to open a number of TCP ports for the inbound connections, i.e. configure inbound Firewall rules:

System Port
Dremio 9047
Oracle net service 1521
OBIEE 9502
OBIEE Administration tool 9514
OBIEE console 9500

User authentication information for the installed systems:

System Login Password
Dremio test Admin123
Oracle SYS OraPasswd1
Oracle OPERATIONAL pass
Oracle DWH pass
OBIEE weblogic Admin123
OBIEE Repository Admin123

Also it is necessary to bind OBIEE to a new machine name (hostname -f command). There are several steps required to be completed after the first startup:

  1. Login as an oracle user:
1
sudo su - oracle
  1. Update Node Manager listen address in:
1
2
$MW_HOME/user_projects/domains/bi/nodemanager/nodemanager.properties
ListenAddress=[full machine name]
  1. Start OBIEE services:
1
$DOMAIN_HOME/bitools/bin/start.sh
  1. Configure the Node Manager listen address via the Weblogic Console:

[ Environment > Machines > [machine name] > Configuration tab > Node Manager sub-tab > Listen Address ] Listen Address = [full machine name]

  1. Update the Weblogic Administration Server Listen Address according to the Weblogic documentation:

[ Weblogic console > Environment > Servers > AdminServer > Configuration > General ]

  1. Update the Weblogic Managed Server Listen Address according to the Weblogic documentation:

[ Weblogic console > Environment > Servers > Managed Server > Configuration > General ]

  1. Update the OBIEE System components in:
1
$MW_HOME/user_projects/domains/bi/config/fmwconfig/bienv/core/bienv-components.xml

Add [your hostname.domain or IP ] under each componentType

Note: These steps must be followed explicitly. The listenAddress is case-sensitive, it should be entered for each and every component, and the location of the entry is fixed and must be entered directly after XXXX; otherwise, you will encounter XML parsing errors.

  1. Restart all the components:
1
2
$MW_HOME/user_projects/domains/bi/bitools/bin/stop.sh
$MW_HOME/user_projects/domains/bi/bitools/bin/start.sh

After resetting VM it takes about 10 minutes to startup all installed software.

Installation notes

A detailed tutorial on how to install an Oracle database and Dremio on the Google Compute service can be found here.

Also this post provides a full explanation of the WebLogic and Oracle BI installation process. It can be used in conjunction with the previous tutorial, starting from the Response Files chapter.

The last thing, which is not covered in listed tutorials, is how to configure the Dremio JDBC driver within Oracle WebLogic server. It is a necessary step before importing Dremio metadata into the Oracle BI Repository.

Configure JDBC drivers within Oracle Weblogic

  1. Open terminal window and login as oracle user into the VM

  2. Copy Dremio JDBC driver .jar file to the WebLogic lib directory

1
2
cp <DREMIO_HOME>/jars/jdbc-driver/dremio-jdbc-driver-<version>.jar 
$MW_HOME/user_projects/domains/bi/lib/
  1. Open WebLogic console on:
1
//<External IP>:9500/console
  1. Choose Lock & Edit in Change Center menu

  2. Choose Data Sources in Domain Structure menu

  3. Choose New -> Generic data Source under the Data Sources

Prototyping Financial Database Systems with Dremio

  1. Enter parameters:

Name - Dremio

JNDI Name - Dremio

Database Type - Other

Prototyping Financial Database Systems with Dremio

  1. Then two pages follow, where default parameters do not require any changes

Prototyping Financial Database Systems with Dremio

Prototyping Financial Database Systems with Dremio

  1. Enter Dremio username and password. User is called test in the provided VM image.

Prototyping Financial Database Systems with Dremio

  1. Enter parameters and choose Finish:

Driver class name - com.dremio.jdbc.Driver

URL - jdbc:dremio:direct=localhost:31010

Prototyping Financial Database Systems with Dremio

Prototyping Financial Database Systems with Dremio

  1. Select bi_cluster as a deployment target and choose Finish

Prototyping Financial Database Systems with Dremio

  1. Select Deployments under Domain Structure Prototyping Financial Database Systems with Dremio

  2. Find obi.datasrc.server

Prototyping Financial Database Systems with Dremio

  1. Enable bi_cluster (leave the bi_internal_virtualhost1 for next step). Save and Activate changes under Change Center menu

Prototyping Financial Database Systems with Dremio

  1. Press Lock & Edit again disable the bi_internal_virtualhost1

  2. Choose Save and Activate Changes again

Now it is possible to Load Java Datasources in the Oracle BI Administration tool.

Operational database

Annotation

The operational database is a production OLTP system, which is the subject for data analysis tasks. In the real scenario it is a standalone server or virtual machine with the database management software. In this tutorial a sample accounting database serves as a source system for further analytic processing.

The following basic business requirements were considered when designing the sample database:

  1. System should have the following entities:
    • Client
    • Contract
    • Personal account
    • Balance account
    • Region
    • Transaction
    • Balance
  2. Personal account and Balance account determine the hierarchical analytic attribute for transactions and balance records - Account.

  3. Transaction record has the following attributes:
    • Two accounts, according to the double-entry booking principle
    • Date and Time
    • Amount
  4. Balance record has the following attributes:
    • One account
    • Date
    • Amount
  5. Contract reflects bank products such as Loan, Deposit etc. and can have several related personal accounts

  6. Client can have several contracts

  7. Region is a client attribute

  8. Chart of Balance accounts:
Code (just a number) Account name Account type
20202 Cash Active
45201 Loans to legal entities Active
45502 Loans to individuals Active
42101 Current funds of legal entities Passive
42301 Current funds of individuals Passive
42102 Deposits to legal entities Passive
42302 Deposits to individuals Passive
10206 Own funds Passive
70601 Incomes Passive
70602 Expenses Active

Relational database schema has a 3rd normal form and it is based on two core fact tables - Balance and Transaction, other entities are dimensions in terms of the Star-schema design. To be more precise, they become dimensions after denormalization when designing data warehouse schema.

Let us have a look at the relational schema.

Relational schema

Prototyping Financial Database Systems with Dremio

The design is pretty straightforward, but it has several additional attributes and therefore some comments would be useful:

  1. No Indexing Strategy, except primary keys. The reason is that in fact the database is only used as the source system for the data warehouse.

  2. Balance accounts are used for the personal accounts aggregation and classification

  3. Client type is an additional analytic attribute

Operational database installation

Installation scripts and dump with the sample data for the operational database can be downloaded from the public Google storage bucket:

http://storage.googleapis.com/oracle-dremio-obiee/operational/operational.7z

Data model for the Oracle SQL Developer data Modeler can be found here:

http://storage.googleapis.com/oracle-dremio-obiee/operational/ModelOperational.7z

To unzip 7z files the following package must be installed:

1
sudo yum install p7zip

Download archive under the oracle user

1
sudo su - oracle
1
curl -O [http://storage.googleapis.com/oracle-dremio-obiee/operational/operational.7z](http://storage.googleapis.com/oracle-dremio-obiee/operational/install.sql)

Extract files into the desired folder preserving full paths, for example into install:

1
7za x -oinstall operational.7z

Archive contains the following files:

  1. install.sql Main installation script

  2. create_user.sql Create OPERATIONAL schema

  3. create_operational_objects.sql Create OPERATIONAL schema objects

  4. insert_data.sql Insert data into tables REGION (US States) and CLIENT (randomly generated individual’s and company’s names)

  5. contract.dmp Dump file for the CONTRACT table, exported with exp utility

  6. personal_account.dmp Dump file for the PERSONAL_ACCOUNT table

  7. balance.dmp Dump file for the BALANCE table

  8. transaction.dmp Dump file for the TRANSACTION table

  9. dump.sh Bash script for importing .dmp files

To install operational database execute commands:

1
cd install/operational
1
sqlplus / as sysdba @install.sql
1
bash dump.sh

Output log is stored in install.log file.

Sample operational data generation

In order to test both scenarios it is required to fill the operational database with the sample data.

Tables REGION and CLIENT were populated as random static dictionaries, and data for other tables were generated by the special procedure with the following steps:

  1. Assume that the date span is Year 2016.

  2. Populate REGION table with USA States

  3. Populate CLIENT table with random person and company names

  4. Open accounts for Bank’s own funds, i.e. insert several records into ACCOUNT table.

  5. Enroll $ 1 billion as own funds.

  6. Generate transactions for 500 Deposits and 500 Loans each month of the year 2016 into the TRANSACTION table. Transaction amount and interest rate are random, 12 months terms are identical for all contracts.

  7. Records for tables CONTRACT and PERONAL_ACCOUNT are generated automatically in previous step.

  8. Generate interest transactions for each contract, i.e. 12 transactions per contract

  9. Calculate everyday balance records for each client into BALANCE table.

As a result, we have about 800 thousands transaction records and 6.5 billion balance records for one year. From the business point of view, it could be a bank in 200th place of US Banks ranked by assets. We are considering only those transactions, which affect profit/loss reports.

Reporting requirements

Usually reporting requirements for the system are produced by financial analysts or directly by the management team. It can be a set of predefined regular reports or ad-hoc analysis requests. The data warehouse and data model are therefore developed using these business requirements as a target.

In this tutorial both types of reports are reviewed.

Regular annual report

System must provide regular annual report with the following form:

Prototyping Financial Database Systems with Dremio

Of course it is just a short basic form, containing several analytic hierarchical accounts for the Balance and Profit/Loss reports.

Ad-hoc analysis

System must provide analysis capabilities, such as building desired pivot reports with the following measures:

  • Balance amount
  • Transaction amount

and the following dimensions:

  • Client
  • Region
  • Contract
  • Analytic accounts

For example, here is the simple Top 10 Loans report:

Prototyping Financial Database Systems with Dremio

Common data warehousing techniques

To implement reporting requirements several common techniques were used when developing both scenarios:

Operational database offloading

Usually reporting and analytic queries are time and resource consuming, therefore we need a separate database as a source for such type of user activities.

Star-schema design and fact tables denormalization

According to Wikipedia, the star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions which are descriptive attributes related to fact data. In our case, we have two fact tables - BALANCE and TRANSACTION, and several dimensions. Since not all of them are joined directly to the dimension tables, denormalization is required by adding redundant copies of data to the BALANCE and TRANSACTION tables. This transformation is covered later more precisely.

Aggregate fact tables

As a rule, data warehouse fact tables, which reflect operational tables, store detailed data such as transactions or balance records for personal accounts. But most of the management reports contain aggregate information, like monthly revenue by regions or by analytical accounts. In order to optimize read performance and therefore queries time, data is stored in aggregated form. For example, to produce sample Annual report, two additional tables can be used. They store aggregated balance and transaction amounts, and should have only Date, Account and Amount fields.

Data enrichment

Data enrichment is a general term that refers to processes used to enhance, refine or otherwise improve raw data. For example, we are using dimensions analytical accounts for the balance and profit/loss reports, which are not present in the source accounting system. These hierarchical accounts are related to the source balance accounts, but are maintained only in the data warehouse.

ETL procedure

Since the separate database is required, it should be populated with an operational data on a scheduled basis. In this tutorial we need to extract data from the operational database, transform it into the denormalized and aggregated form and load into the data warehouse.

Scenario 1: Using Oracle database

Oracle data warehouse

The first scenario uses Oracle database as a data warehouse platform. Summarizing techniques, mentioned in the previous chapter, and reporting requirements, the following relational schemas are proposed (some dimension tables are shown twice just for better presentation structure):

Balance star-schema

Prototyping Financial Database Systems with Dremio

Here is an overview of data-warehousing techniques, used to build Balance schema:

  • BALANCE table has redundant CLIENT_ID, CONTRACT_ID, BALANCE_ACCOUNT_CODE and REGION_ID fields
  • BALANCE_AGGREGATE table store summarized balance by BALANCE_ACCOUNT_CODE and DATE
  • BALANCE_ANALYTIC_ACCOUNT is an external table, added to meet reporting requirements. It stores relations between the accounting balance accounts and reporting hierarchical accounts.
  • CALENDAR is an external table, added to provide additional Calendar dimension.
  • ETL procedures can be found in fill_dwh.sql file, which is a part of Installing DWH database chapter. Although in this tutorial ETL queries are run only once to fill data warehouse, in real systems they would be transformed into regular scheduled procedures.

Transaction star-schema

Prototyping Financial Database Systems with Dremio

Here is an overview of data-warehousing techniques, used to build Transaction schema:

  • TRANSACTION table has redundant CLIENT_ID, CONTRACT_ID, DEB_BAL_ACCOUNT_CODE, CRD_BAL_ACCOUNT_CODE and REGION_ID fields
  • TRANSACTION table does not have foreign keys to the PL_ANALYTIC_ACCOUNT table, because not all transactions are affecting profit/loss reports, and hence not all balance account pairs are presented in the PL_ANALYTIC_ACCOUNT.
  • TRANSACTION_AGGREGATE table store summarized transactions amount by DEB_BAL_ACCOUNT_CODE, CRD_BAL_ACCOUNT_CODE and MONTH_NAME
  • CALENDAR_MONTH is an external table, added to provide additional Month dimension to build aggregation reports.
  • PL_ANALYTIC_ACCOUNT is an external table, added to meet reporting requirements. It stores relations between the pairs of balance accounts and reporting hierarchical accounts.

ETL procedures can be found in fill_dwh.sql file, which is a part of Installing DWH database chapter

Data warehouse installation

Installation scripts for the data warehouse can be downloaded from the public Google storage bucket:

http://storage.googleapis.com/oracle-dremio-obiee/dwh/dwh.7z

Data model for the Oracle SQL Developer data Modeler can be found here:

http://storage.googleapis.com/oracle-dremio-obiee/dwh/ModelDataWarehouse.7z

To unzip 7z files the following package must be installed

1
sudo yum install p7zip

Download archive under the oracle user

1
sudo su - oracle
1
curl -O [http://storage.googleapis.com/oracle-dremio-obiee/dwh/dwh.7z](http://storage.googleapis.com/oracle-dremio-obiee/operational/install.sql)

Extract files into the desired folder preserving full paths, for example into install:

1
7za x -oinstall dwh.7z

Archive contains the following files:

  1. install.sql Main installation script

  2. create_user.sql Create DWH schema

  3. create_dwh_objects.sql Create DWH schema objects

  4. insert_data.sql Insert data into BALANCE_ANALYTIC_ACCOUNT, PL_ANALYTIC_ACCOUNT, CALENDAR and CALENDAR_MONTH tables.

  5. fill_dwh.sql Script contains ETL queries, which are used to copy data from the operational dimension tables and to transform fact tables into the denormalized form.

To install dwh database execute commands:

1
cd install/dwh
1
sqlplus / as sysdba @install.sql

Output log is stored in install.log file.

OBIEE reporting system overview

The last step, which is required to build a working system, is reporting and analytic software configuring. This tutorial gives a brief explanation how to set up Oracle Business Intelligence, create data model and produce reports upon business requirements.

Oracle BI provides a lot of data modeling and reporting capabilities, supports many databases as source systems and has a comprehensive documentation. In the next chapters it is assumed that user has some basic skills of working with the OBIEE.

Producing reports within Oracle BI **consists of the two main stages - creating **repository (i.e. data model) and dashboards (reports). Both scenarios are using the same repository and almost identical dashboards for the Annual report 2016 and Top 10 clients by loans. Therefore it is sufficient to create repository and dashboards for the Oracle data warehouse, and then small amendments are required to adopt it for the Dremio.

Create a Repository

First we need to create a Repository - it is a file containing metadata about physical data sources, business data model and representation objects (fact and dimension tables used in reports). The repository used in this tutorial can be downloaded here:

http://storage.googleapis.com/oracle-dremio-obiee/OBIEE/BI.rpd

For working with .rpd files Oracle Business Intelligence Developer Client Tool is needed. Although the detailed explanation of all steps required to create this file may take too much place, it would be useful to clarify key moments:

  1. This tool has only Windows version, so as a rule it is running on client workstation and is connected to the cloud sources using tcp protocol.

  2. To open downloaded BI.rpd file choose File->Open->Offline

  3. Repository has three layers - physical, business model and representation.

  4. Physical layer has two Databases - one for the Oracle DWH (i.e. Scenario #1) and one for the Dremio (i.e. Scenario #2). Each of them has one Connection pool.

Prototyping Financial Database Systems with Dremio

  1. During the development process, the Data source name for the Oracle connection pool should contain public IP address of the virtual machine. But before uploading repository to the server it must be changed to the localhost, because all systems are installed on the same VM.

Prototyping Financial Database Systems with Dremio

  1. Business model layer also has two models - Sample Bank Inc. DWH (scenario #1) and Sample Bank Inc. (scenario #2). Objects in this layer are the next abstraction level after physical data sources. They can have several physical data sources, different aggregation functions for different dimensions etc., and present the logical business model. Since the reporting requirements do not include information from all physical tables, only part of them are transferred from the physical layer:

Prototyping Financial Database Systems with Dremio

  1. Fact tables and Calendar, unlike other dimensions, have two data sources - one for the detail reports and one for the aggregated.

Prototyping Financial Database Systems with Dremio

  1. Balance table, unlike Transaction, has different aggregate functions for the Amount measure. Since the account balance is measured in the time moment, it can not be summarized by the time.

Prototyping Financial Database Systems with Dremio

  1. Calendar dimension must be tagged as a Time to allow using of time-based aggregation functions like LAST.

Prototyping Financial Database Systems with Dremio

  1. Presentation layer is a set of objects from a certain Business model, which are displayed when building reports. It has two subject areas - Sample Bank Inc. DWH (scenario #1) and Sample Bank Inc. (scenario #2). In our case presentation layer is just a copy of dimensions and fact tables from the Business model.

Prototyping Financial Database Systems with Dremio

  1. To transfer a repository from the client machine to the OBIEE server the following steps are needed:

    • Upload .rpd file from the local computer to the VM, for example to the /home/oracle/install/BI.rpd
    • cd $MW_HOME/user_projects/domains/bi/bitools/bin
    • ./datamodel.sh uploadrpd -I /home/oracle/install/BI.rpd -SI ssi -U weblogic -P Admin123 -S localhost -N 9502

Dashboards and analyses

  1. After uploading a repository to the OBIEE server, dashboards for the “Regular annual report” and “Top 10 Loans” can be downloaded from the following location:

http://storage.googleapis.com/oracle-dremio-obiee/OBIEE/Sample Bank Inc. annual report 2016 (Scenario 1).catalog

http://storage.googleapis.com/oracle-dremio-obiee/OBIEE/Top 10 Loans (Scenario 1).catalog

Then files should be unarchived, for example, into the /Shared Folder/Sample bank/Dashboards location Prototyping Financial Database Systems with Dremio

  1. Sample Bank Inc. annual report 2016 includes two analysis:

    a. Balance sheet

Prototyping Financial Database Systems with Dremio

b. **Result of operations**

Prototyping Financial Database Systems with Dremio

c. Query time can be checked using OBIEE Administration page:

Prototyping Financial Database Systems with Dremio

  1. When OBIEE performs Annual reports, it uses only aggregate tables, because they contain enough information. The following part of the corresponding sql query for the Result of operations report illustrates this:
1
2
3
4
5
6
7
8
9
10
select sum(T2180.AMOUNT * case  when T791.NAME_LEVEL_1 = 'Expenses' then -1 else 1 end ) as c1,
     T2174.QUARTER as c2,
     T791.NAME_LEVEL_1 as c3
from 
     CALENDAR_MONTH T2174,
     PL_ANALYTIC_ACCOUNT T791,
     TRANSACTION_AGGREGATE T2180
where  ( T2174.MONTH_NAME = T2180.MONTH_NAME and T791.CRD_BAL_ACCOUNT_CODE = T2180.CRD_BAL_ACCOUNT_CODE 
and T791.DEB_BAL_ACCOUNT_CODE = T2180.DEB_BAL_ACCOUNT_CODE and T2174.YEAR = '2016' ) 
group by T791.NAME_LEVEL_1, T2174.QUARTER
  1. But when it is required to query data on more detail levels, like Top 10 Loans (scenario 1) report does, detail balance fact table is used:

Prototyping Financial Database Systems with Dremio

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
 select T771.NAME_LEVEL_3 as c2,
     T777.ID as c3,
     T777.NAME as c4,
     sum(T756.AMOUNT) as c5,
     T1014.CALENDAR_DATE as c6
from 
     CLIENT T777,
     BALANCE_ANALYTIC_ACCOUNT T771,
     CALENDAR T1014,
     BALANCE T756
where  ( T756.BALANCE_ACCOUNT_CODE = T771.BALANCE_ACCOUNT_CODE and T756.B_DATE = T1014.CALENDAR_DATE 
and T756.B_DATE = TO_DATE('2016-12-31 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') and T756.CLIENT_ID = T777.ID 
and T771.NAME_LEVEL_3 = 'Loans -> Legal entities' and T1014.CALENDAR_DATE = TO_DATE('2016-12-31 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') 
and (T756.CLIENT_ID in (18097.0, 18215.0, 18219.0, 18269.0, 18279.0, 18290.0, 18295.0, 18310.0, 18368.0, 18409.0, 18452.0)) 
and (T777.ID in (18097.0, 18215.0, 18219.0, 18269.0, 18279.0, 18290.0, 18295.0, 18310.0, 18368.0, 18409.0, 18452.0)) ) 
group by T771.NAME_LEVEL_3, T777.ID, T777.NAME, T1014.CALENDAR_DATE

Query time can be checked using OBIEE Administration page:

Prototyping Financial Database Systems with Dremio

Scenario 2. Using Dremio

Data warehousing techniques and physical datasets

The second scenario assumes that instead of the Oracle database as a data warehouse we are using Dremio. Since OBIEE is used as a reporting system, it is required to provide appropriate Dremio datasets to meet reporting requirements with the corresponding optimization. In general, the same data warehousing techniques are applied to this scenario as to the first one, as well as Business model of the OBIEE repository is almost similar.

Let us have a look on these techniques in the context of data warehouse design using Dremio:

  1. Operational database offloading

Since all analytic queries from the reporting system should be sent to the middle-tier database, first it is necessary to set up a physical datasource for the Oracle operational database

Prototyping Financial Database Systems with Dremio

Now we have physical datasets reflecting operational database, except V_BALANCE and V_TRANSACTION views, which are described later:

Prototyping Financial Database Systems with Dremio

In order to prevent direct database queries, raw reflections should be added to all datasets, except BALANCE and TRANSACTION, which will not be used as sources for the OBIEE. For example, here is the raw reflection for the CLIENT dataset:

Prototyping Financial Database Systems with Dremio

Reflections serve as corresponding physical tables in the Oracle data warehouse of the first scenario.

  1. Star-schema design and fact tables denormalization

As in the previous case, fact tables BALANCE and TRANSACTION should be denormalized. In the first implementation two denormalized tables were created and populated by special queries in fill_dwh.sql. Using Dremio it could be done in two ways - by adding logical datasets with the similar SQL queries, or by creating corresponding views in the source system. In both cases it is required to add raw reflections for the source database offloading. In this case the decision was to add views V_BALANCE and V_TRANACTION to the source system, because it simplifies Dremio datasets structure and therefore query planning procedures:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
CREATE OR REPLACE VIEW OPERATIONAL.V_BALANCE  AS
SELECT b.id,
       b.b_date,
       b.amount,
       b.account_id,
       c.client_id,
       a.contract_id,
       a.balance_account_code,
       cln.region_id
  FROM operational.balance b
 INNER JOIN operational.personal_account a
    ON a.id = b.account_id
 INNER JOIN operational.contract c
    ON c.id = a.contract_id
 INNER JOIN operational.client cln
    ON cln.id = c.client_id;

CREATE OR REPLACE VIEW OPERATIONAL.V_TRANSACTION  AS
SELECT t.t_date,
       t.t_time,
       t.amount,
       t.debit_account_id,
       t.credit_account_id,
       CASE WHEN deb_a.contract_id = 1 THEN
          crd_a.contract_id
         ELSE
          deb_a.contract_id
       END AS contract_id,
       CASE WHEN deb_a.contract_id = 1 THEN
          crd_c.client_id
         ELSE
          deb_c.client_id
       END AS client_id,
       CASE WHEN deb_a.contract_id = 1 THEN
          crd_cln.region_id
         ELSE
          deb_cln.region_id
       END AS region_id,
       deb_a.balance_account_code AS deb_bal_account_code,
       crd_a.balance_account_code AS crd_bal_account_code
  FROM TRANSACTION t
 INNER JOIN personal_account deb_a
    ON deb_a.id = t.debit_account_id
 INNER JOIN personal_account crd_a
    ON crd_a.id = t.credit_account_id
 INNER JOIN contract deb_c
    ON deb_c.id = deb_a.contract_id
 INNER JOIN contract crd_c
    ON crd_c.id = crd_a.contract_id
 INNER JOIN client deb_cln
    ON deb_cln.id = deb_c.client_id
 INNER JOIN client crd_cln
    ON crd_cln.id = crd_c.client_id;

    

Raw reflections for both physical datasets are in place:

Prototyping Financial Database Systems with Dremio

Prototyping Financial Database Systems with Dremio

  1. Aggregate fact tables.

Dremio provides an efficient way to store aggregate information, which is called Aggregation reflections. Implementation of this feature will be described later in conjunction with the OBIEE queries tracing.

  1. Data enrichment.

In the first case there are several additional tables, which are needed to produce reports - analytic accounts and calendars. Unlike populating tables with the sql queries from fill_dwh.sql file, we can store them as .csv files somewhere on the disk and create corresponding Dremio physical source. Files can be downloaded from the following location and stored, for example, into the //usr/dremio path:

http://storage.googleapis.com/oracle-dremio-obiee/dremio/balance_analytic_account.csv** http://storage.googleapis.com/oracle-dremio-obiee/dremio/pl_analytic_account.csv http://storage.googleapis.com/oracle-dremio-obiee/dremio/calendar.csv Then the new physical datasource should be created:

Prototyping Financial Database Systems with Dremio

All .csv files have the same Format:

Prototyping Financial Database Systems with Dremio

After formatting .csv datasets, they should look like this: Prototyping Financial Database Systems with Dremio

  1. ETL procedure

When we are using Dremio, it is not required to define such type of procedures explicitly. Everything we need, is to configure physical and logical datasets and optionally set up Refresh Policy for the physical dataset or Reflection Queue Control for the system.

Logical datasets

The next step is creating a logical datasets layer. The first reason is that it is required to add some transformations to the physical datasets, and the second reason - it provides a plain structure of datasets, similar to the Oracle DWH, which simplifies metadata importing into the OBIEE repository.

This layer is represented on the following screenshot:

Prototyping Financial Database Systems with Dremio

All physical datasets are saved as logical ones in the Operational folder of the Blog space. Datasets, which have field transformations, are marked with red color.

  1. Balance dataset as B_DATE field, which type is transformed into DATE, what can be made using Dremio UI while saving it as a logical dataset.

Prototyping Financial Database Systems with Dremio

  1. Transaction dataset has T_DATE field, which type is transformed into DATE, what can be made using Dremio UI while saving it as a logical dataset.

Prototyping Financial Database Systems with Dremio

  1. Calendar dataset has manually CALENDAR_DATE field, which type is manually tansformed into DATE

Prototyping Financial Database Systems with Dremio

OBIEE reporting system overview

As it was mentioned before, OBIEE objects for the second scenario are almost identical to those of the first one, except the physical layer of the Repository. In fact Business Model and Presentation layers are just copies with the minimal corrections related to the aggregation tables, because they are not present in the implementation using Dremio. Although Business Model objects can have different data sources, which are allowed to be switched on/off, copies were made in order to have possibility to compare two scenarios at once. Dashboards and analysis (.catalog files) are also copies with the same queries, but from another Presentation layer.

Physical layer

Repository design process starts with the Physical layer. OBIEE is connected to the Dremio utilizing the JDBC driver (refer to the Configure JDBC drivers within Oracle Weblogic). Before importing Dremio metadata into the Repository, it is necessary to load Java Datasources from the Weblogic server. Choose File->Load Java Datasources… and enter public IP address, port, and user authentication information:

Prototyping Financial Database Systems with Dremio

Prototyping Financial Database Systems with Dremio

Than the new physical Database should be created:

Prototyping Financial Database Systems with Dremio

Within the Database we are creating new Connection Pool. Datasource name must be the same as in the OBIEE JDBC Data Sources configuration:

Prototyping Financial Database Systems with Dremio

The next is importing Dremio datasets metadata from the Java data source. Right click on the Dremio connection pool and choose Import Metadata…

Prototyping Financial Database Systems with Dremio

Select Views, then Next:

Prototyping Financial Database Systems with Dremio

Select Blog.Operational folder and choose Import Selected:

Prototyping Financial Database Systems with Dremio

After finishing import process, click Finish. Physical catalog DREMIO and physical schema Blog.Operational will be created:

Prototyping Financial Database Systems with Dremio

Alias Crd_Blog.operational.personal_account is created manually, since we have only one physical table Blog.operational.personal_account for both credit and debit transaction accounts.

After importing datasets metadata, it is required to add manually foreign keys. They should be configured similar to the DWH Star-schema:

Balance star-schema

Prototyping Financial Database Systems with Dremio

Transaction star-schema

Prototyping Financial Database Systems with Dremio

Business Model and Mapping

After configuring foreign keys, table objects are transferred to the Business Model - Sample Bank Inc.

Prototyping Financial Database Systems with Dremio

Unlike previous case, fact tables have only one data source. Also fields and tables here are usually given more friendly names. Since we have several hierarchical dimensions - Calendar and Analytical accounts, it is necessary to create Logical dimensions for all logical tables, including non-hierarchical:

Prototyping Financial Database Systems with Dremio

After that logical tables are transferred to the new Subject area in Presentation layer, called **Sample Bank Inc. Logical dimensions are transferred automatically.

Prototyping Financial Database Systems with Dremio

Dashboards and analyses

Dashboards for the “Regular annual report” and “Top 10 Loans” can be downloaded from the following location:

http://storage.googleapis.com/oracle-dremio-obiee/OBIEE/Sample Bank Inc. annual report 2016 (Scenario 2).catalog

http://storage.googleapis.com/oracle-dremio-obiee/OBIEE/Top 10 Loans (Scenario 2).catalog

and unarchived into the /Shared Folder/Sample bank/Dashboards path:

Prototyping Financial Database Systems with Dremio

Tracing OBIEE queries and Dremio optimization

Now it is possible to execute reports and trace queries, which are sent from the OBIEE to the Dremio. It can be done using Jobs menu from Dremio UI. Let us start with the Top 10 Loans - open analysis in OBIEE and then select the last job in Jobsmenu:

Prototyping Financial Database Systems with Dremio

Prototyping Financial Database Systems with Dremio

Right from the Overview tab, it is clear, that the query was fully accelerated and direct connections to the Oracle database were not established. Instead raw reflections for the underlying physical datasets were used. This example shows, how ad-hoc analytical queries are processed by the Dremio in our environment. Compared to the same query in Oracle DWH, Dremio takes more time.

But when we speak about regular reports, Dremio gives possibilities to save certain queries constantly and use corresponding reflections when planning them. Open Sample Bank Inc. annual report 2016 (Scenario 2) and choose Jobs menu in Dremio UI:

Prototyping Financial Database Systems with Dremio

There are seven accelerated queries, three for the Balance sheet and four for the Result of operations. Using the example of the selected query, raw reflections are used again:

Prototyping Financial Database Systems with Dremio

Dremio uses several reflections, joins their data and produces the output. But what if we save this query as a logic dataset and add reflection? Moreover, considering the similarity of the queries, we can reduce the number of logical datasets. Let us have a look on these three queries, used to produce Balance sheet:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT t1702.quarter AS c2,

       t1688.name_level_1 AS c3,

       SUM(t1635.amount) AS c4,

       t1702.calendar_date AS c5

  FROM blog.operational.balance_analytic_account t1688,

       blog.operational.calendar                 t1702,

       blog.operational.balance                  t1635

 WHERE (t1635.b_date = t1702.calendar_date AND

       t1635.balance_account_code = t1688.balance_account_code AND

       T1702."YEAR" = '2016')

 GROUP BY t1688.name_level_1,

          t1702.calendar_date,

          t1702.quarter
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
SELECT t1702.quarter AS c2,

       t1688.name_level_1 AS c3,

       t1688.name_level_2 AS c4,

       SUM(t1635.amount) AS c5,

       t1702.calendar_date AS c6

  FROM blog.operational.balance_analytic_account t1688,

       blog.operational.calendar                 t1702,

       blog.operational.balance                  t1635

 WHERE (t1635.b_date = t1702.calendar_date AND

       t1635.balance_account_code = t1688.balance_account_code AND

       T1702."YEAR" = '2016' AND (t1688.name_level_1 = 'Assets' OR

       t1688.name_level_1 = 'Liabilities'))

 GROUP BY t1688.name_level_1,

          t1688.name_level_2,

          t1702.calendar_date,

          t1702.quarter
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
SELECT t1702.quarter AS c2,

       t1688.name_level_1 AS c3,

       t1688.name_level_2 AS c4,

       t1688.name_level_3 AS c5,

       SUM(t1635.amount) AS c6,

       t1702.calendar_date AS c7

  FROM blog.operational.balance_analytic_account t1688,

       blog.operational.calendar                 t1702,

       blog.operational.balance                  t1635

 WHERE (t1635.b_date = t1702.calendar_date AND

       t1635.balance_account_code = t1688.balance_account_code AND

       T1702."YEAR" = '2016' AND

       (t1688.name_level_2 = 'Current funds' OR

       t1688.name_level_2 = 'Deposits' OR

       t1688.name_level_2 = 'Highly liquid assets' OR

       t1688.name_level_2 = 'Loans' OR t1688.name_level_2 = 'Own funds'))

 GROUP BY t1688.name_level_1,

          t1688.name_level_2,

          t1688.name_level_3,

          t1702.calendar_date,

          t1702.quarter

It is evident, that queries can be considered as different groupings from the following base sql statement:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
select T1702.QUARTER,

     T1688.NAME_LEVEL_1,

     T1688.NAME_LEVEL_2,

     T1688.NAME_LEVEL_3,

     T1635.AMOUNT,

     T1702.CALENDAR_DATE

from 

     Blog.operational.balance T1635,

     Blog.operational.balance_analytic_account T1688,

     Blog.operational.calendar T1702

where  (  T1635.BALANCE_ACCOUNT_CODE = T1688.BALANCE_ACCOUNT_CODE 

          and T1635.B_DATE = T1702.CALENDAR_DATE 

          and T1702."YEAR" = '2016')

If so, we can create a logical dataset with this statement and add aggregate reflections for all fields, except the Amount, which is the measure.

Here is Blog.Reports.balance_base dataset:

Prototyping Financial Database Systems with Dremio

Aggregate reflections:

Prototyping Financial Database Systems with Dremio

Transaction report can be considered in the same way, there are four sql queries:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
SELECT SUM(t1616.amount * 

           CASE

             WHEN t1694.name_level_1 = 'Expenses' THEN -1 ELSE 1

           END) AS c1,

       t1702.quarter AS c2

  FROM blog.operational.calendar            t1702,

       blog.operational.transaction         t1616,

       blog.operational.pl_analytic_account t1694

 WHERE (t1616.t_date = t1702.calendar_date AND

       t1616.crd_bal_account_code = t1694.crd_bal_account_code AND

       t1616.deb_bal_account_code = t1694.deb_bal_account_code AND

       T1702."YEAR" = '2016')

 GROUP BY t1702.quarter

 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SELECT SUM(t1616.amount * 

           CASE

             WHEN t1694.name_level_1 = 'Expenses' THEN -1 ELSE 1

           END) AS c1,

       t1702.quarter AS c2,

       t1694.name_level_1 AS c3

  FROM blog.operational.calendar            t1702,

       blog.operational.transaction         t1616,

       blog.operational.pl_analytic_account t1694

 WHERE (t1616.t_date = t1702.calendar_date AND

       t1616.crd_bal_account_code = t1694.crd_bal_account_code AND

       t1616.deb_bal_account_code = t1694.deb_bal_account_code AND

       T1702."YEAR" = '2016')

 GROUP BY t1694.name_level_1,

          t1702.quarter
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
SELECT SUM(t1616.amount * 

           CASE

             WHEN t1694.name_level_1 = 'Expenses' THEN -1 ELSE 1

           END) AS c1,

       t1702.quarter AS c2,

       t1694.name_level_1 AS c3,

       t1694.name_level_2 AS c4

  FROM blog.operational.calendar            t1702,

       blog.operational.transaction         t1616,

       blog.operational.pl_analytic_account t1694

 WHERE (t1616.t_date = t1702.calendar_date AND

       t1616.crd_bal_account_code = t1694.crd_bal_account_code AND

       t1616.deb_bal_account_code = t1694.deb_bal_account_code AND

       T1702."YEAR" = '2016' AND

       (t1694.name_level_1 = 'Expenses' OR t1694.name_level_1 = 'Incomes'))

 GROUP BY t1694.name_level_1,

          t1694.name_level_2,

          t1702.quarter
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
SELECT SUM(t1616.amount * 

           CASE

             WHEN t1694.name_level_1 = 'Expenses' THEN -1 ELSE 1

           END) AS c1,

       t1702.quarter AS c2,

       t1694.name_level_1 AS c3,

       t1694.name_level_2 AS c4,

       t1694.name_level_3 AS c5

  FROM blog.operational.calendar            t1702,

       blog.operational.transaction         t1616,

       blog.operational.pl_analytic_account t1694

 WHERE (t1616.t_date = t1702.calendar_date AND

       t1616.crd_bal_account_code = t1694.crd_bal_account_code AND

       t1616.deb_bal_account_code = t1694.deb_bal_account_code AND

       T1702."YEAR" = '2016' AND (t1694.name_level_2 = 'Interest expenses' OR

       t1694.name_level_2 = 'Interest incomes'))

 GROUP BY t1694.name_level_1,

          t1694.name_level_2,

          t1694.name_level_3,

          t1702.quarter

These queries are different groupings from the following base sql statement:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
SELECT t1616.amount * CASE

         WHEN t1694.name_level_1 = 'Expenses' THEN

          -1

         ELSE

          1

       END AS amount,

       t1702.quarter AS quarter,

       t1694.name_level_1 AS name_level_1,

       t1694.name_level_2 AS name_level_2,

       t1694.name_level_3 AS name_level_3

  FROM blog.operational.calendar            t1702,

       blog.operational.transaction         t1616,

       blog.operational.pl_analytic_account t1694

 WHERE (t1616.t_date = t1702.calendar_date AND

       t1616.crd_bal_account_code = t1694.crd_bal_account_code AND

       t1616.deb_bal_account_code = t1694.deb_bal_account_code AND

       T1702."YEAR" = '2016')

This query is saved as a logical dataset Blog.Reports.result_operations_base with aggregate reflections:

Prototyping Financial Database Systems with Dremio

Aggregate reflections:

Prototyping Financial Database Systems with Dremio

Now when Sample Bank Inc. annual report 2016 (Scenario 2) is opened, new aggregate reflections are used. Here are two groups of queries for the balance and for the transaction part of the Annual report:

Prototyping Financial Database Systems with Dremio

All queries are accelerated using aggregate reflections, as is shown on the following screenshots :

Prototyping Financial Database Systems with Dremio

Prototyping Financial Database Systems with Dremio

It is evident, that queries are executed much faster in comparison with the raw reflections and is comparable with the Oracle DWH case. Aggregate reflections work as invisible cubes - unlike the first scenario, it is not required to have additional aggregation tables.

Conclusion

Summarizing presented results, it would be useful to compile a table, which helps to compare two scenarios. It is important to note, that we speak about analytical applications, which are target systems for the Dremio implementation. In this real-world test case Dremio shows well competitive results:

Parameter Scenario 1. Using Oracle database Scenario 2. Using Dremio
Licence cost $190 per User/1 Year $9500 per Processor/1 Year Free community edition
Installation efforts Requires considerable time and experience Pretty fast and simple
Disk usage ~ 7,7 GB ~ 600MB
Design efforts Requires separate data warehouse model, aggregation tables, ETL procedures, developer tools Datasets and reflections can be configured based upon the sources using Dremio UI
Ad-hoc analysis query time Oracle database shows better performance when processing table joins
Regular report query time Oracle aggregate tables and Dremio reflections have comparable request times