Dremio Jekyll


Analyze Historical Data Using Temporal Tables

Feb 12, 2020
Brock Griffey

Overview


Often times we see temporal tables used in environments that require history on every record. Traditionally one would query the temporal table as is to get the historical view of the data set which works well in Dremio when you need to see all records. But what if you need to see only the current version or a specific point in time? In this post, I’ll describe how you can use Dremio with temporal tables to read historical data and current data. As a note, I will be using Teradata in this post but other sources that have temporal such as SQL Server can use the same concepts.


Requirements


  • Dremio Cluster (Enterprise version for Teradata support)
  • Source system with temporal tables setup, i.e Teradata

Setup a temporal table in your source system


For this scenario I’ll create the temporal table on my Teradata data source

1
2
3
4
5
6
7
8
   CREATE MULTISET TABLE dremio.stock_prices (
      ticker VARCHAR(20) ,
      price DECIMAL(18,2) ,
      price_start DATE NOT NULL,
 	  price_end DATE NOT NULL,
      PERIOD FOR price_dur(price_start,price_end) AS VALIDTIME
      )
   PRIMARY INDEX (ticker);

Then, I’ll Insert several records. For simplicity of the content I’ll insert a finite number of records, feel free to add as many as you want to.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','297.74','1998-02-01','1998-03-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','34.99','1956-02-01','1956-03-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','41.76','1969-07-01','1969-08-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','1124.9','2015-09-01','2015-10-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','262.02','2001-02-01','2001-03-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','391.52','1979-11-01','1979-12-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','378.16','1990-12-01','1991-01-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','383.2','1995-10-01','1995-10-31')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','626.36','1980-08-01','1980-09-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','183.76','1978-06-01','1978-07-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','37.88','1968-04-01','1968-05-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','181.48','1974-11-01','1974-12-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','35.18','1966-08-01','1966-09-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','710.64','2007-09-01','2007-10-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','35.16','1966-10-01','1966-11-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','54.5','1972-05-01','1972-06-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','496.62','1981-04-01','1981-05-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','449.28','1987-06-01','1987-07-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','676.77','2006-05-01','2006-06-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','35.82','1960-11-01','1960-12-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','35.1','1965-05-01','1965-06-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','35.1','1958-06-01','1958-07-01')
INSERT INTO dremio.stock_prices
(ticker, price, price_start, price_end)
VALUES('GOLD','346.1','1986-03-01','1986-04-01')

Run a test query on Dremio against your data source


I’ll go ahead and create a data source connection to my Teradata instance and run the following query - This query should display all records in our source. If this is your first time with Dremio, take a look at our getting oriented to Dremio tutorial to learn more about connecting to your data.

1
SELECT * FROM teradata.dremio.stock

image alt text


Setup a view


Now lets create a view using the following query for Teradata - this query will display only the current records:

1
2
create view dremio.current_stock_prices as (
current validtime select * from dremio.stock_prices);

Run a test query on Dremio


Now lets run a query to get all the current records

1
SELECT * FROM teradata.dremio.current_stock_prices

image alt text


Visualize it!


Since we now have the ability to view our history from our source system, we can visualize changes between periods on the same dataset.

Using a visualization tool, such as Tableau, create a chart selecting the base table to view the historical dataset we created. Make sure to create a filter that pulls only the date range you want to see.

image alt text


Conclusion


By utilizing views created in the source system Dremio can now query the specific date ranges of data. This is very helpful in many situations where we need to compare the same dataset against itself for prior periods. The same concepts can also apply when working with data sets in the data lake that contain a single view of every change. In my next post I will describe how to view a data set in the data lake and create two Virtual Data Sets that allow you to see current records only and historical records.

Ready to get started?