8 minute read · February 3, 2020
Accelerate Relational Databases with the Data Lake Engine

THE CURRENT SITUATION
For this blog we are going to run through a sequence of TPC-DS queries in random order. The dataset is set with tables up to 30M rows on Oracle Database 12c. The objective is to run at least 5 concurrent users executing random TPC-DS queries on this dataset and free up space on the Oracle server. Unfortunately, in testing the Oracle database without the assistance of Dremio, the highest concurrency level achievable was only 2 concurrent users in the available infrastructure before Oracle starts to report errors and consume all 4 cores available. In this test, of Oracle we even saw the foreground processes representing over 80% of the total. The thought that more projects could get added to the existing resources is completely unthinkable.
REGISTER ORACLE AS A DREMIO DATASOURCE
Our first step is to register the Oracle datasource. As you will see here registering a new datasource to Dremio is a very quick process. But before we begin, you will first want to gather the following pieces of information
Configuration Item | Description |
Host | The hostname or IP address where your RDBMS server exists |
Port | The port used for communicating with the RDBMS |
Service name | The SID to register into Dremio |
Username & Password | The access credentials granted privilege to the tables to be queries. It is ideal to have this id as a service account |









QUERY ACCELERATION
Dremio without any effort leverages Apache Arrow and other techniques to begin accelerating your queries and thus taking steps to manually accelerate can be unnecessary. But since, our objective here is to remove load from the Oracle Database server, a RAW reflection is recommended. A RAW reflection is effectively like an Index and a Materialized view combined and on steroids. Within the RAW Reflection contains some or all of the data necessary to support queries without burdening the source system and as was mentioned at the start of the blog, a set of tools to simplify without a data pipeline, the data refresh process. Creating a RAW Reflection is quite simple. As an “Admin” user (our “da” user does not have this permission), select the “settings” option as shown below.


CONCLUSION
By following all the steps in this first blog in the RDBMS offload series, we have integrated Dremio with your RDBMS, established a security layer to control and monitor access to datasets, and transferred to the RDBMS to our cloud storage with automatic refresh steps. All this has been done without having to create any special data pipelines or additional administrative overhead. In the next blog of this series, we will explore the performance benefits resulting from following these steps. But as a teaser to the next blog, before Dremio, to run a sampling of TPC-DS queries with the 1 GB dataset scale and 2 concurrent users on a small Oracle server with 4 cores and 16 GB of RAM, the tests consumed all 4 cores as we see in the diagram below. However, using Dremio as a Data Lake engine offloading server load from our overloaded Oracle server using the steps described above, the server load dropped to negligible levels. So what does all of this mean? Simply put Dremio can extend the life of your existing RDBMS systems and begin easing the migration to your cloud of choice.
Ready to get started?

Dremio Test Drive
Experience Dremio with sample data
The simplest way to try out Dremio.
Dremio Cloud
Open & fully-managed data lakehouse
Best Option if your data is on AWS. Forever Free Usage.

Dremio Software
Software for any environment
Download Dremio’s Community Edition