Setting Up Dremio On AWS With Oracle
Dremio
Intro
In the next series of posts I want to cover several scenarios using Dremio and relational databases with the most popular cloud providers.
The first sample will be devoted to the Amazon Web Services and Oracle database. I will guide you through the process of creating virtual Linux machine and Oracle database instance, as well as Dremio installation and connecting it with the database.
Assumptions
In this tutorial we’ll use the free tier of AWS and Dremio Community Edition, so you should be able to follow free and easily. We assume you are familiar with the basics of Dremio, so if you haven’t already, we suggest you read Getting Oriented to Dremio.
We also suggest you ask questions on the Dremio Community site - we love to help.
Getting Started With Oracle on AWS
First of all, you must have an AWS account. If you don’t have an AWS account, open https://aws.amazon.com/, and then choose Create an AWS Account. Amazon provides one year free tier for the minimal services configuration, which can be used for the first steps with AWS. In this tutorial I assume using only free tier services.
Once you have Amazon account, Oracle instance can be launched. First sign in to the AWS Management Console (a button at the top right corner of the home screen), then open RDS (relational database service) link under the Database header:
In the top right corner of the AWS Management Console, choose the region in which you want to create the DB instance - I am using EU (Frankfurt). Then click Instances* under the RDS Dashboard header. If you are new to RDS, you’ll get an empty DB instances list:
Choose Launch DB Instance to start the Launch DB Instance Wizard, set Free tier eligible only, and select Oracle:
Select Oracle EE and fill the necessary fields:
Setting Up VPC
Choose Next Step. Here you can choose default (VPC) option for the VPC Security Group(s), because later we will adjust default security group to allow access from outside VPC. Also the default VPC code for the VPC will be different for your account.
Choose Launch DB Instance and then View DB Instances. You will get your DB instances list like this:
You should wait about 15-20 min while the Status becomes Available. Then you can click the Details icon and get full information about the new DB instance:
The next thing we need is to get an access to our DB instance. First it is necessary to allow external inbound connections, choose the default link under the Security Group header. You will see the Security Groups list, which includes one default security group:
Open Inbound tab and click Edit button, Edit Inbound rules dialog will be opened. For this simple scenario we can add one rule for the Oracle client TCP connection from your public IP, and later one inbound rule for the EC2 instance. After choosing My IP as a Source, your public IP address will be automatically determined.
Setting Up Oracle SQL Developer
After saving the inbound rule we need to set up some client software, which give us a possibility to work with the Oracle database instance. I am using Oracle SQL Developer, it is a is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments.
Before setting up new connection in SQL Developer, open DB instance details and copy the Endpoint address
Open SQL Developer and choose New connection:
New / Select Database Connection dialog will be opened. Fill required fields and paste Endpoint address into the Hostname.
Press Test to check the connection and then click Connect, and a new connection will be saved and opened.
Just for the learning purposes let us create simple table and insert a data. Open new SQL Worksheet:
Creating Test Table
The next step is to run two simple sql scripts - create TEST table in the TEST schema (which was created when we launched a DB instance) and insert a data:
1
2
3
4
5
6
7
create table test
(TEST varchar(2000));
insert into test values ('Hello world!');
commit;
Setting Up Dremio
Now we have running Oracle DB instance, and the next task is to launch Linux instance, install Dremio and then create dataset, which reflects Oracle TEST table. Choose EC2 from the dropdown Services menu:
EC2 Dashboard will be opened:
Press Launch Instance to open instance launching wizard:
Check Free tier only, and select Linux image. I am using Ubuntu Server 16.04:
Free tier eligible configuration is checked by default, so choose Review and Launch:
Configuring Your Key Pair
Linux machine will be run in the same VPC (virtual private cluster), but with the new security group. By default only SSH inbound connection is allowed, later we will add new inbound rule for the Dremio. Choose Launch, and If you don’t have any Amazon key pair, you can create a new one in the opened dialog window:
Download dremio-oracle.pem file, click Launch Instance and then View Instances at the right down corner:
To test connection to the instance, choose Connect, help dialog will be opened:
Setting Up Java
Now you can copy connection string, open terminal window, if you are using Linux on your client computer, and then establish ssh connection:
So we have running Oracle database and EC2 Ubuntu instances on Amazon cloud. Since we want to install Dremio on the EC2 instance, Java installation is also required. It could be done using apt-get utility - connect to the EC2 instance using ssh and run next commands:
1
2
3
4
5
6
7
8
9
10
11
# Update packages:
$ sudo apt-get update
# Install Java:
$ sudo apt-get install default-jre
# After Java installation you can check the result:
$ java -version
Installing Dremio
After that we are creating dremio folder for the better file structuring, and download Dremio archive or deploy:
1
2
3
4
5
6
7
8
9
# Create directory for Dremio.
$ mkdir dremio
$ cd dremio
# Download from www.dremio.com/deploy. Be sure to get the latest version.
$ wget http://download.dremio.com/community-server/1.1.0-201708121825170680-436784e/dremio-community-1.1.0-201708121825170680-436784e.tar.gz
Now we can extract Dremio files, rename Dremio folder (just for the commands simplifying), and run the service:
1
2
3
4
5
6
7
8
9
10
11
# Untar the file you downloaded. Be sure to download the latest version of Dremio.
$ tar -xvzf dremio-community-1.1.0-201708121825170680-436784e.tar.gz
# Rename the process
$ mv dremio-community-1.1.0-201708121825170680-436784e dremio-community-1.1.0
# Start Dremio
$ dremio-community-1.1.0/bin/dremio start
Configuring Network Access
As I mentioned above, new inbound rule for the EC2 instance is required to allow external access to the Dremio port 9047. Go to the EC2 Dashboard and choose Security Groups:
Click corresponding security group launch-wizard-1, choose Inbound tab and click Edit to open inbound rules editor. Add new rule:
After selecting My IP option, your public IP address will be automatically determined. After saving rules, Dremio server can be reached on the EC2 public DNS name and port 9047.
Also we need to add inbound rule for the RDS instance, to allow connection from the EC2 (i.e. from the Dremio). First copy launch-wizard-1 group id to the clipboard. Click corresponding security group default, choose Inbound tab and click Edit to open inbound rules editor. Add new rule:
Setting Up Dremio
Now we can open Dremio using EC2 public DNS name:
After filling all necessary fields on the first Dremio page, we can see the standard Dremio home page for a new installation:
We are now ready to proceed with the Oracle source and sample dataset!
Setting Up The Oracle Data Source
Before adding Oracle source, copy RDS Endpoint to the clipboard as we did for the SQL developer connection. Choose Add Source and select Oracle, enter connection information:
After saving source definition, we can see available Oracle schemas:
Open TEST schema and click on the TEST physical dataset. That is what we need - get data from the Oracle table using Dremio query:
Save new virtual dataset into your own space:
In this tutorial has been built typical scenario, how to use Dremio and relational database within Amazon cloud. Scenario includes the following steps:
- Launching Oracle database service
- Launching EC2 virtual Linux machine
- Running Dremio within EC2 machine
- Establish connection between Dremio and Oracle