Accelerating Queries with Dremio’s DynamoDB ARP Connector
When we announced Dremio 4.0 a few months ago, we told you about Dremio Hub and how this would give our community a way to leverage community-supported connectors for a broad variety of data sources, plus the ability to create your own. In an effort to continue helping our great community of users, I’m excited to share with you today Dremio’s DynamoDB ARP Connector.
What is DynamoDB?
From AWS’s documentation, “Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability”. Like any other Amazon Web Services offering, DynamoDB is designed to satisfy the most stringent demands in the big data world; it provides its users with highly predictable performance, massive scalability, support for a wide range of data types and more while saving users from the complexity of a distributed system.
The DynamoDB Connector and ARP framework
Dremio’s Advanced Relational Pushdown (ARP) framework allows data consumers and developers to create custom relational connectors for those cases where data is stored in uncommon locations. Using the ARP framework not only allows you to create better connectors with improved push-down abilities but it also provides a way to develop connectors more efficiently and easily.
The ARP framework lets you build a connector for any data source that has a JDBC driver. Even the pushdowns are simply mappings in YAML, which makes it very easy for anyone to create a connector.
A few months ago, I had the great opportunity to attend AWS re:Invent, where a ton of Dremio fans approached me asking whether Dremio could connect to DynamoDB. That was a lightbulb moment for me, so I started building a connector using the ARP framework. Here are the features of this connector:
- Basic data type support
- Basic pushdowns
- Basic SQL functions support
- Join DynamoDB data with data in your data lake (S3, Azure Storage, Hadoop) and in relational sources
- Provide lightning-fast query speed directly on your data lake
- Accelerate DynamoDB data with Data Reflections™ to provide interactive, sub-second SQL performance
Please visit my GitHub page to download the connector. When you are ready to give it a try, follow these instructions to install it on your Dremio deployment.
Installing the connector
First, make sure Dremio is not running. Then click here to download the release .jar and place it inside the $DREMIO_HOME/jars directory. Additionally, move the JDBC driver and the Simba .lic file to the $DREMIO_HOME/jars/3rdparty directory. Start Dremio and log in.
Inside Dremio, click on the plus sign to add a new data source.
Select DynamoDB from the list of available sources.
Add the connection and authentication parameters and click Save.
If everything went well, you should be able to see the directories inside your data source.
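The file-placement step at the start of this section can also be scripted. What follows is a minimal Python sketch, not part of the connector itself: the function name install_connector and the file names in the usage comment are hypothetical, and it assumes Dremio is already stopped and that you have the connector .jar, the JDBC driver and the .lic file on local disk.

```python
import shutil
from pathlib import Path


def install_connector(dremio_home, connector_jar, driver_files):
    """Copy the connector jar into jars/ and the driver files into jars/3rdparty/.

    Hypothetical helper for illustration; returns the paths of the copied files.
    """
    jars_dir = Path(dremio_home) / "jars"
    third_party_dir = jars_dir / "3rdparty"

    # Fail early if the path does not look like a Dremio installation.
    if not jars_dir.is_dir():
        raise FileNotFoundError(f"{jars_dir} not found; check the Dremio home path")
    third_party_dir.mkdir(exist_ok=True)

    # The connector jar goes directly under jars/.
    copied = [Path(shutil.copy2(connector_jar, jars_dir))]

    # The JDBC driver and the license file go under jars/3rdparty/.
    for driver_file in driver_files:
        copied.append(Path(shutil.copy2(driver_file, third_party_dir)))
    return copied


# Example usage (all paths and file names below are placeholders):
# install_connector(
#     "/opt/dremio",
#     "downloads/dynamodb-connector.jar",
#     ["downloads/dynamodb-jdbc-driver.jar", "downloads/driver-license.lic"],
# )
```

The sketch copies the files rather than moving them, so the downloaded originals stay in place if you need to repeat the installation on another node.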
Things to watch out for
AWS does not offer a native JDBC driver for DynamoDB, so I used the recommended driver on their downloads page. Unfortunately, the driver (provided by Simba) is an enterprise driver that is not open source and requires a license. However, you can request a trial.
Due to the way the Simba driver infers the schema from a table, the table cannot be empty. If it is empty, Dremio will not discover the table.
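If a table you expect to see does not show up, it can help to confirm that it actually contains data before troubleshooting the connector. Here is a minimal Python sketch of such a check. It is an assumption on my part rather than a feature of the connector: the helper name table_has_items is hypothetical, and it expects a client object exposing the DynamoDB Scan operation, such as the one returned by boto3.client("dynamodb").

```python
def table_has_items(dynamodb_client, table_name):
    """Return True if a Scan limited to a single item finds anything.

    Hypothetical helper for illustration. `dynamodb_client` is assumed to
    behave like a boto3 DynamoDB client.
    """
    # Limit=1 keeps the check cheap: at most one item is read.
    response = dynamodb_client.scan(TableName=table_name, Limit=1)
    return response.get("Count", 0) > 0


# Example usage (requires boto3 and AWS credentials; the table name is a placeholder):
# import boto3
# client = boto3.client("dynamodb")
# if not table_has_items(client, "my_table"):
#     print("Table is empty, so Dremio will not discover it")
```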
And that is a wrap. I hope you learned something useful today. To learn more about Dremio, visit our tutorials and resources. If you would like to experiment with Dremio in your own virtual lab, go ahead and check out Dremio University, and if you have any questions, visit our community forums, where we are all eager to help.
For more information about how you can create your own custom connector, check out the Dremio Hub page.