Users can interact with Dremio through a comprehensive set of REST APIs. DevOps teams can use them to orchestrate Dremio with other components of their technology stacks, and end users can use them to build web applications directly on top of Dremio. Most operations are available through the REST API, including:

- issuing queries and retrieving results as JSON
- browsing and managing the data catalog
- managing reflections and refreshing them manually
- accessing the status and results of a specific job

The full API reference is in the Dremio documentation. Follow along here to learn how to perform some basic requests in Python.
Prerequisites
To follow this tutorial, you should have access to a Dremio deployment and should have completed the first two tutorials: Getting Oriented to Dremio and Working With Your First Dataset. To use the API, you will need a Dremio account with a username and password, and you should be familiar with some basic Dremio concepts. You should also have Python installed and configured on your operating system (see the Python website if you haven't already). You will also need the requests library, which you can install with pip install requests.
Setting Up Your Requests
To use the API, we'll first import Python's requests library to make HTTP requests and the json library to serialize request bodies and parse the JSON responses. Then we'll define some constants: your Dremio username and password, the request headers, and the address of your Dremio server. If you are running Dremio on your local machine, the server is localhost.
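The constants described above might look like the following sketch. The credential values are placeholders, and the server address assumes a local deployment on Dremio's default port 9047. The initial headers carry only the content type, because no token exists until we log in.

```python
# Placeholder credentials: replace with your own Dremio account.
username = 'your-username'
password = 'your-password'

# Assumes a local Dremio deployment on its default port, 9047.
dremioServer = 'http://localhost:9047'

# Headers used before logging in; after login these are replaced by
# headers that also carry the authorization token.
headers = {'content-type': 'application/json'}
```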
The Dremio API is designed around RESTful principles, so next we will define some wrapper functions for HTTP GET, POST, PUT, and DELETE.
```python
import json
import requests

def apiGet(endpoint):
    # GET an /api/v3 endpoint and parse the JSON response
    return json.loads(requests.get('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers).text)

def apiPost(endpoint, body=None):
    # only send a body when one is given (json.dumps(None) would send "null")
    data = json.dumps(body) if body is not None else None
    text = requests.post('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers, data=data).text
    # a POST may return no data
    if text:
        return json.loads(text)
    else:
        return None

def apiPut(endpoint, body=None):
    data = json.dumps(body) if body is not None else None
    return requests.put('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers, data=data).text

def apiDelete(endpoint):
    return requests.delete('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers)
```
POST and PUT requests generally take a JSON body. For example, creating a source requires a source configuration. The expected fields for each request body are described under the Models subheading of each endpoint in the API documentation. Make sure your body matches that format before you send the request.
Authenticating Users
Now that we've defined our request functions, we can start using the API. Dremio uses token-based authentication, so we first need to generate a token. We do this by calling the login endpoint with your username and password in the request body. Note that login currently uses the older v2 API (/apiv2/login) rather than /api/v3.
```python
def login(username, password):
    # we log in using the older v2 API for now
    loginData = {'userName': username, 'password': password}
    response = requests.post('{server}/apiv2/login'.format(server=dremioServer),
                             headers={'content-type': 'application/json'},
                             data=json.dumps(loginData))
    data = json.loads(response.text)
    # retrieve the login token
    token = data['token']
    # all later requests send the token as '_dremio' followed by the token
    return {'content-type': 'application/json', 'authorization': '_dremio{authToken}'.format(authToken=token)}

headers = login(username, password)
```
The login function returns the headers that we must pass with all of our other API requests.
Querying Data Using the API
In previous tutorials, you learned how to bring data into Dremio, either from your local machine or through one of our connectors (Amazon S3, Microsoft ADLS, and many others). Let's use the SQL and Job endpoints to query this data and return rows. To query a dataset through the API, you need its path, which you can also find in the UI. Define the path as a list, join it into a fully qualified table name, and build the SQL query to send to the API. Today I'm using a dataset of all WTA tennis matches, and my SQL query returns all of Serena Williams's matches.
```python
def querySQL(query):
    # submit the query; the SQL endpoint responds with the ID of the job it creates
    queryResponse = apiPost('sql', body={'sql': query})
    jobid = queryResponse['id']
    return jobid

# the home space name starts with '@', so it must be wrapped in double quotes
path = ['"@elbert"', 'wta', 'matches']
path = '.'.join(path)
query = "SELECT * FROM {source} WHERE winner_name = 'Serena Williams' OR loser_name = 'Serena Williams'".format(source=path)
jobid = querySQL(query)
```
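The code above quotes only the home-space component by hand. As an alternative, a small helper can quote every component of the path. The sketch below wraps each component in double quotes and joins the results with dots. The helper name toQualifiedName is hypothetical. It assumes the components contain no double quotes themselves and are spelled exactly as they appear in the UI.

```python
def toQualifiedName(path):
    # Quote each path component so names with special characters (such as
    # the '@' of a home space) parse as identifiers, then join with dots.
    # Assumes no component itself contains a double quote.
    return '.'.join('"{}"'.format(part) for part in path)

source = toQualifiedName(['@elbert', 'wta', 'matches'])
query = ("SELECT * FROM {source} "
         "WHERE winner_name = 'Serena Williams' OR loser_name = 'Serena Williams'"
         ).format(source=source)
```

Quoting every component costs nothing for ordinary names, and it avoids parse errors when a space or folder name contains characters that are not valid in an unquoted identifier.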
The SQL endpoint returns a job ID, and you can also find the job on the Jobs page of the Dremio UI. If the query succeeded, the job shows a green checkmark along with its metadata.
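Queries run asynchronously, so a job's results are not available until the job finishes. The sketch below polls the Job endpoint (job/{id}) and reads its jobState field until the job is COMPLETED, FAILED, or CANCELED. The function name waitForJob is hypothetical, as are its poll interval and timeout values. Reading errorMessage on failure is an assumption about the response, so the code falls back to an empty string if the field is absent. The GET function is passed in as a parameter, so you can supply apiGet or a stub.

```python
import time

def waitForJob(jobid, api_get, poll_seconds=1.0, timeout_seconds=300):
    # api_get takes an endpoint path such as 'job/<id>' and returns parsed
    # JSON, e.g. the apiGet wrapper defined earlier.
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = api_get('job/{id}'.format(id=jobid))
        state = job['jobState']
        if state == 'COMPLETED':
            return job
        if state in ('FAILED', 'CANCELED'):
            # errorMessage is assumed to be present on failed jobs
            raise RuntimeError('Job {} ended in state {}: {}'.format(
                jobid, state, job.get('errorMessage', '')))
        if time.monotonic() > deadline:
            raise TimeoutError('Job {} still {} after {} seconds'.format(
                jobid, state, timeout_seconds))
        time.sleep(poll_seconds)
```

Usage: job = waitForJob(jobid, apiGet). If the job succeeded, job['rowCount'] tells you how many rows to expect.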
Now that we have the job ID, we can use the Job endpoint to retrieve the rows. Job results are paginated, so you page through them by setting an offset and a limit on each call.
Each page of results includes rowCount (the total number of rows the job returned), the rows themselves, and the schema of the result.
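The paging model above can be sketched as a loop over job/{id}/results that advances offset by the number of rows received until it reaches rowCount. The function name fetchAllRows is hypothetical. The default page size of 500 assumes the per-call limit maximum given in Dremio's API documentation, so check the reference for your version. The job should already have completed before you call this.

```python
def fetchAllRows(jobid, api_get, page_size=500):
    # api_get takes an endpoint path and returns parsed JSON (e.g. apiGet).
    rows = []
    offset = 0
    while True:
        page = api_get('job/{id}/results?offset={offset}&limit={limit}'.format(
            id=jobid, offset=offset, limit=page_size))
        rows.extend(page['rows'])
        offset += len(page['rows'])
        # rowCount is the total size of the result set, not of this page;
        # an empty page also ends the loop so it cannot spin forever
        if not page['rows'] or offset >= page['rowCount']:
            return rows, page.get('schema')
```

Usage: rows, schema = fetchAllRows(jobid, apiGet). Advancing by the number of rows actually received, rather than by page_size, keeps the loop correct if the server returns a short page.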
Exploring the Catalog
Dremio's data catalog represents all of the datasets, spaces, and sources that you have created or have access to. The Catalog endpoint gives you access to it. The following request returns a top-level view of the data catalog:
```python
apiGet('catalog')
```
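To see what the top-level response contains, the sketch below lists each entry's name and container type. The function name listTopLevel is hypothetical. It assumes each item in the response's data list has a path list and, for containers, a containerType field such as SPACE, SOURCE, or HOME. It uses .get so that entries without that field still appear.

```python
def listTopLevel(api_get):
    # api_get takes an endpoint path and returns parsed JSON (e.g. apiGet).
    # Returns (name, containerType) pairs for the top-level catalog entries.
    return [(item['path'][0], item.get('containerType'))
            for item in api_get('catalog')['data']]
```

Usage: for name, kind in listTopLevel(apiGet): print(name, kind)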
Each container returned in the response has an associated ID. Given a path to a dataset, you can recursively traverse IDs returned by the Catalog endpoint to get metadata about that dataset. Here’s how:
```python
from urllib.parse import quote

def getCatalogRoot():
    # top-level containers: spaces, sources, and home spaces
    return apiGet('catalog')['data']

def getByPathChildren(path, children, depth):
    # search this container's children for the next path component
    for item in children:
        if item['path'][depth] == path[0]:
            response = apiGet('catalog/{id}'.format(id=quote(item['id'], safe='')))
            if len(path) == 1:
                # last component: this is the entity we were looking for
                return response
            return getByPathChildren(path[1:], response['children'], depth + 1)
    return None  # no child matched

def getByPath(path):
    # find the top-level container, then descend through its children
    for item in getCatalogRoot():
        if item['path'][0] == path[0]:
            if len(path) == 1:
                return item
            response = apiGet('catalog/{id}'.format(id=quote(item['id'], safe='')))
            return getByPathChildren(path[1:], response['children'], 1)
    return None  # no top-level container matched

dataset = getByPath(['@elbert', 'wta', 'matches'])
```
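The v3 API also documents a catalog/by-path/{path} endpoint, which looks up an entity in a single request instead of walking the catalog level by level. Confirm that your Dremio version provides it before relying on it. In the sketch below, the helper names are hypothetical. Each path component is URL-encoded separately so that names containing spaces or '/' stay intact.

```python
from urllib.parse import quote

def catalogByPathEndpoint(path):
    # Encode each component on its own, then join with '/' as the separator.
    return 'catalog/by-path/' + '/'.join(quote(part, safe='') for part in path)

def getByPathDirect(path, api_get):
    # api_get takes an endpoint path and returns parsed JSON (e.g. apiGet).
    return api_get(catalogByPathEndpoint(path))
```

Usage: dataset = getByPathDirect(['@elbert', 'wta', 'matches'], apiGet)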
Using the id returned by getByPath, you can also refresh the reflections for that dataset.
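A reflection refresh is a POST to catalog/{id}/refresh. Per the API reference, this endpoint refreshes the reflections that depend on a physical dataset (table) and returns no body. Verify both points against the documentation for your version. The function name refreshReflections is hypothetical. The POST function is passed in, so you can supply the apiPost wrapper, which returns None for an empty response.

```python
from urllib.parse import quote

def refreshReflections(datasetId, api_post):
    # api_post takes an endpoint path (and optional body) and returns the
    # parsed response or None, e.g. the apiPost wrapper defined earlier.
    return api_post('catalog/{id}/refresh'.format(id=quote(datasetId, safe='')))
```

Usage: refreshReflections(dataset['id'], apiPost)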
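Sources are the connections Dremio uses to reach your data, such as Amazon S3, ADLS, or files on a local machine. To list all of your sources, run:
```python
apiGet('source')
```
To retrieve a specific source, pass its ID (here sourceId stands for the ID of the source you want):
```python
apiGet('source/{id}'.format(id=sourceId))
```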
To create a new source, send a POST request whose body specifies the correct source type and configuration for your connector. For example, I connected to our Elastic cluster by posting a body that matches the Elasticsearch connector's model in the API documentation.
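The sketch below only assembles the request body (a name, a connector type, and a connector-specific config object) and posts it to the source endpoint. The function names are hypothetical. The type string and config fields shown in the usage note are placeholders, not real values, so take the actual ones from your connector's model in the API documentation.

```python
def buildSourceBody(name, sourceType, config):
    # sourceType and the fields in config depend on the connector;
    # copy them from the connector's model in the API documentation.
    return {'name': name, 'type': sourceType, 'config': config}

def createSource(name, sourceType, config, api_post):
    # api_post takes an endpoint path and a body, e.g. the apiPost wrapper.
    return api_post('source', body=buildSourceBody(name, sourceType, config))
```

Usage, with placeholder values: createSource('my-source', 'YOUR_CONNECTOR_TYPE', {'someField': 'value'}, apiPost). Keeping the body builder separate makes it easy to print and check a configuration against the documented model before sending it.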
We hope this has been a gentle but informative introduction to using the REST API with Dremio. To learn more, check out the Dremio API documentation.