
Introduction
Data types in Dremio fall into two categories: primitive types such as INT and VARCHAR that hold single values, and semi-structured types like LIST, STRUCT, and MAP that hold complex values.
Arrays are lists of arbitrary size of any single type, indexed by non-negative integers, and are useful for holding sparse data.
Note: LIST and ARRAY are synonyms in Dremio, so you will see references to both in documentation, error messages, and function names.
Dremio Cloud as of mid-September and Dremio Software version 24.2 provide a variety of functions for manipulating array data.
The remainder of this article describes array functions in Dremio.
Dremio Array Functions
Try Dremio’s Interactive Demo
Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI
Creating Arrays
In Dremio Software v24.2+ and in Dremio Cloud, you create array literals with the ARRAY keyword like this:
SELECT ARRAY['apple', 'strawberry', 'banana'] -- ["apple","strawberry","banana"]
Dremio Software versions before v24.2 do not have array literals, but you can get the same result using CONVERT_FROM and JSON strings. For example:
SELECT CONVERT_FROM('["apple", "strawberry", "banana"]', 'json') -- ["apple","strawberry","banana"]
You can also query array columns from tables and views in the usual manner. The Python 3 script below creates a file example.parquet
that has array columns named fruits
and numbers
:
#!/usr/bin/env python3 import pyarrow as pa import pyarrow.parquet as pq table = pa.table( { "fruits": [("apple", "strawberry", "banana")], "numbers": [(1, 2, 3,)], } ) pq.write_table(table, "example.parquet")
After uploading example.parquet
you can query fruits
and numbers
through Dremio as you would any columns:
SELECT fruits FROM example; -- ["apple","strawberry","banana"] SELECT numbers FROM example; -- [1, 2, 3]
Available Functions
The functions below are available in Dremio Software v24.2 and in Dremio Cloud:
Signature | Description |
array_avg(A) | Returns the average of all non-null elements in A. |
array_contains(A, V) | Returns whether A contains V. |
array_max(A) | Returns the maximum value in A. |
array_min(A) | Returns the minimum value in A. |
array_remove(A, V) | Removes all elements that equal V in A. |
array_sum(A) | Returns the sum of all non-null elements in A. |
cardinality(A) | Returns the number of elements in A. |
unnest(A) | Converts elements in A into rows. |
Examples
SELECT ARRAY_AVG(numbers) FROM example; -- 2.000000 SELECT ARRAY_CONTAINS(fruits, 'banana') FROM example; -- true SELECT ARRAY_MAX(numbers) FROM example; -- 3 SELECT ARRAY_MIN(numbers) FROM example; -- 1 SELECT ARRAY_REMOVE(fruits, 'strawberry') FROM example; -- ["apple","banana"] SELECT ARRAY_SUM(numbers) FROM example; -- 6 SELECT CARDINALITY(fruits) FROM example; -- 3 SELECT fruit FROM example, UNNEST(fruits) as t(fruit); -- apple -- strawberry -- banana
Coming Soon
The functions below are planned for Dremio Software v24.3 and the November 2023 update of Dremio Cloud:
Signature | Description |
array_agg(expr) | Returns an array consisting of all values in expr. |
array_append(A, E) | Returns a new array with E at the end of A. |
array_distinct(A) | Returns a new array with only the distinct elements from A. |
array_frequency(A) | Returns a map where the keys are the unique elements in A, and the values are how many times the key appears. |
array_prepend(A, E) | Returns a new array with E at the beginning of A. |
arrays_overlap(X, Y) | Returns whether X and Y have any elements in common. |
set_union(X, Y, ...) | Returns an array of all the distinct values contained in each array of the input. |
Get Started with Dremio Cloud – It’s Free!
Dremio Cloud: The easy and open, fully managed data lakehouse platform.
Everything you need to build, automate, and query your data lakehouse in production.