14 minute read · September 29, 2025

Handling Complex Data Types in Dremio

Michael Flower · System Engineer, Dremio

Copied to clipboard

Handling Complex Data Types in Dremio

Key Takeaways

Overview

Preparation of Example Dataset

Handling Arrays and Structs in Dremio

Using a Semantic layer

Key Takeaways

Dremio supports complex data types like JSON and parquet with features for handling arrays and structs.
To demonstrate, the blog uses a sample dataset, running test queries and showing how to manipulate data effectively.
Users can flatten arrays using the FLATTEN() function or expand them into separate columns for clarity.
The blog outlines expanding structs and provides SQL examples for extracting specific columns.
The final dataset can be saved as a Virtual Data Set, enabling easier access and processing, such as Iceberg table creation.

Overview

Dremio provides out-of-the-box methods of handling complex data types in, for example JSON and parquet datasets. Common characteristics are embedded “columns within columns” and “rows within columns”.

In this blog, we will demonstrate how Dremio can discover and handle these types of data. The examples have been tested on the following Dremio versions:

23.1
25.2.18
26.0.5

Try Dremio’s Interactive Demo

Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI

Preparation of Example Dataset

We will use the “Samples” dataset which is available to any Dremio environment.

Add the Sample data source

Click on the + symbol next to Data Lakes

Add the sample source

After the source has been added, click on samples.dremio.com

Click on Dremio University

Scroll down the list of files, and click on restaurant_reviews.parquet

Click “Save” to promote the Restaurant Reviews parquet file as a PDS (Promoted Data Set).

Run a test query against the sample source

We will run the SQL examples from the resultant “SQL Editor”, e.g.

SELECT *
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo

Handling Arrays and Structs in Dremio

Handling Arrays in Dremio

Run the following query to see an example array:

SELECT  "_id", business_id , categories
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo

Here, we are retrieving 3 columns. Note that the “categories” column is an array :

(as denoted by the icon)
Let’s use this column for further analysis.

Flatten an Array

Arrays can be flattened using the SQL FLATTEN() function.

Run the following SQL command:

select  "_id", business_id
, categories
, flatten(categories) as category
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo;

The result shows:

As previously we see the unflattened array (the “categories” column)
Also, each entry of that array has been flattened into a column (“category”); each array entry listed on separate rows, e.g. “_id” value ‘W6KUFNWNYYcE+g60’ has 2 entries in the results, one row per category

Expanding an Array

Alternatively, the array can be expanded along the single row, e.g. run the following SQL command:

SELECT  "_id", business_id
, categories
, categories[0] as cat0
, categories[1] as cat1
, categories[2] as cat2
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet";

The result shows:

The unflattened array (the “categories” column)
The first 3 array entries have been flattened into other columns on the same row (cat0, cat1, cat2).

Handling Structs in Dremio

Run the following query to see an example struct:

select  "_id", business_id
, votes
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet";

Note that the “votes” column is a struct (as denoted by the icon)

Expanding a struct

We can extract an individual column by name, e.g. to extract the “cool” column from the struct:

select  "_id", business_id
, foo.votes
, foo.votes.cool 
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo ;

The result shows:

The unflattened struct (the “votes” column)
The extracted “cool” column and its value (0).

We can also expand all of the struct entries into separate columns using the asterisk notation, e.g.

SELECT "_id", business_id
, foo.votes
, foo.votes.* 
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo ;

The ”votes” struct is expanded into separate columns ("cool", "useful", “funny”).

Full example of flattened and expanded dataset:

To fully flatten and expand our sample dataset, run this query:

SELECT  "_id", business_id
, "open", full_address
, flatten(categories) as category
, foo.attributes.parking.*
, flatten(neighborhoods) as neighbourhood
, foo.hours.tuesday."open" as tuesdayopen
, foo.hours.tuesday."close" as tuesdayclosed
, foo.hours.wednesday."open" as wednesdayopen
, foo.hours.wednesday."close" as wednesdayclosed
, foo.hours.thursday."open" as thursdayopen
, foo.hours.thursday."close" as thursdayclosed
, foo.hours.friday."open" as fridayopen
, foo.hours.friday."close" as fridayclosed
, foo.hours.saturday."open" as saturdayopen
, foo.hours.saturday."close" as saturdayclosed
, foo.hours.sunday."open" as sundayopen
, foo.hours.sunday."close" as sundayclosed
, foo.hours.monday."open" as mondayopen
, foo.hours.monday."close" as mondayclosed
, foo.votes.*
, stars0
, foo.votes0.*
, foo.compliments.*
, flatten(elite) as elite
, average_stars, type1
, review_count0, name0, user_id0, fans
, type0, business_id0, "_index", "_type", "_uid", "_id0", yelping_since
, type, "date", review_id, user_id
, longitude, state, stars, latitude
, text
, city, review_count, name
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo;

Nb; in the original parquet dataset there are:

100 rows
38 columns

However in the resulting dataset:

371 rows (flattened)
69 columns (expanded)

Using a Semantic layer

To present the flattened, expanded dataset in a digestible form, save the query as a Virtual Data Set:

Click on Save View As …

Provide a name and a storage location for the new VDS, e.g.

Name: reviews
Location: working

Click “Save”

We can now access our ready-formatted dataset using the VDS, e.g.

SELECT * FROM working.reviews;

Next steps:

This can be saved as a raw reflection, so that the data is already pre-processed.
Or it can be saved as an Iceberg table, e.g.

create table s3b.iceberg.reviews as SELECT * FROM working.reviews;

Article Topics

Dremio Blog: Various Insights

Handling Complex Data Types in Dremio

Table of Contents

Key Takeaways

Overview

Try Dremio’s Interactive Demo

Preparation of Example Dataset

Add the Sample data source

Run a test query against the sample source

Handling Arrays and Structs in Dremio

Handling Arrays in Dremio

Flatten an Array

Expanding an Array

Handling Structs in Dremio

Expanding a struct

Full example of flattened and expanded dataset:

Using a Semantic layer

Make data engineers and analysts 10x more productive

Table of Contents

Key Takeaways

Overview

Try Dremio’s Interactive Demo

Preparation of Example Dataset

Add the Sample data source

Run a test query against the sample source

Handling Arrays and Structs in Dremio

Handling Arrays in Dremio

Flatten an Array

Expanding an Array

Handling Structs in Dremio

Expanding a struct

Full example of flattened and expanded dataset:

Using a Semantic layer

Additional Resources

Ingesting Data Into Apache Iceberg Tables with Dremio: A Unified Path to Iceberg

Intro to Dremio, Nessie, and Apache Iceberg on Your Laptop

Table-Driven Access Policies Using Subqueries

Make data engineers and analysts 10x more productive