Dremio allows seamless querying of JSON data without rigid schemas, saving time for analysts and data engineers.
It automatically discovers JSON structure, handles mixed types, and safely processes malformed data with TRY_CONVERT_FROM.
Dremio supports standard SQL JSON functions and offers intuitive extraction methods for nested arrays.
The platform adapts to schema drift, enabling queries that reflect the true data shape without rigid contracts.
Dremio optimizes performance automatically, giving AI workloads fast access to semi-structured data while minimizing latency and schema mismatches.
Many organizations process terabytes of JSON daily, yet querying it often requires brittle pre-processing pipelines and rigid data contracts. This leaves analysts and data engineers wasting hours defining explicit schemas just to run a simple aggregation. Dremio eliminates this friction by allowing you to query JSON directly in the lakehouse with complete schema-on-read flexibility.
Without the need to define structure upfront, Dremio naturally discovers your JSON data, handling mixed types and nested arrays automatically. If you manage an Agentic Lakehouse, this flexibility dictates how fast your AI agents and analysts can find answers. Read on to learn how Dremio's JSON SQL functions can empower you to query semi-structured data instantly.
Parsing and Conversion: Schema-on-Read in Action
Converting a raw JSON string into a queryable structure is the first step in any analysis. Traditional engines force you to declare the exact schema before ingestion to guarantee performance.
Dremio transforms this process. As a unified data lakehouse platform, Dremio naturally discovers and reads JSON files. When converting JSON strings stored within columns (like a VARCHAR), Dremio uses its conversion functions to apply structure exactly when you query it:
CONVERT_FROM(binary_value, 'json'): Transforms a JSON string into Dremio's internal STRUCT or LIST types.
With Dremio's conversion functions you no longer need a rigid schema definition upfront. Instead, you just convert the data and start querying.
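As a minimal sketch of the conversion step described above, assuming a hypothetical events table with a VARCHAR column payload_json holding raw JSON strings:

```sql
-- Hypothetical table "events" with a VARCHAR column "payload_json"
-- containing strings such as '{"customer": "ada", "score": 42}'.
SELECT
  CONVERT_FROM(payload_json, 'json') AS payload
FROM events;

-- The result is a STRUCT, so fields can be accessed with dot notation:
SELECT
  CONVERT_FROM(payload_json, 'json').customer AS customer_name
FROM events;
```

The table and field names here are illustrative; the point is that the structure is applied at query time rather than at ingestion.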
Validation: Safe Handling of Mixed Types and Errors
Data pipelines break when they encounter unexpected JSON payloads. A single truncated log entry can terminate a massive aggregation over millions of rows. Dremio provides a safety net for malformed data with the TRY_CONVERT_FROM function. If a record contains invalid JSON, the function simply returns NULL for that row while the rest of the query succeeds.
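A short sketch of this safety net, assuming a hypothetical logs table where some rows of a raw_json column are truncated or invalid:

```sql
-- TRY_CONVERT_FROM returns NULL for malformed rows instead of
-- failing the whole query.
SELECT
  log_id,
  TRY_CONVERT_FROM(raw_json, 'json') AS parsed
FROM logs
WHERE TRY_CONVERT_FROM(raw_json, 'json') IS NOT NULL;  -- keep only valid rows
```

The WHERE clause is optional; without it, malformed rows simply surface as NULL in the parsed column.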
Furthermore, Dremio gracefully handles mixed types. If your JSON data contains inconsistent types across rows for the same field (for example, mixing strings and integers), Dremio does not fail the query. It intelligently reports the column as a mixed type, allowing you to proceed with your analysis and cast the values as needed.
Extraction and Path Access: Querying Nested Arrays
Once the JSON is parsed, extracting specific values or unwinding nested arrays must be intuitive. Dremio offers standard dot notation for structs (foo.votes.cool) and array indexing (categories[0]). Because Dremio acts as a high-performance query engine directly on data lake storage, it natively supports standard SQL JSON functions like JSON_VALUE, JSON_QUERY, and JSON_EXISTS.
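Using the path syntax mentioned above, a sketch against a hypothetical reviews table whose rows expose the nested structure from the examples (votes.cool and a categories list):

```sql
SELECT
  review.votes.cool    AS cool_votes,     -- struct access via dot notation
  review.categories[0] AS first_category  -- array access via zero-based index
FROM reviews AS r (review);
```

The table and column names are assumptions for illustration; the dot notation and bracket indexing are the mechanisms the article describes.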
When working with JSON arrays, Dremio uses the FLATTEN() function to expand elements into separate rows:
SELECT
  id,
  FLATTEN(CONVERT_FROM(items_json, 'json')) AS expanded_item
FROM sales;
The FLATTEN function expands each array entry into a separate row. For deeply nested arrays, multiple FLATTEN operations can be chained sequentially, making it incredibly simple to explore complex hierarchies.
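Chaining FLATTEN for deeper nesting can be sketched with a subquery, assuming a hypothetical orders table where each order holds a list of shipments and each shipment holds a list of items:

```sql
-- First FLATTEN unwinds shipments; the outer FLATTEN unwinds each
-- shipment's items, yielding one row per (order, shipment, item).
SELECT
  order_id,
  FLATTEN(shipment.items) AS item
FROM (
  SELECT order_id, FLATTEN(shipments) AS shipment
  FROM orders
);
```

Each additional level of nesting adds one more FLATTEN, typically wrapped in another subquery layer.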
Type Inspection and Eliminating Rigid Contracts
JSON schemas change constantly. New fields appear and integer fields suddenly contain strings. In traditional warehouses, this schema drift breaks downstream queries, forcing data engineers to rebuild table definitions and backfill data.
Dremio inherently handles schema drift by reading the data exactly as it exists. Because there are no explicit data contracts to enforce at ingestion, your queries adapt to the data's true shape. The TYPEOF() function lets you inspect the exact data type of any extracted element on the fly. This lets you query data in place without the risk and cost of data movement or rigid pipeline reconstruction.
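A minimal sketch of on-the-fly type inspection, reusing the hypothetical events table and payload_json column from earlier and an assumed price field that may drift between types:

```sql
-- Report the runtime type of a field that may vary across rows,
-- e.g. INTEGER in some records and VARCHAR in others.
SELECT
  TYPEOF(CONVERT_FROM(payload_json, 'json').price) AS price_type
FROM events
LIMIT 10;
```

Grouping on the TYPEOF() result is a quick way to measure how widespread a type inconsistency is before deciding how to cast.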
Performance and the Agentic Lakehouse
One downside of schema-on-read is that querying deeply nested, unstructured text without prior materialization can result in slow initial scans, especially compared to strictly typed Parquet columns. However, Dremio's architecture optimizes performance autonomously: the Columnar Cloud Cache (C3) and Autonomous Reflections mitigate this by caching and optimizing repeated queries automatically.
This automated performance is critical for AI. For an AI Agent to provide accurate answers, it needs immediate access to deep business context. When an LLM generates a SQL query against semi-structured data, any latency or schema mismatch can cause timeouts and hallucinations.
With Apache Arrow eliminating the serialization tax, Dremio ensures that even complex FLATTEN and CONVERT_FROM operations execute at interactive speeds. When your AI Agent or custom MCP Server client repeatedly queries complex JSON logs, results return in sub-seconds. Meanwhile, the semantic layer teaches the AI your business language so it generates the right SQL, and the underlying engine executes those JSON functions fast enough to deliver a seamless, conversational analytics experience.