What is Schema-on-Read?

Schema-on-Read is a data processing approach in which data is ingested and stored without a predefined schema. Traditional approaches define and enforce a schema upfront. Schema-on-Read instead applies the schema when the data is read or queried. This on-the-fly interpretation gives more flexibility and agility in handling data.

How Schema-on-Read works

In a Schema-on-Read environment, data is typically stored in its raw form, often in semi-structured or delimited formats such as JSON or CSV. When data is ingested into the system, it is stored as-is, with no schema enforcement. At query time, a schema is applied dynamically. It is either inferred from the structure and metadata of the data or supplied by the query itself. This approach allows diverse and evolving data sources to be processed without upfront schema design.
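To make the read-time step concrete, here is a minimal sketch in plain Python. It assumes raw records are kept as JSON lines and that the "schema" is just a mapping of field names to types held by the reader. `RAW_STORE`, `ORDER_SCHEMA`, and `read_with_schema` are hypothetical names for illustration, not part of any product API.

```python
import json

# Raw records are stored exactly as they arrived; nothing is validated at write time.
# (Hypothetical sample data.)
RAW_STORE = [
    '{"id": "1", "name": "Ada", "amount": "12.50"}',
    '{"id": "2", "name": "Grace", "amount": "7", "country": "US"}',
    '{"id": "3", "amount": "3.25"}',
]

# The schema belongs to the reader, not to the stored data (hypothetical example schema).
ORDER_SCHEMA = {"id": int, "name": str, "amount": float}


def read_with_schema(raw_lines, schema):
    """Parse each raw line and project/cast it to the schema at read time."""
    for line in raw_lines:
        record = json.loads(line)
        # Missing fields become None; fields not named in the schema are ignored.
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


rows = list(read_with_schema(RAW_STORE, ORDER_SCHEMA))
total = sum(r["amount"] for r in rows)
print(total)  # 22.75
```

The stored strings are never rewritten. Changing `ORDER_SCHEMA` changes how the same bytes are interpreted on the next read.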

Why Schema-on-Read is important

Schema-on-Read provides several benefits to businesses:

  • Flexibility: With Schema-on-Read, businesses can easily handle and integrate diverse data sources with varying structures and formats. There is no need to predefine and modify schemas for each source, enabling quicker onboarding and analysis of new data.
  • Agility: Schema-on-Read allows for iterative and exploratory data analysis. Analysts and data scientists can directly access and explore raw data without waiting for complex ETL processes or schema modifications.
  • Cost-efficiency: Schema-on-Read reduces the need for costly upfront data transformation. Organizations can store data once in its raw form, avoiding extra transformed copies and reducing the overhead of maintaining multiple data pipelines.
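The flexibility and agility points come down to this: different consumers can read the same raw data with different schemas, and nobody has to change the stored file. The sketch below assumes a small CSV held in memory. The column names, the `read` helper, and both "views" are hypothetical, chosen only to illustrate the idea.

```python
import csv
import io

# One raw file, stored once and never rewritten (hypothetical sample data).
RAW_CSV = """ts,user,page,ms
2024-01-01T10:00:00,alice,/home,120
2024-01-01T10:00:05,bob,/cart,340
2024-01-01T10:00:09,alice,/cart,95
"""


def read(raw, schema):
    """Apply a caller-chosen schema (column -> type) to the raw CSV at read time."""
    reader = csv.DictReader(io.StringIO(raw))
    return [{col: cast(row[col]) for col, cast in schema.items()} for row in reader]


# A performance analyst only needs the page and its latency as an integer.
latency_view = read(RAW_CSV, {"page": str, "ms": int})
# A product analyst only needs who visited which page.
visits_view = read(RAW_CSV, {"user": str, "page": str})

slow_pages = sorted({r["page"] for r in latency_view if r["ms"] > 100})
cart_users = sorted({r["user"] for r in visits_view if r["page"] == "/cart"})
print(slow_pages, cart_users)  # ['/cart', '/home'] ['alice', 'bob']
```

Each analyst's "schema" is just the dictionary passed to `read`. Adding a third view requires no migration or pipeline change.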

The most important Schema-on-Read use cases

Schema-on-Read is relevant in various use cases:

  • Data Exploration and Discovery: Schema-on-Read enables analysts and data scientists to quickly explore and discover insights from diverse datasets without upfront schema design.
  • Data Integration: Businesses can easily integrate and analyze data from multiple sources, including structured, semi-structured, and unstructured data.
  • Real-time Data Streaming: Schema-on-Read is well-suited for processing and analyzing real-time streaming data, where schema evolution is common.
  • Big Data Analysis: Schema-on-Read simplifies the processing and analysis of large volumes of data by eliminating the need for a predefined schema.
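For the streaming case, schema evolution usually means that newer events carry fields older events lack. Here is a minimal sketch, assuming JSON events from two hypothetical producer versions. The second version adds a `currency` field. The reader fills in a default (`"USD"`, an assumption made purely for illustration) instead of requiring old events to be migrated. All names are hypothetical.

```python
import json

# Raw events from two producer versions (hypothetical payloads):
# v1 omits "currency"; v2 adds it.
EVENTS = [
    '{"event": "purchase", "amount": 10.0}',
    '{"event": "purchase", "amount": 4.5, "currency": "EUR"}',
    '{"event": "refund", "amount": 2.0}',
]


def read_event(line):
    """Interpret one raw event with the reader's current schema."""
    raw = json.loads(line)
    return {
        "event": raw["event"],
        "amount": float(raw["amount"]),
        # Older events lack the field; the reader supplies an assumed default
        # rather than rewriting the stored events.
        "currency": raw.get("currency", "USD"),
    }


def totals_by_currency(lines):
    """Sum purchases minus refunds per currency, interpreting each event at read time."""
    totals = {}
    for line in lines:
        e = read_event(line)
        sign = -1 if e["event"] == "refund" else 1
        totals[e["currency"]] = totals.get(e["currency"], 0.0) + sign * e["amount"]
    return totals


print(totals_by_currency(EVENTS))  # {'USD': 8.0, 'EUR': 4.5}
```

Handling the new field is entirely the reader's job. Producers can keep emitting either version.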

Related Technologies and Terms

Schema-on-Read is closely related to the following technologies and terms:

  • Schema-on-Write: The traditional approach to data processing where the schema is defined and enforced during the data ingestion phase.
  • Data Lake: A storage repository for large amounts of raw data in structured, semi-structured, and unstructured formats.
  • Data Warehouse: A centralized repository of structured data used for reporting and analysis.
  • ETL (Extract, Transform, Load): The process of extracting data from various sources, transforming it into a desired format, and loading it into a target system.
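The difference between Schema-on-Write and Schema-on-Read is mainly *when* the schema check happens. The toy sketch below contrasts the two. `SchemaOnWriteTable` rejects a nonconforming record at ingestion. `SchemaOnReadStore` stores it as-is and applies the schema only when reading. Both classes and the `conforms` helper are hypothetical illustrations, not real storage engines.

```python
import json

SCHEMA = {"id": int, "email": str}  # hypothetical schema


def conforms(record, schema):
    """True if every schema field is present with the expected type."""
    return all(isinstance(record.get(f), t) for f, t in schema.items())


class SchemaOnWriteTable:
    """Validates at ingestion: nonconforming records are rejected up front."""

    def __init__(self, schema):
        self.schema, self.rows = schema, []

    def write(self, record):
        if not conforms(record, self.schema):
            raise ValueError(f"rejected at write time: {record}")
        self.rows.append(record)


class SchemaOnReadStore:
    """Accepts anything; the schema is only applied when data is read."""

    def __init__(self):
        self.raw = []

    def write(self, record):
        self.raw.append(json.dumps(record))  # stored as-is, no validation

    def read(self, schema):
        parsed = (json.loads(line) for line in self.raw)
        return [r for r in parsed if conforms(r, schema)]


incoming = [{"id": 1, "email": "a@example.com"}, {"id": "two", "email": None}]

sow = SchemaOnWriteTable(SCHEMA)
rejected = 0
for rec in incoming:
    try:
        sow.write(rec)
    except ValueError:
        rejected += 1

sor = SchemaOnReadStore()
for rec in incoming:
    sor.write(rec)

print(len(sow.rows), rejected, len(sor.raw), sor.read(SCHEMA))
```

Filtering out nonconforming rows at read time is only one possible policy. A reader could instead coerce types or flag bad rows. The point is that the decision is deferred to query time, and the raw record is still available.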

Why Dremio users would be interested in Schema-on-Read

Dremio, a data lakehouse platform, offers advanced capabilities for Schema-on-Read processing. Dremio users would be interested in Schema-on-Read because:

  • Performance: Dremio's optimization techniques enable high-performance query execution on data lakes with Schema-on-Read, ensuring fast and efficient data analysis.
  • Data Exploration: Dremio's data virtualization layer allows users to explore and query diverse data sources without the need for upfront schema design or data movement.
  • Flexibility: Dremio's schema discovery capabilities facilitate the understanding and interpretation of diverse data sources, enabling agile and flexible analytics.
  • Cost-effectiveness: By leveraging Schema-on-Read, Dremio users can avoid costly ETL processes and maintain a cost-efficient data lake architecture.