Tony Truong, Senior Product Marketing Manager, Dremio
The Semantic Layer
The semantic layer is a business representation of corporate data for end users.
In most data architectures, the semantic layer sits between your data store (like data warehouse and data lake) and consumption tools for your end users. By representing data in a business-friendly format, data analysts can create meaningful dashboards and derive actionable insights from data without needing to understand the underlying physical data structure.
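As a minimal sketch of that idea, the view below exposes cryptic physical column names under business-friendly terms, so an analyst queries "orders" without ever seeing the physical schema. All table and column names here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical physical table with engineer-facing names (illustrative only).
conn.execute("CREATE TABLE fct_ord_dtl (cust_sk INTEGER, ord_amt_usd REAL, ord_ts TEXT)")
conn.execute("INSERT INTO fct_ord_dtl VALUES (101, 49.99, '2024-03-01'), (101, 15.00, '2024-03-05')")

# The "semantic layer" in this toy example is just a view that renames
# physical columns into business terms.
conn.execute("""
    CREATE VIEW orders AS
    SELECT cust_sk     AS customer_id,
           ord_amt_usd AS sale_amount,
           ord_ts      AS order_date
    FROM fct_ord_dtl
""")

for row in conn.execute("SELECT customer_id, sale_amount FROM orders"):
    print(row)
```

A real semantic layer does far more (governance, shared definitions, cross-source access), but the renaming-and-abstraction principle is the same.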
Let’s quickly cover what a semantic layer is not:
a replacement for a data lakehouse
an alternative for a data transformation or BI tool
an OLAP cube or aggregation layer
Why Use A Semantic Layer?
Companies use data warehouses or data lakes to store data from multiple sources. End users need a way to access this data in a form that is meaningful to them.
The problem is that the data, as stored, often only makes sense to data engineers.
Data engineers create ETL pipelines from source datasets into data lakes and data warehouses. They physically organize the data into schemas and tables. The table names are complex and reflect the physical data model.
This is where a semantic layer is needed.
As the logical layer for data access, the semantic layer provides a way for teams to collaborate and share data products. It gives data consistency and simplicity across different domains. The semantic layer standardizes business logic and makes data more useful to everyone. A well-architected semantic layer empowers end users to become decision-makers with self-service analytics.
Common Ways to Implement a Semantic Layer
Now that we’ve set a baseline for what a semantic layer is, we’ll review common ways organizations implement a semantic layer.
Data warehouses often aggregate data from many sources - and some may be irrelevant to business users.
To avoid redundancy and to give data analysts access to just the datasets they need, data engineers will create data marts - curated subsets of the data warehouse that provide a domain-specific view of data for various departments. When creating data marts, data engineers will often represent this data in business-friendly language for end users.
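A data mart, at its simplest, is a curated slice of the warehouse for one domain, with columns renamed into business terms. The sketch below illustrates this with an invented warehouse table and a marketing-only mart (all names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table spanning several departments.
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, region TEXT, revenue REAL)")
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", [
    ("marketing", "EMEA", 1200.0),
    ("marketing", "AMER", 800.0),
    ("finance",   "EMEA", 5000.0),
])

# The data mart is a domain-specific subset with business-friendly column names.
conn.execute("""
    CREATE TABLE marketing_mart AS
    SELECT region  AS sales_region,
           revenue AS total_revenue
    FROM warehouse_sales
    WHERE dept = 'marketing'
""")

rows = conn.execute(
    "SELECT sales_region, total_revenue FROM marketing_mart ORDER BY sales_region"
).fetchall()
print(rows)
```

Note that the mart is a physical copy of the data, which is exactly where the refresh and sprawl challenges discussed below come from.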
Data marts are one way to implement a semantic layer, but they come with their own set of challenges.
Challenges with Data Marts
A limitation of data marts is their dependency on the data warehouse. Slow, overloaded data warehouses are often the reason data marts are created in the first place. A data warehouse is typically larger than 100 GB and often more than a terabyte, while data marts are designed to be less than 100 GB for optimal query performance.
If a line of business requires frequent refreshes on large data marts, that introduces another layer of complexity: data engineers must build additional ETL pipelines and processes to keep the data marts performant.
Now that your data mart is less than 100 GB, what happens if end users request data outside the context of the data warehouse?
Many organizations have data sources that must stay on-premises. Others store data in a proprietary data warehouse, sometimes across different cloud providers. This makes it hard for end users to do ad hoc analysis outside the context of their data warehouse. Business units create their own data marts, resulting in data sprawl across the enterprise - a data governance nightmare.
In addition to planned queries and data maintenance activities, data warehouses also support ad hoc queries and online analytical processing (OLAP). An OLAP cube is a multidimensional database for analytical workloads. It performs analysis of business data, providing aggregation capabilities and data modeling for efficient reporting.
Challenges with OLAP Cubes
OLAP cubes for self-service analytics can be unpredictable because the nature of business queries is not known in advance. Organizations cannot afford to have analysts running queries that interfere with business-critical reporting and data maintenance activities. Because of this, datasets required to support OLAP workloads are extracted from the data warehouse, and analysts run queries against these data extracts.
An OLAP cube’s dependency on the data warehouse poses many challenges.
As extracted datasets from the data warehouse, OLAP cubes require an understanding of the underlying data model. In many cases, massive amounts of data are ingested into memory for analytical queries, incurring expensive compute bills. And because the data extracts are a snapshot in time of the data warehouse, they offer limited interaction with the data until the OLAP cubes are refreshed. Depending on the workload, it’s not uncommon for a cube refresh to take hours.
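To make the aggregation idea concrete, the toy sketch below pre-computes measures over every combination of two dimensions, including "ALL" roll-ups, which is the essence of what a cube materializes. The fact rows and dimension names are invented for illustration:

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, product, units sold).
facts = [
    ("EMEA", "widget", 10),
    ("EMEA", "gadget", 5),
    ("AMER", "widget", 7),
]

# An OLAP cube pre-aggregates measures over every combination of dimension
# values, including "ALL" roll-ups, so queries hit the aggregates rather
# than scanning raw facts.
cube = defaultdict(int)
for region, prod, units in facts:
    for r, p in product((region, "ALL"), (prod, "ALL")):
        cube[(r, p)] += units

print(cube[("EMEA", "ALL")])  # 15: EMEA across all products
print(cube[("ALL", "ALL")])   # 22: grand total
```

The cost is visible even at this scale: the cube holds more entries than there are facts, and every new fact requires touching multiple aggregates - which is why refreshing a real cube over a large warehouse extract can take hours.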
Semantic Layer Challenges with Data Marts and OLAP Cubes
What is the solution?
Most organizations prefer to have a single source of enterprise data rather than replicating data across data marts, OLAP cubes, or BI extracts. Data lakehouses solve some of the problems with a monolithic data warehouse, but it’s only part of the equation. A unified semantic layer is just as important.
A unified semantic layer is mandatory for any data management solution such as the data lakehouse. Some benefits include:
A universal abstraction layer. Technical fields from facts and dimensions tables are transposed into business-friendly terms like Last Purchase or Sales.
Prioritizing data governance. A unified semantic layer makes it easy for teams to share views of datasets in a consistent and accurate manner, meaning only users with provisioned access can see the data.
All your data. Your end users need self-service access to new data. You don’t want to spend more time creating ETL pipelines with dependencies on proprietary systems. Consume data where it lives.
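The governance point above can be sketched in a few lines: each shared view in the semantic layer carries an access list, and a consumer only sees the views their role is provisioned for. The view names and roles here are hypothetical:

```python
# Hypothetical catalog of shared semantic-layer views, each with a set of
# roles granted access. Names are invented for illustration.
SHARED_VIEWS = {
    "sales_summary":  {"grants": {"analyst", "exec"}},
    "payroll_detail": {"grants": {"finance"}},
}

def visible_views(role: str) -> list[str]:
    """Return the semantic-layer views a given role may query."""
    return sorted(name for name, meta in SHARED_VIEWS.items()
                  if role in meta["grants"])

print(visible_views("analyst"))  # ['sales_summary']
print(visible_views("finance"))  # ['payroll_detail']
```

In a real platform this check is enforced by the semantic layer itself, so governance travels with the view rather than being re-implemented in every BI tool.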
We covered two of the most common ways organizations have implemented a semantic layer: data marts and OLAP cubes.
It’s clear a semantic layer can help an organization integrate data for end users. Bringing self-service analytics to your organization should be easy.
Ready to enable self-service analytics across all your data? Check out how organizations are using Dremio today for their open data lakehouse experience.
Additional resources you may find helpful
The Path to Self-Service Analytics on the Data Lake
Download this white paper to get a step-by-step roadmap of Dremio adoption. At each step, you’ll learn about benefits gained, as well as the complexities and risks reduced, as workloads are migrated from traditional systems to Dremio.
Data engineers play a crucial role in designing, operating, and supporting the increasingly complex environments that power modern data analytics. What are their most important challenges and how can they solve them strategically?