Data mesh is a decentralized approach to data management that focuses on domain-driven design (DDD). It aims to bring data closer to business units or domains, where people are responsible for generating, governing, and treating the data as a product.
What Makes Data Mesh Different From Other Data Architectures?
Traditional data architectures often create a gap between data producers and consumers, so the original meaning of the data gets lost. Domain context, however, is essential for effective decision-making. More importantly, the current approach does not treat data as a first-class citizen. As a result, stakeholders have no real ownership of the data, which ultimately affects both the infrastructure team and consumers.
For instance, with a centralized data architecture, organizations use a data warehouse or data lake to store sales, marketing, and HR data in one place. Data engineers in IT then have to make the data available to various departments and data consumers through dataset copies made via ETL pipelines. Unfortunately, this structure creates a bottleneck that data consumers must go through to access data, which is difficult and time-consuming for everyone involved.
Instead of building one monolithic platform, managed centrally by IT, that serves all of the organization's analytical needs, data mesh assigns ownership of analytical data to each domain. It aims to solve the problems of centralized data architecture from both an organizational and a technological standpoint by shifting responsibility for data creation, transformation, and availability to the individual domains.
Data Mesh Principles
Data mesh was introduced by Zhamak Dehghani and is built on four principles: domain ownership, data as a product, self-service data platform, and federated computational governance. The first two principles emphasize an organizational mindset that treats data as a first-class product owned by individual teams. The last two focus on the technical foundation needed to support this new approach to data.
1. Domain Ownership:
Decentralization is the core of a data mesh approach. Here, this refers to the decentralization of business units/domains rather than technology or infrastructure. In a data mesh, an individual domain takes full ownership of its data from end to end, ensures that the data is trustworthy (high quality), has a domain-specific context, and is consumable by other domains within the organization.
One of the challenges of a traditional data ecosystem is that there is no real ownership of the data itself. For example, how do you make data self-describing and ensure it is of the highest quality or trustworthy? Also, over time, central data engineering teams become a bottleneck as the need to make data available to consumers increases. In a data mesh, domain teams are responsible for data creation, ingestion, preparation, and making the data available. Federated ownership by domain helps maintain the business context of data (domains know their data very well), and the responsibility to make data available to the consumer shifts away from the central infrastructure team.
2. Data as a product:
The second principle centers on treating data as a product rather than just an asset in an organization. It works in conjunction with distributed domain ownership of data: now that each domain owns its data and is responsible for producing it and serving it to consumers, that data is expected to be high-quality, fresh, and trustworthy. Most importantly, this principle addresses a critical problem with the previous approach: enabling data interoperability across domains.
Having an organizational mindset that the data generated by one domain can be used by another is pivotal in treating data as the primary product. Like with any other product, this approach lets you think from the consumers’ point of view and ensures you put quality first and address the customers’ requirements (in this case, data consumers in other domains).
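To make the "data as a product" idea more concrete, here is a minimal sketch of the kind of descriptor a domain team might publish alongside its data. The class name, fields, and example values are all hypothetical; data mesh does not prescribe a specific format, and a real implementation would usually live in a data catalog rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product owned by one domain."""
    name: str
    domain: str                 # owning business domain
    owner: str                  # accountable contact for consumers
    schema: Dict[str, str]      # column name -> type, so the data is self-describing
    freshness_sla: timedelta    # maximum acceptable age of the data
    last_updated: datetime
    description: str = ""

    def is_fresh(self, now: Optional[datetime] = None) -> bool:
        """Return True if the data is no older than the freshness SLA."""
        now = now or datetime.now(timezone.utc)
        return now - self.last_updated <= self.freshness_sla

    def missing_metadata(self) -> List[str]:
        """List the descriptive fields a consumer would need but that are empty."""
        return [f for f in ("owner", "description", "schema") if not getattr(self, f)]


# Example values are invented for illustration.
orders = DataProduct(
    name="daily_orders",
    domain="sales",
    owner="sales-data@example.com",
    schema={"order_id": "string", "amount": "decimal"},
    freshness_sla=timedelta(hours=24),
    last_updated=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
)
```

Putting ownership, schema, and freshness in one place is one way to let a consuming domain judge whether the product meets its needs without asking the producing team.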
3. Self-service data platform:
Data teams need a platform to build domain-specific data products and serve those data products across business units in a self-sufficient way. However, to allow domain teams (engineers, domain experts/owners) to have a complete focus on developing quality data products, it is essential to abstract the infrastructure to facilitate self-service.
In a data mesh, a centralized infrastructure team provides a common platform with the tools and services needed for computing, storage, and service of data products that work irrespective of domains. Then, each domain can calibrate the infrastructure and tools per their requirements and the data products they build. This allows domains to successfully own data and products and lets the central infrastructure teams focus entirely on improving the platform instead of managing ETL/ELT flows and responding to constant requests to create new datasets.
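As a rough illustration of this split, the sketch below assumes a platform team that ships default settings and lets each domain override some of them. The setting names, the locked setting, and the `provision_workspace` function are hypothetical and do not correspond to any particular product.

```python
from typing import Any, Dict

# Defaults maintained by the central platform team (hypothetical settings).
PLATFORM_DEFAULTS: Dict[str, Any] = {
    "storage_format": "parquet",
    "compute_size": "small",
    "register_in_catalog": True,
}

# Settings the platform fixes for every domain so products stay discoverable.
LOCKED_SETTINGS = {"register_in_catalog"}


def provision_workspace(domain: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Combine platform defaults with a domain's own choices."""
    unknown = set(overrides) - set(PLATFORM_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    locked = set(overrides) & LOCKED_SETTINGS
    if locked:
        raise ValueError(f"settings fixed by the platform: {sorted(locked)}")
    return {"domain": domain, **PLATFORM_DEFAULTS, **overrides}


# The marketing domain only tunes what it needs; everything else is inherited.
marketing = provision_workspace("marketing", {"compute_size": "large"})
```

The domain team never touches the underlying infrastructure here; it only states its requirements, which is the self-service property the principle describes.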
4. Federated computational governance:
The final data mesh principle aims to support all three principles discussed above by letting each domain exercise governance over the data products they build locally. However, domains must still adhere to standard rules that the organization has decided upon globally. This is important, particularly with a decentralized approach to run the ecosystem in harmony and achieve data interoperability. Ultimately, this model aims to have a strong collaboration between the local domain and the global governance team to cater to all the data needs.
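One way to picture the "computational" part of this principle is as rules expressed in code: a small set defined globally, plus rules each domain adds for its own products. The sketch below is only an assumption about how that could look; the rule names and the dictionary-based product description are invented for illustration.

```python
from typing import Callable, Dict, List

Product = Dict[str, object]
Rule = Callable[[Product], bool]

# Rules agreed on globally; every data product must satisfy them (hypothetical).
GLOBAL_RULES: Dict[str, Rule] = {
    "has_owner": lambda p: bool(p.get("owner")),
    "has_schema": lambda p: bool(p.get("schema")),
    "pii_is_declared": lambda p: "contains_pii" in p,
}


def violations(product: Product, local_rules: Dict[str, Rule]) -> List[str]:
    """Return the names of all global and domain-level rules the product fails."""
    # Local rules are namespaced so a domain can add rules but never replace a global one.
    rules = {**GLOBAL_RULES, **{f"local:{name}": rule for name, rule in local_rules.items()}}
    return [name for name, rule in rules.items() if not rule(product)]


# A domain adds its own, stricter retention rule on top of the global set.
sales_rules: Dict[str, Rule] = {
    "retention_under_one_year": lambda p: p.get("retention_days", 0) <= 365,
}

orders_product: Product = {
    "owner": "sales-team",
    "schema": {"order_id": "string"},
    "contains_pii": False,
    "retention_days": 400,
}

failed = violations(orders_product, sales_rules)
```

In this example the product passes every global rule but fails the domain's own retention rule, so `failed` contains only `local:retention_under_one_year`.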
Data Mesh Benefits
Data mesh improves how you manage data and make it available across an organization by focusing on domain decentralization. An efficient data mesh implementation can provide you with some very notable benefits:
- Easier & Faster Access to Data: Data consumers (analysts/scientists) have the data available to them, which reduces the time to insight and allows businesses to make faster decisions.
- Flexibility & Independence: Gives ownership and autonomy of data to teams that know the data best.
- Standardized Data Observability: Explicitly prioritizes treating data as a product, which helps to establish a data-driven culture.
- Business Agility & Scalability: Reduces overhead on central data infrastructure teams, who can now focus solely on improving the platform.
- Improved Data Security: Each domain is responsible for defining its own security & governance policies while adhering to the globally defined ones to make data discoverable. This results in improved security for the data products.
Data Mesh Challenges
Implementing a data mesh requires organizational and cultural change, which can take time. Some of the practical challenges you may face with a data mesh implementation can be:
- Cross-Domain Analytics: Collaboration between different domain teams can be difficult. Even with support from the domains, effectively managing new releases of data products and communicating changes remains important.
- Consistent Data Standards: Ensuring data products created by domain teams meet global standards. Initially, this may be challenging as teams adopt the principle of domain ownership.
- Change in Data Management: Since every domain team has autonomy over the data products they develop, managing them and striking a balance between global and local standards can be tricky.
- Slow to Adopt Process with Cost & Risk: With data mesh, the number of roles (e.g., data product owner, data scientist, data engineer, etc.) in each domain increases. If an organization fails to establish well-defined roles and responsibilities, it can all lead to a mess.
How to Decide When You Should Use Data Mesh
Data mesh requires an organizational mindset change on how you manage data. However, every organization has different considerations for leveraging data mesh and a data mesh strategy can vary from organization to organization. Hence, when implementing this approach, it is essential to keep in mind certain factors such as the size of your organization, number of data sources, infrastructure limitations, etc.
When deciding whether you should adopt data mesh, it is important to consider both the organizational and technological aspects. Here are a few considerations:
The Number of Data Sources
If your organization has a growing number of data sources and needs to make sense of all that data, data consumers (such as data scientists) may struggle to get hold of the data within the defined SLA, which hurts the company’s ability to make meaningful decisions. In a scenario like this, a data mesh strategy can reduce the time to insight and let data consumers analyze data quickly.
The Number of Domains
Small and mid-sized organizations that have only a few business units when starting out are better off not decentralizing teams, as doing so may lead to further data silos. Also, for smaller organizations, one central analytics team can efficiently serve the needs of the domains. In large enterprises, however, the central infrastructure team is often overburdened as the number of data producers and consumers increases. Data mesh can be an effective strategy for these larger organizations to manage and serve data.
Well-established data roles
Establishing well-defined roles and responsibilities is pivotal to succeeding with a data mesh strategy. If an organization's stakeholders (such as data product owners, engineers, and scientists) already understand their scope and duties well, it makes sense to adopt a data mesh.
Improved security for data products
A data mesh approach increases security for data products produced by a specific domain, as governance policies are defined at the domain level in addition to the global ones. So, if your organization is looking for better security & governance models for its data, data mesh is worth considering.
Data mesh is a strategy that continuously evolves within an organization. Since the focus is on treating data as a product, it is understandable that a product will keep changing. It is therefore essential to have a mature CI/CD process in place to incorporate those fast-moving changes. Organizations with mature engineering practices can adapt quickly to these aspects and have an advantage in embracing data mesh.
Data mesh brings in a domain-driven paradigm to treat data as a product so it can be used across different business units via a self-serve infrastructure. By assigning the ownership of data to the teams who know and understand the business context of data, data mesh aims to make data readily available for data consumers.
If you are interested in understanding how a technical platform can support your organizational data mesh strategy, check out how Dremio's open lakehouse can help.