Data mesh is a decentralized approach to data management that focuses on domain-driven design (DDD). It aims to bring data closer to the business units, or domains, whose people are responsible for generating and governing the data and for treating it as a product.
A Data Mesh is an architectural approach to designing data-driven applications. It provides a way to decouple data services from the applications that use them, enabling teams to own and manage their data domains independently. The objective of a Data Mesh is to build a scalable, secure, and reliable data infrastructure that supports the needs of multiple teams and applications.
The Four Principles of Data Mesh
Data mesh was introduced by Zhamak Dehghani and is built on four principles: domain ownership, data as a product, self-service data platform, and federated computational governance. The first two principles emphasize an organizational mindset that treats data as a first-class product owned by individual teams. The last two principles focus on the technical foundation needed to achieve this new approach to data.
Figure 2: The four pillars of data mesh
1. Domain Ownership:
Decentralization is the core of a data mesh approach. Here, this refers to the decentralization of business units/domains rather than technology or infrastructure. In a data mesh, an individual domain takes full ownership of its data from end to end, ensures that the data is trustworthy (high quality), has a domain-specific context, and is consumable by other domains within the organization.
One of the challenges of a traditional data ecosystem is that there is no real ownership of the data itself. For example, how do you make data self-describing and ensure it is of the highest quality or trustworthy? Also, over time, central data engineering teams become a bottleneck as the need to make data available to consumers increases. In a data mesh, domain teams are responsible for data creation, ingestion, preparation, and making the data available. Federated ownership by domain helps maintain the business context of data (domains know their data very well), and the responsibility to make data available to the consumer shifts away from the central infrastructure team.
2. Data as a product:
The second principle centers on treating data as a product rather than just an asset in an organization. This works in conjunction with distributed domain ownership of data. Now that each domain owns its data and is responsible for producing it and serving it to consumers, that data is expected to be high-quality, fresh, and trustworthy. Most importantly, this principle addresses a critical problem of the traditional centralized approach: enabling data interoperability across domains.
Having an organizational mindset that the data generated by one domain can be used by another is pivotal in treating data as the primary product. Like with any other product, this approach lets you think from the consumers’ point of view and ensures you put quality first and address the customers’ requirements (in this case, data consumers in other domains).
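To make the product mindset concrete, here is a minimal sketch of a descriptor that a domain might publish alongside its data. The class name `DataProduct`, its fields, and the `orders_daily` example are assumptions made for illustration; they do not come from any data mesh specification or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product (illustrative only)."""
    name: str
    owner_domain: str          # the domain accountable for this product
    schema: Dict[str, str]     # column name -> type name, so consumers know the shape
    freshness_slo: timedelta   # maximum acceptable age of the data
    last_updated: datetime

    def is_fresh(self, now: datetime) -> bool:
        # The product meets its freshness promise if its age is within the SLO.
        return now - self.last_updated <= self.freshness_slo


# Example: the sales domain publishes a daily orders product.
orders = DataProduct(
    name="orders_daily",
    owner_domain="sales",
    schema={"order_id": "string", "amount": "decimal", "placed_at": "timestamp"},
    freshness_slo=timedelta(hours=24),
    last_updated=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
)

same_day = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
two_days_later = datetime(2024, 1, 3, 6, 0, tzinfo=timezone.utc)
fresh_same_day = orders.is_fresh(same_day)
fresh_two_days_later = orders.is_fresh(two_days_later)
```

A consumer in another domain could check `owner_domain`, `schema`, and `is_fresh` before relying on the product, which is the kind of consumer-first guarantee this principle asks producers to provide.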
3. Self-service data platform:
Data teams need a platform to build domain-specific data products and serve those data products across business units in a self-sufficient way. However, to allow domain teams (engineers, domain experts/owners) to have a complete focus on developing quality data products, it is essential to abstract the infrastructure to facilitate self-service.
In a data mesh, a centralized infrastructure team provides a common, domain-agnostic platform with the tools and services needed to compute, store, and serve data products. Each domain can then tailor the infrastructure and tools to its own requirements and the data products it builds. This allows domains to own their data and products successfully, and lets the central infrastructure team focus entirely on improving the platform instead of managing ETL/ELT flows and responding to constant requests to create new datasets.
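The split between platform defaults and domain-level tailoring can be sketched as follows. Everything here is hypothetical: `PLATFORM_DEFAULTS`, `ProvisionRequest`, `provision`, and the setting names are illustrative stand-ins, not the API of a real platform.

```python
from dataclasses import dataclass, field
from typing import Dict

# Defaults maintained by the central infrastructure team (illustrative values).
PLATFORM_DEFAULTS: Dict[str, object] = {
    "storage": "object_store",
    "compute": "small",
    "retention_days": 30,
}


@dataclass
class ProvisionRequest:
    """Hypothetical self-service request submitted by a domain team."""
    domain: str
    product: str
    overrides: Dict[str, object] = field(default_factory=dict)


def provision(request: ProvisionRequest) -> Dict[str, object]:
    """Merge domain overrides onto platform defaults and return the resulting config."""
    # Domains may only tune settings the platform actually supports.
    unknown = set(request.overrides) - set(PLATFORM_DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")
    config = {**PLATFORM_DEFAULTS, **request.overrides}
    # Each product gets its own namespace so that domains stay isolated.
    config["namespace"] = f"{request.domain}/{request.product}"
    return config


# The marketing domain asks for more compute without involving the central team.
config = provision(ProvisionRequest("marketing", "campaign_stats", {"compute": "large"}))
```

In this sketch the central team only maintains the defaults and the set of supported settings, while each domain decides how to use them.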
4. Federated computational governance:
The final data mesh principle aims to support the three principles discussed above by letting each domain govern the data products it builds locally. However, domains must still adhere to the standard rules that the organization has agreed on globally. This is particularly important in a decentralized approach, where shared rules keep the ecosystem running in harmony and make data interoperable across domains. Ultimately, this model aims for strong collaboration between the local domains and the global governance team to meet all data needs.
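One way to picture the "computational" part is to express policies as small automated checks. Global rules apply to every product, and each domain adds its own on top. This is only a sketch under assumed names: `GLOBAL_POLICIES`, `violations`, the metadata layout, and the specific rules are all hypothetical.

```python
from typing import Callable, Dict, List

# A policy is a function that inspects product metadata and returns True if compliant.
Policy = Callable[[dict], bool]

# Rules agreed on globally by the organization (illustrative examples).
GLOBAL_POLICIES: Dict[str, Policy] = {
    "has_owner": lambda p: bool(p.get("owner_domain")),
    # Every column must state explicitly whether it holds personal data.
    "pii_columns_tagged": lambda p: all("pii" in col for col in p.get("columns", [])),
}


def violations(product: dict, local_policies: Dict[str, Policy]) -> List[str]:
    """Return the names of all global and local policies the product fails."""
    # Local names are prefixed so a domain rule can never replace a global one.
    checks = {**GLOBAL_POLICIES, **{f"local:{n}": c for n, c in local_policies.items()}}
    return [name for name, check in checks.items() if not check(product)]


# A rule the sales domain defines for its own products.
sales_policies: Dict[str, Policy] = {
    "has_currency_column": lambda p: any(c["name"] == "currency" for c in p["columns"]),
}

compliant = {
    "owner_domain": "sales",
    "columns": [{"name": "amount", "pii": False}, {"name": "currency", "pii": False}],
}
non_compliant = {
    "owner_domain": "sales",
    "columns": [{"name": "email"}],  # missing the "pii" tag and the currency column
}

compliant_result = violations(compliant, sales_policies)
non_compliant_result = violations(non_compliant, sales_policies)
```

Here the global governance team owns `GLOBAL_POLICIES`, each domain owns its local rules, and both sets are evaluated by the same automated check.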
Why Use a Data Mesh?
There are several reasons why an organization should consider using a Data Mesh. First, it provides a way to manage data at scale, ensuring that data is organized, governed, and secure across multiple teams and applications. This is particularly important for organizations that rely on large amounts of data and need to ensure that their data infrastructure can support their evolving business needs. Additionally, a Data Mesh helps to align data with business goals. By ensuring that data is governed and managed to meet the needs of the business and its customers, organizations can build a strong foundation for their data-driven initiatives and drive innovation and growth through their data assets. This helps organizations extract maximum value from their data and achieve their business objectives.
Benefits of Data Mesh
Data mesh improves how you manage data and make it available across an organization by focusing on domain decentralization. An efficient data mesh implementation can provide you with some very notable benefits:
- Easier & Faster Access to Data: Data consumers (analysts/scientists) have the data available to them, which reduces the time to insight and allows businesses to make faster decisions.
- Flexibility & Independence: Gives ownership and autonomy of data to teams that know the data best.
- Standardized Data Observability: Explicitly prioritizes treating data as a product which helps to establish a data-driven culture.
- Business Agility & Scalability: Reduces overhead on central data infrastructure teams, who can now focus solely on improving the platform.
- Improved Data Security: Each domain is responsible for defining its own security & governance policies while adhering to the globally defined ones that make data discoverable. This results in improved security for the data products.
Data Mesh Vs. Other Data Architectures
Traditional data architectures often create a gap between data producers and consumers, which causes the original meaning of the data to be lost. Domain context, however, is imperative for effective decision-making. More importantly, the current approach does not treat data as a first-class citizen. As a result, stakeholders have no actual ownership of the data, which ultimately affects both the infrastructure team and consumers.
For instance, with a centralized data architecture, organizations use a data warehouse or data lake to store sales, marketing, and HR data centrally. Data engineers in IT then have to make the data available to various departments and data consumers through dataset copies made via ETL pipelines. Unfortunately, this traditional structure creates a bottleneck that data consumers must go through to access data, which is both difficult and time-consuming for everyone involved.
In contrast to a single monolithic platform, managed centrally by IT, that holds every domain's data and serves all of the organization's analytical needs, data mesh assigns ownership of analytical data to each domain. It aims to solve the problems of centralized data architecture from both an organizational and a technological standpoint by shifting responsibility for data creation, transformation, and availability to the individual domains.