March 2, 2023

10:10 am - 10:40 am PST

Data Mesh in Action

This presentation focuses on the idea, applicability, limitations, and implementation advice related to data mesh, as presented in “Data Mesh in Action” by Manning Publications.

Topics Covered

Data Mesh


Transcript

Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Dr. Marian Siwiak:

Hello, I’m Dr. Marian Siwiak. I’m your overall data guy. For the last 15 years, I have worked in data science, even before it was a fancy buzzword. I believe personally in delivering value from data, in understanding the data, and in the scientific method in data. And this is the approach that actually led us to data mesh. We did different bits and bobs of data mesh, then the one and only [inaudible] described the data mesh, and we had this aha moment: yes, it actually connects all the dots. And we decided that we would write a book about the practicalities, because in our consultancy, a data science consultancy but also a strategic management and process consultancy, we encountered many of the things that are described there.

Agenda

So we will talk about what data mesh is and is not. We will talk about the four principles of data mesh and, let’s say, some practical aspects of it. And I will also tell you about kickstarting data mesh in one month. All this content is taken from the Data Mesh in Action book, published by [inaudible] and available on Amazon.

Overinflation of Data Mesh

So, let’s start. I would like to start with the sentence that I used to say when data mesh became really popular: there are no silver bullets. The same as data science, as advertised, was not the single solution to all the business problems, data mesh is not either. I love this approach. However, it’s not for everyone, and it’s not a purely technical solution. Data mesh is a decentralization paradigm, in our view, and it’s a socio-technological architecture. It sounds pretty complicated, and rightfully so, because unfortunately it is. If you want to implement data mesh, you need to think carefully about whether it’s good for you. Because from our experience, data mesh really shines and really adds value when a set of conditions is met. One is that there are complex data needs, meaning that the people who use data use it in multiple different ways. Another is when the data sources are diverse and there are a lot of different data sources. And another is when there is high socio-technical complexity, meaning there are a lot of different data teams, software development teams, and business teams.

If it sounds like a big corporation to you and not like Joe and Jane working in the garage, well, in our experience, for smaller companies and companies with simple data needs and a pretty limited number of data sources, data mesh may not be the best solution.

Do You Have a Data Mesh?

So the first thing that you need to know about the data mesh is that it may not be for you. Now let’s talk about what it is, and you will be able to decide if it’s really good for you or not. Maybe you have parts of data mesh in your company and you don’t even know it. And then, if you want to move to the full-blown data mesh everybody’s talking about, you will just need to make a couple of adjustments, not build it from scratch. One thing that I would like to mention here, and I will mention it again: some of the [inaudible] during the panel a couple of hours ago, she said that she believes that the data mesh is well introduced in thin slices. So these are the pillars of data mesh, and the pillars of data mesh shouldn’t be considered as little data meshes. You should take a little from each one.

Principle 1: Domain Ownership

So we’ll start with domain ownership. And I know that this is a bit of a problematic one, because it requires a lot of cooperation between the data and business worlds. But the main idea behind it is that you need to understand the business domain, where business domain is a very nebulous thing. I would say there is no very strict definition. There is, of course, domain-driven design, with methods of discovering domain boundaries and setting boundaries for domains. But a domain is just a bit of the business that is doing a certain thing, where you usually have a single data model. And it’s important that data mesh needs to be embedded into business operations.

It shouldn’t be a solution where you throw in a little bit from this business part and a little bit from that business part. Part of what we do, and what we learned painfully, is that if you want to connect data people with business people, it’s good when at least the business people have the same understanding of the business and of the processes within the data. I will talk about it more when I talk about building an MVP. Anyway, it’s important to know that data should be owned where it’s being generated, as close to the source as possible. Otherwise, if you have a central team which collects the data from multiple different domains, there is no possible way that they will understand it well, that they will describe it well, or that this data will be available for everyone, described as well as if it were described at the source, where it’s being produced, by the people who live and breathe this data.

Principle 2: Data as a Product

The second principle is treating data as a product. When we were working on different types of elements of data mesh previously, before [inaudible] described it in a concise way, we were using a lot of knowledge from product design when we were talking to people about how they should present the data to others. And it applies here. Data should be well described. It should have the information, like any product, about what it actually is. It should have a rich metadata description. It should be available in this form or another, meaning that it needs to be possible for people to get this product and use it as they see fit, or, we’ll talk about it in a second, however they are allowed to use it. But it needs to be packaged neatly, and it needs to be described and made available.
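A minimal sketch of what "well described" could mean in practice, assuming a data product is represented by a small descriptor object. The class name, field names, and example values are all hypothetical; the talk does not prescribe a metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product; all field names are illustrative."""
    name: str
    domain: str       # the single business domain that owns the data
    owner: str        # the team accountable for the product
    description: str  # what the product actually is
    location: str     # where consumers can fetch it
    columns: dict[str, str] = field(default_factory=dict)  # column name -> meaning

    def missing_metadata(self) -> list[str]:
        """Return the names of descriptive fields that are still empty."""
        required = ["name", "domain", "owner", "description", "location"]
        missing = [f for f in required if not getattr(self, f).strip()]
        if not self.columns:
            missing.append("columns")
        return missing


# Example values are made up for illustration.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    description="One row per order, refreshed daily.",
    location="products/sales/orders_daily.csv",
    columns={"order_id": "unique order identifier", "time": "order timestamp (UTC)"},
)
print(orders.missing_metadata())  # an empty list means nothing is missing
```

Keeping the description next to the data, and making gaps in it checkable, is one way to treat the "packaging" as part of the product rather than an afterthought.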

The next pillar. So the first pillar is that you need to have data which relates to a single business domain, meaning some area of concise business operations. Now, if you have this data, and you have technical people, data people, and business people working together, they describe this data and they prepare one neat package which can be shared with others. Now we come to the moment where someone needs to decide if this data can be shared freely. What are the access rights? Here we come to federated computational governance, which means that there should be two levels or layers of governance. One is central, where strategic decisions, policies, and standards are defined. There is someone there with a view of all of the data, and with the knowledge and support of people who know, I don’t know, all the legal aspects or the cybersecurity aspects, people who can set up the standards in such a way that the company as a whole and the environment are safe and sound.

But there should be a lot of operational freedom left within domains, where people can decide for themselves how they want to describe the data. That doesn’t mean that they can describe it however they want, that they will use their own field names, like instead of time they will call it, I don’t know, space. But they need to be able to choose the tooling that they will use. They will decide on the data model that best fits the business. And this is the critical factor within the domain: they will understand how this data best suits and describes the business that they do, and when they describe it, they need to adhere to the standards of description that are set up centrally.

Principle 3: Federated Computational Governance

Now, the fun thing about federated computational governance is that, again, there is no silver bullet solution. It’s very case by case. Working with different companies, we learned that different functions shifted from central to local governance units. However, by no means is that something which is binding for anyone. So now we have data. We have a business domain. We have people from the data and business worlds working together to describe the data. They decide how they would like to expose pieces of this data to the rest of the company. And there is governance, which tells them how this should be done in such a way that different data products are comparable or connectable with each other.
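The split between central standards and local freedom can be illustrated with a small check. This is a sketch under stated assumptions: the standard's contents, the alias table, and the `check_product` function are invented for illustration and do not come from the talk or the book.

```python
# Central governance: standards every domain must follow (values are made up).
CENTRAL_STANDARD = {
    "required_fields": {"name", "domain", "owner", "description"},
    # Non-standard column names mapped to the centrally agreed name.
    "column_aliases": {"timestamp": "time", "ts": "time", "cust": "customer_id"},
}


def check_product(metadata: dict, standard: dict = CENTRAL_STANDARD) -> list[str]:
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    for field_name in sorted(standard["required_fields"] - set(metadata)):
        violations.append(f"missing required field: {field_name}")
    for column in metadata.get("columns", []):
        expected = standard["column_aliases"].get(column)
        if expected is not None:
            violations.append(f"column '{column}' should be named '{expected}'")
    return violations


# Local governance: the domain picks its own tooling and data model.
sales_product = {
    "name": "orders_daily",
    "domain": "sales",
    "owner": "sales-data-team",
    "description": "One row per order.",
    "tooling": "spreadsheet export",  # a local choice, not checked centrally
    "columns": ["order_id", "ts"],
}
print(check_product(sales_product))
```

Only the description standard is enforced here. Fields such as `tooling` are left to the domain, which mirrors the operational freedom described above.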

Principle 4: The Self-Serve Data Platform

Now, to connect different data products, something is needed which is called the self-serve data platform. It is the place where different data products can be connected, so they can be found, from where they can be accessed, and where someone can… it’s also the way to ensure that the policies are automatically enforced.

Because on the previous slide, with federated computational governance, I said something about federated, but the computational part means that the execution of these policies should be enabled by this self-serve data platform. So data mesh in this view becomes a place where you have a domain or multiple domains, meaning different parts of the business. Within each domain you have business and data people working together to prepare data products, meaning the data structure or data set which is exposed for other people to use. There are centrally set standards for how this data should be described, so it’s findable. And finally, we have the self-serve data platform, meaning the technical solution which enables this findability, accessibility, and policy enforcement. So if you would like to build a data mesh for yourself, as I said, it’s a pretty complicated thing.
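The three jobs named here (findability, accessibility, policy enforcement) can be sketched as a toy in-memory registry. Everything in it is an assumption for illustration: the `SelfServePlatform` class, its method names, the `allowed_domains` field, and the single policy function are hypothetical, not a description of any real platform.

```python
class SelfServePlatform:
    """Toy in-memory platform; a real one would sit on shared infrastructure."""

    def __init__(self, policies):
        self._policies = policies  # callables: metadata -> error message or None
        self._products = {}        # product name -> metadata

    def register(self, metadata: dict) -> None:
        """Policy enforcement: publish a product only if every policy passes."""
        errors = [msg for policy in self._policies if (msg := policy(metadata))]
        if errors:
            raise ValueError("; ".join(errors))
        self._products[metadata["name"]] = metadata

    def find(self, keyword: str) -> list[str]:
        """Findability: search product names and descriptions."""
        keyword = keyword.lower()
        return sorted(
            name
            for name, meta in self._products.items()
            if keyword in name.lower() or keyword in meta.get("description", "").lower()
        )

    def access(self, name: str, requester_domain: str) -> str:
        """Accessibility: return the product location if the requester is allowed."""
        meta = self._products[name]
        if requester_domain not in meta.get("allowed_domains", []):
            raise PermissionError(f"{requester_domain} may not read {name}")
        return meta["location"]


def requires_owner(metadata: dict):
    """A centrally defined policy: every product must name an owner."""
    return None if metadata.get("owner") else "product has no owner"


platform = SelfServePlatform(policies=[requires_owner])
platform.register({
    "name": "orders_daily",
    "owner": "sales-data-team",
    "description": "One row per order.",
    "location": "products/sales/orders_daily.csv",
    "allowed_domains": ["sales", "finance"],
})
print(platform.find("order"))
print(platform.access("orders_daily", requester_domain="finance"))
```

The point of the sketch is that policies are executed by the platform at registration and access time, rather than being left to each team to remember.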

Kickstarting Data Mesh

It’s not even for everyone. You should evaluate if this solution is right for you. To do that, we propose going through a series of steps. First, you should draw a landscape diagram. You should understand what systems and business operations are within your company, or, if you are working on a single business unit, you should draw a system landscape diagram for it. I will show in a second how we go about all of the steps. You should analyze who your stakeholders are. You should be able to choose the right people and set up minimal governance, data products, and a data platform. So the way we approach drawing a system landscape is mapping out the business processes, then mapping out what data could be used to drive or inform decisions within this landscape map, and finally, what the risks associated with making these decisions are.

Stakeholder Analysis

There may be multiple different ways to draw the landscape map. So now that you know the systems, you should think: who are your stakeholders? We propose using this interest and power chart, so thinking about who you want to invite to your cooperation or central governance forum. And we really encourage you to invite not only your friends and family and enthusiasts, but to think about inviting, I don’t know, saboteurs: people who have high power and high interest, but are not really your friends. For two reasons. One, keep your friends close and your enemies even closer. Second, they may be right; their distance or skepticism might be well justified. They can help you evaluate this solution. And also, if you show that their worries are not justified, you will have another great friend. Remember that during your stakeholder analysis, you are thinking about who can help you evaluate the system.
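One way to make the interest and power chart concrete is to score each stakeholder and sort them into quadrants. This is a sketch only: the 1 to 5 scale, the threshold, the names, and the scores are assumptions for illustration, not something specified in the talk.

```python
from dataclasses import dataclass


@dataclass
class Stakeholder:
    name: str
    power: int        # 1 (low) to 5 (high), your own estimate
    interest: int     # 1 (low) to 5 (high), your own estimate
    supportive: bool  # False for skeptics and opponents


def quadrant(s: Stakeholder, threshold: int = 3) -> str:
    """Place a stakeholder in one of the four chart quadrants."""
    power = "high power" if s.power >= threshold else "low power"
    interest = "high interest" if s.interest >= threshold else "low interest"
    return f"{power}, {interest}"


def governance_candidates(stakeholders, threshold: int = 3):
    """High-power, high-interest people, deliberately including skeptics."""
    return [
        s for s in stakeholders
        if quadrant(s, threshold) == "high power, high interest"
    ]


# Hypothetical people and scores.
people = [
    Stakeholder("Head of Sales", power=5, interest=4, supportive=True),
    Stakeholder("Chief Security Officer", power=5, interest=5, supportive=False),
    Stakeholder("Junior Analyst", power=1, interest=5, supportive=True),
]
for s in governance_candidates(people):
    note = "supporter" if s.supportive else "skeptic: invite anyway"
    print(f"{s.name} ({note})")
```

The selection ignores `supportive` on purpose, so that powerful, interested skeptics end up on the invitation list alongside the enthusiasts.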

Choosing the Right People

Then you need to choose the right people. When it comes to choosing the development team, like with any other project, you want people who are known to deliver, people who are working on the systems that you decided will be good for your domain. One thing that is pretty new here: they need to be experienced in working with the business. For your MVP, your IT people and data people should be fluent in business, so to speak. On the other hand, on your governance team, and among the people from the business that you will cooperate with, there should be people who understand what the value of this MVP will be. They should help you choose it. They should have skin in the game, so to speak, and they should help you evaluate it.

Set Up Minimal Governance

Then comes setting up minimal governance. We propose simple steps. Define a value statement, which is the final priority for some actions. Decide on governance policy: I don’t know, “we will use open standards” can be a policy which will be important, or “we will require this standard of describing the metadata.” And in the end, split responsibility somehow between the data product owners and your central governance body.

Develop Minimal Data Products

Then you have developing minimal data products. Each of them should be focused on a single domain. When you have people who work with the business, you will have it described with metadata, but the idea is that it should have clearly defined access ports. Again, metadata needs to be well defined before people start using free text to describe everything. And of course, you need to expose the data. Pretty simple. In the end, you need some sort of platform. In our book we describe a git-based minimal platform. It’s just a place where you drop your data products, which are CSVs. The important part is, data mesh is not a technical solution. It’s a socio-technological architecture and a decentralization paradigm. So this is the content that we describe in our book. This is the code, and if you would like to try the book, this is the discount code, which you will be able to access and use using the link provided.
