
Gnarly Data Waves
Episode 51 | June 13, 2024
Scania’s Journey in Navigating and Implementing Data Mesh
Learn about Scania’s journey implementing a data mesh to improve delivery and business outcomes. Highlights include architecture, prioritization, platform selection, and data culture transformation. Dremio’s Field CDO and Agile Lab’s co-founder also join to share lessons learned from other global enterprises on similar journeys.
As demand for data analytics grows, Scania, a major Swedish manufacturer with a decentralized approach at its core, needed to balance domain autonomy with alignment while implementing a self-serve data and governance platform and a unified way of accessing data.
Discover how Scania addressed these challenges by adopting a data mesh strategy, and how Dremio and Witboost have facilitated its journey. Learn about the cultural shifts, changes, and partnerships that are driving tangible business impact. Additionally, gain insights on trends from Dremio’s Field CDO and the co-founder and CTO of Witboost.
Speakers

August Johnson
Since joining Scania in the fall of 2022, August Johnson has been the Strategy Manager for Scania’s transition to Data Mesh. His work covers setting the strategy and direction, preparing decisions, planning the transformation, and working with internal stakeholders.

Paolo Platter
Paolo Platter is the CTO and co-founder of Agile Lab. He has worked in the Big Data field since 2013, always trying to push technologies past their limits.
He has a real passion for impossible challenges and new paradigms. He currently leads a team of 60 data engineers.

Nik Acheson
Nik Acheson is a senior product and strategy leader at Dremio. He has delivered digital and technology transformations at massive scale at companies such as Nike, Zendesk, AEO, Philips, and the NSA. Prior to joining Dremio, he was the CDO at Okera (acquired by Databricks).
Transcript
Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.
Opening
Alex Merced:
Hey, everybody! This is Alex Merced, your host of Gnarly Data Waves, presented to you by Dremio, and today, on Episode 51, we'll be talking about Scania's journey in navigating and implementing a data mesh. But first, a few quick announcements before we get started. Of course, I always like to introduce you to the Dremio Lakehouse Platform, which brings you great value in implementing a data lakehouse: unified analytics connecting your data across data lakes, data warehouses, and databases; an SQL query engine that can federate queries across all those sources; and lakehouse management features that make implementing and using a data lakehouse as easy as using a data warehouse.
And if you want to get hands-on, I highly recommend doing this exercise using the QR code right here on the left. We'll also make sure to include a link to it in the show notes when this is posted on YouTube later on. With that, let's get started with today's adventure, “Episode 51: Scania’s Journey in Navigating and Implementing Data Mesh.” We're joined by August Johnson, strategy manager at Scania; Paolo Platter, CTO and co-founder over at Witboost; and Nik Acheson, Field CDO here at Dremio. So, without further ado, guys, the stage is yours.
Introductions
Nik Acheson:
Alright! Let's do some quick intros before we get into August’s slides. August, do you want to start? Then Paolo, and then me. Mine will be very quick, because I think most folks know me here at Dremio.
August Johnson:
Thanks so much, Nik. My name is August Johnson, and I'm the strategy manager for Scania's data mesh journey. I help set the strategy and direction for the initiative, interact with our internal stakeholders, and market our idea and vision overall.
Nik Acheson:
Awesome. Paolo.
Paolo Platter:
I'm Paolo Platter, CTO and co-founder of Agile Lab. For three years, we’ve been working on Witboost, our experience platform, and we are supporting Scania on this data mesh journey.
Nik Acheson:
Awesome, and I think many of you know me, so my quick introduction: I’m the Field CDO here at Dremio. The cool part of my job is that I normally spend about half of every week on the road with our actual customers or prospects, making these things a reality and helping them design a maturity path, because that's where most of my career has been: on your side of the table. So, August, I'm happy to share some of my bruises and lessons learned, and to talk about where you are in your journey as well. Without further ado, you're the rock star here for the first half of the show, so please share your experience and your journey. We're here to learn from you first, and then we'll move into more of a fireside chat.
August Johnson:
Thanks so much. Then I'll proceed to share my screen––here we go. So I hope everyone can see the screen.
Nik Acheson:
Yup, we got it.
Data Mesh @ Scania; Why Data Mesh?
August Johnson:
Great, awesome. I thought I'd spend these first couple of minutes talking about data mesh and why we decided to embark on this journey at Scania. One of the key drivers was that we, like probably many others, saw increasing demand for high-quality, available analytical data. At the same time, Scania's cloud journey was underway, and as a result of Scania's decentralized nature, different parts of the company moved to different cloud providers based on their needs and what worked for them. With this increasing demand and change, we had to find a way to make it as easy as possible for analytics teams to find, understand, and get hold of the data they need to do their jobs, while also encouraging the data-producing domains to take ownership of their data in a way that makes compliance easy.
Key Decisions
August Johnson:
So what were the key decisions in our journey? The first, and arguably the most important, was to opt for a so-called tech-agnostic marketplace with a highly customizable governance engine. The latter is key, in my opinion, as it allowed us to create tailor-made policies that interact with the vast array of information security tools we have at Scania. That helps decouple us from specific providers, enabling the data mesh to be maintained and to respond to changes and trends over time.
The second thing we made sure to do early on was to identify a set of key domains based on potential impact, considering their consumer base and maturity levels. This proved to be both for the better and, at times, for the worse. We had quite a few insightful workshops together, both on aligning around principles and on data product development. However, one key lesson, especially early in your journey, is to balance your development speed and what you're projecting against the promises you make and the discussions you have with these domains; otherwise, if those promises aren't delivered, things can get forgotten over time. Still, this approach has helped us greatly with adoption now that we've become more mature.
The third key thing for us was to standardize on certain global patterns while respecting domain autonomy. Keep in mind that these domains are on different cloud platforms, and we want to embrace that and let them continue doing what they're good at. But by setting certain standards, such as adopting Iceberg (or, in some cases, Delta), combined with a unified access layer, which in Scania’s case is Dremio, we enable these different cloud platforms to coexist while keeping friction minimal for the domains getting on board.
We also made sure to collaborate with other initiatives so that we support each other. A great example is our tight collaboration with the Cloud Lake squad. Some listeners might think, yes, a data lake is an anti-pattern to data mesh, and we're aware of that. However, the outcome of this collaboration is that all Cloud Lake sources residing in AWS are now automatically available in Dremio, with all privileges and access rights inherited. What was key here is that each of these Cloud Lake sources has an information owner in the business, and we have been reaching out to those owners (we're still doing so) and saying: hey, you now have your raw data in Dremio, and you have a range of consumers accessing and working with it. Let's look into how we can build a data product out of this. We have this cool self-serve platform called Witboost, where we can make things discoverable, you can put your data contracts in there, and so on. That is how we have approached that side of collaboration. We have also been collaborating with the upcoming data catalog team and with the data quality tool, which helps you get the word out on multiple fronts at once, and that has helped quite a bit in our journey.
Enabling Adoption
August Johnson:
As I mentioned before, Scania is decentralized. Many parts of the business have already taken ownership, at different maturity levels; some are very mature and are on a tech stack that suits their needs and expertise. Of course, we want to embrace this, but we also want to make sure that the data, which will in turn become data products, is discoverable and compliant, powered by Witboost, and also accessible in a unified way through Dremio. One thing we did at the start of our journey was to be very proactive in how we marketed data mesh as a whole, and, of course, to have an established platform team that works on development but also quite a bit on promotion, acting as a center of enablement. One lesson I'd like to highlight from our journey is to consider carefully how you position data mesh when you describe and present it. Quite a few times when we interacted with domains, it created a perception of platform dilution, if you will: they would say, hey, we're on Cloud Provider X, and we've heard about this and that, so why are you introducing two other platforms that do the same thing? If you don't position things well, you tend to lose the social part of data mesh, and it becomes a discussion about technologies. I saw Nik nodding there as well. Maybe that's something you sympathize with.
Nik Acheson:
I have this conversation every single week. As you're highlighting, the patterns become super important. And if I can pull on your thread a little: you mentioned the tech collaboration and synergies, but the other side you need to address is the business side. Maybe you can speak a bit to that, because in my history I've had business users say: I don't care if it's data mesh, I don't care if it’s centralized or decentralized; how are you going to help me on the marketing side?
August Johnson:
Yes, that's a key point, and actually one I have later in the presentation: base things on the direct impact they would have on the business. One example is a domain we're collaborating with right now. They reached out to us and said: hey, we're on Cloud Provider X, and we have a ton of consumers in different Scania markets who aren't in the cloud and don't know what this is, but they want to build their BI reports. This is where it's super important to minimize the barriers to entry. Since we had all the pipelines in place, we set up Dremio for them. It took roughly 40 minutes, and then it was super straightforward for the consumers to start consuming. Once we had shown that value, we said: hey, Witboost is great too, because in the future, when consumers come in, they can find your data and read about it, and you have data contracts in place. Again, that isn't a big hurdle to take. So this is about being a bit dynamic, based on the domain’s direct needs, because one of the challenges is having proper producer-side incentives. It's very easy to bring consumers into the paradigm, because they get it: oh, it's discoverable, things are in one place, data quality is hopefully better, and so on, and you can see that the data products comply with the policies. But for the producers, that isn't as clear. So yeah, definitely.
Nik Acheson:
Yeah, that's one thing I always love to highlight, and Paolo, you might want to jump in on this one too. When I ran data analytics platforms and a team that, frankly, wasn't prioritized came to ask me for data, I'd say: well, I'll see you in nine months, maybe. So you just said 40 minutes, and I want to underscore that pretty heavily, along with showing them how to do it themselves in an easier way. So from your side: does that normally bring an ‘aha’ moment? Or is it the first step in your marketing campaign with those business users? What does that part of the journey look like?
August Johnson:
Exactly. It does bring the ‘aha’ moment, especially when they see the advantages of Dremio, not only as a unified access layer: you've got some pretty good ways of setting user privileges, good auditing, and also lineage, which has been brought up a million times at Scania. So yeah, it’s both the ‘aha’ and then their understanding of the bigger picture.
Paolo Platter:
You touched on a very important point. To gain adoption in a data mesh environment, where different departments need to take ownership of producing data in a certain way, it's super important to provide value through the platform so that they are incentivized to jump onto it. I have never seen it work with top-down enforcement alone, as in: you need to use this platform because that's how it is, because it's an architectural decision. The platform itself needs to provide some value, and typically that value is productivity and ease of achieving results.
August Johnson:
Exactly. We have discussed this internally within our team as well, regarding bringing value. The full setup, Witboost plus Dremio, and achieving what we perceive as the ideal state of data mesh brings massive value to Scania. But locally, for domains that might have limited resources, that value isn't as clear. So if we segment things and say, hey, this is what will bring direct value to you, and then make it as easy as possible for them to get into the full pattern, that is a good way to go.
But perhaps we should move on a bit in the slides. So: proactive marketing on various levels, positioning data mesh, and being transparent about where you are as a platform team in your journey. That last one is something we learned fairly early on. We got quite a wide response, how do you say it, and a lot of people started talking about it, but that also led to differing perceptions of what data mesh is. Some saw it as just a marketplace with all the data; others saw it as something that would solve everything, whether that's data lineage or all that stuff. In some ways that was pretty good, because it got the domains talking, and a lot of implicit marketing happened at the grassroots level and at the decision-maker level as well. But that's one thing I wanted to highlight.
We also talked a bit about having patterns in place before making commitments early on. It took us quite a bit of time to set up our handbook and clear processes for onboarding a domain, so that's something I also want to highlight. Then, minimize barriers to entry for these business domains: we did step-wise onboarding based on, as we just discussed, the immediate value-add and the domain-specific situation. The third thing is to have a clear definition of ‘done.’ We're currently onboarding three key domains into our setup that will act as champions, or best-practice domains, with, hopefully, more or less 100% compliant data products. However, when it comes to defining done, I think it's better to have a full shelf of 65%-compliant data products that you managed to onboard with minimal friction than a sparse shelf of 100%-compliant data products, where everyone else sees compliance as a massive barrier and doesn't dare to join the journey. This ties back to our collaboration with the Cloud Lake: we help the information owners there and keep things simple, alongside the best-practice domains, because the less compliant products can always be improved over time.
Nik Acheson:
Yeah, and I know we have the fireside chat coming, but I'm curious about this one. We often talk about ‘the facts problem’, and you're alluding to it a little: if you force 100% facts into the domain, that's great, because you can now trust what's in there. But the other side is that you still don't have spectrum-level visibility of your data across the enterprise and what's being used. How did you and the team navigate not wanting a whole lot of garbage in there, with lower facts, while still making sure we know what's happening inside the business, which assets are being used, and, from that, where we need facts? So where are you in that journey? How did you…?
August Johnson:
I would say we're pretty early in that journey. We productionized our setup less than a month ago. But the approach we're taking is to keep it to the essentials. When we help onboard a less mature domain, we help them get their tables into the Iceberg format and make them available in Dremio; again, this goes back to the Cloud Lake collaboration. Thereafter, we really help them by co-creating the data product together, so that it’s at least well described, it has a basic data contract, and it adheres to the internal policies we have at Scania. If we achieve the unified access layer with a global output port, as we call it, and there's an owner, and maybe an engineer, who can maintain the product over time, perhaps with a bit of our help, then we're happy. So when I talk about 65%-compliant products, that's really what I mean.
Paolo Platter:
Computational policies also play a huge role here: they make it possible to start with a certain level of compliance and governance and then, step by step, as adoption increases, raise the level of governance and control the platform team can achieve. For example, in Witboost, having all the deployments and all the computational policies in one single place helps you gain an end-to-end perspective of what is happening across all the domains.
August Johnson:
Absolutely, and it also becomes clear to the domains what they don't satisfy. When they build and define their product in Witboost, attach the output port to Dremio, and describe things, they will see a list along the lines of: oh, here's a policy we didn't pass, so we won't be able to publish to the production environment. That's where we come in to help smooth things out and make them compliant.
Another thing I want to highlight is, of course, respecting the domains' platform choices and minimizing perceived enforcement from the platform team, because this is about domain ownership; if they have expertise in the environment they've chosen, that should absolutely be embraced. But having global patterns and settings was a big step: we set Iceberg, and to an extent Delta, as the de facto way of sharing data for analytics, not just within data mesh but for analytics in general. That helped balance alignment with what we want to do while letting the domains continue the work they're doing.
Then there's balancing adoption speed against compliance with data product principles. We already discussed this when we talked about 100% versus 65%-compliant data products, barriers to entry, and so on, so I'll leave it there, but it's an important point. One more thing worth highlighting is to market success stories internally around what you're trying to do. For example, one of the domains we started collaborating with early on gave several presentations within Scania about their data mesh journey, which helped keep the conversation going and spread the concept across the company.
Scania’s Platform Set-up
August Johnson:
So I thought I'd now try to make things a bit more tangible, with a look at the flow and how the platforms, if you will, are set up. As described, we're in a multi-cloud environment, and we onboard domains using clear patterns, a handbook, and onboarding sessions so they can take responsibility and ownership of their data and turn it into products. We make sure the open formats are adopted, and thereafter, with the help of Witboost, domains have a data product builder with a provided toolbox in which they can start defining and describing their data product. In there, they test it against the policies in place, the custom Scania-made policies, to see whether it's compliant. Once it is, the data product is published to the marketplace, and consumers can request access through role-based access control. The data contracts are in place, and Dremio serves as the common, default sharing pattern to embrace data democratization, if I may use a buzzword.
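The publish step August describes (define a data product, test it against the custom policies, publish only once it is compliant) can be pictured as a deploy-time gate. The sketch below is a hypothetical illustration, not Witboost's API or Scania's actual policies: the descriptor fields (`owner`, `table_format`, `output_port`, `data_contract`) and the four policy functions are assumed names chosen for the example. Each policy returns either nothing or a failure message, so a domain gets a list of unmet policies, and publication happens only when that list is empty.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DataProduct:
    """Hypothetical data product descriptor; field names are illustrative only."""
    name: str
    owner: Optional[str]
    table_format: str                  # e.g. "iceberg" or "delta"
    output_port: Optional[str] = None  # e.g. a path exposed through the access layer
    data_contract: dict = field(default_factory=dict)


# A policy returns None when satisfied, or a human-readable failure message.
Policy = Callable[[DataProduct], Optional[str]]


def has_owner(p: DataProduct) -> Optional[str]:
    return None if p.owner else "data product must have an owner"


def uses_open_format(p: DataProduct) -> Optional[str]:
    if p.table_format.lower() in {"iceberg", "delta"}:
        return None
    return f"unsupported table format: {p.table_format}"


def has_output_port(p: DataProduct) -> Optional[str]:
    return None if p.output_port else "no output port attached"


def has_basic_contract(p: DataProduct) -> Optional[str]:
    missing = {"freshness_hours", "retention_months"} - set(p.data_contract)
    return f"data contract missing: {sorted(missing)}" if missing else None


DEPLOY_TIME_POLICIES: list[Policy] = [
    has_owner, uses_open_format, has_output_port, has_basic_contract,
]


def evaluate(product: DataProduct, policies: list[Policy]) -> list[str]:
    """Run every policy and collect failures; an empty list means compliant."""
    return [msg for policy in policies if (msg := policy(product)) is not None]


def try_publish(product: DataProduct) -> bool:
    """Publish to the marketplace only if no deploy-time policy failed."""
    failures = evaluate(product, DEPLOY_TIME_POLICIES)
    for msg in failures:
        print(f"[{product.name}] policy failed: {msg}")
    return not failures


if __name__ == "__main__":
    draft = DataProduct("vehicle_sales", owner="sales-domain", table_format="iceberg")
    print(try_publish(draft))  # lists two failures, then prints False
```

Keeping every policy as a small independent function makes it easy to start with a few essentials and add stricter checks later, which matches the step-by-step raising of governance described in the discussion.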
Nik Acheson:
I'd be remiss not to ask this, though I also want to take a shot at myself for asking it. Given the speed at which we're seeing movement, whether it's RAG, LLMs, GenAI, and so forth (I'm trying not to say AI/ML, because I might argue that's not AI/ML!), how are you thinking about this architecture continuing to enable that speed and keep up with those capabilities? How are you marketing it as you get pressure to consider these other patterns? I'm just curious what the storytelling looks like for you and the team as these things come in.
August Johnson:
Yeah, so actually, fairly early in our journey, before we had everything in place, we talked a lot about data mesh democratizing data. With clear ownership, data product owners would start treating data as a product tailored to their consumers; for machine learning engineers training and building models, that would probably mean raw, untouched data. But the key is that everything would exist in one catalog where things are clearly described, so when a data scientist or machine learning engineer has a specific use case, they can easily grasp what a product is about and what it can be used for. In the long term, that helps reduce inefficiencies in how data is accessed and worked with, which in turn helps Scania scale and make its AI and machine learning work more efficient.
Paolo Platter:
There is also a question from Edward: how did you implement and enforce policies? For example, a retention time of 12 months.
August Johnson:
Yeah, so the policies, if I recall correctly, are written in Python and CUE, and they integrate with the tools we have. Say I'm a producer and I've defined, built, and described my data product. As you can see on the right-hand side, Witboost then goes into the internal tool, checks whether certain aspects are satisfied, and returns a result: this is okay, and you can publish it. That's the gist of it.
Paolo Platter:
Yes, and we have deploy-time policies and runtime policies; the retention time, for example, is checked at runtime. You can see this on the slide. So it's a two-phase compliance check: at deployment time, to avoid people going into production with something that isn't standardized and compliant, and at runtime, to verify that every promise made by the data producer is still maintained.
August Johnson:
Exactly. Our runtime policies are scheduled to run once a week. That means that if something is inconsistent with, for example, the data contract (maybe the freshness of the data), the check can look up when the data was last refreshed in Dremio, push that into Witboost, and compare it against what the producers have promised.
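The weekly runtime check (look up when the data was last refreshed, then compare that against the producer's promise in the data contract) could be sketched as follows. This is a hypothetical illustration: in the setup described above the refresh time comes from Dremio and the result is pushed into Witboost, whereas here both are plain function arguments. The contract fields `freshness_hours` and `retention_months` are assumed names, and retention is read as "no record older than the retention period" (one possible interpretation of Edward's 12-month example), with a month approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RuntimeStatus:
    product: str
    fresh: bool
    retention_ok: bool

    @property
    def compliant(self) -> bool:
        return self.fresh and self.retention_ok


def check_runtime(product: str, contract: dict, last_refreshed: datetime,
                  oldest_record: datetime, now: datetime) -> RuntimeStatus:
    """Compare observed state against the producer's (hypothetical) contract fields."""
    # Freshness: the data must have been refreshed within the promised window.
    fresh = now - last_refreshed <= timedelta(hours=contract["freshness_hours"])
    # Retention: nothing may be older than the promised period
    # (a month is approximated as 30 days to keep the sketch simple).
    max_age = timedelta(days=30 * contract["retention_months"])
    retention_ok = now - oldest_record <= max_age
    return RuntimeStatus(product, fresh, retention_ok)


if __name__ == "__main__":
    now = datetime(2024, 6, 13, tzinfo=timezone.utc)
    contract = {"freshness_hours": 24, "retention_months": 12}
    status = check_runtime("vehicle_sales", contract,
                           last_refreshed=now - timedelta(hours=30),
                           oldest_record=now - timedelta(days=200), now=now)
    print(status.compliant)  # False: last refresh is 30 h old, promise was 24 h
```

Running this on a schedule (weekly, in Scania's case) complements the deploy-time gate: the gate stops non-compliant products from reaching production, and the runtime check catches promises that drift after publication.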
Lessons Learned
August Johnson:
So now I think we're on the final slide, and I believe we've already discussed some of this: stakeholder management, how you incentivize change, scoping, and managing expectations early. On being mindful of initiatives that might have overlapping goals, the approach we took there, as I mentioned, was to collaborate rather than compete. But yeah, any thoughts from Paolo or Nik?
Fireside Chat/Q&A with August, Paolo, and Nik
Nik Acheson:
Yeah, I want to go back to one thing, the facts question. I think you mentioned the auto-registration earlier. Say I'm in Snowflake, creating an asset: one thing I used to pitch on my road shows was compliance-as-code. If you're creating these environments or assets, it should be very easy for the enterprise to know that they've been created, and then for you, as a producer, to quickly put them into a space where others can access them, give feedback, and so forth. I guess that's the second part of the previous question: how are you balancing that? Are you looking at it through a governance lens or for ease of productionization, and where are you in that journey?
August Johnson:
So you mentioned compliance-as-code. Are you thinking more on the governance side?
Nik Acheson:
Let me give you an example of my own. At one enterprise I previously worked at, you could go into Snowflake, provision your environment, run your jobs, export or use that data however you'd like, and then shut it down. Unfortunately, the company I was at at the time, a major Fortune 100 company, had no enterprise-level visibility into that. We didn't even know what was happening in there. Granted, we worked with Snowflake to help fix that, while on our side implementing what we called compliance-as-code: whether you spun up an S3 bucket, a Snowflake environment, or something else, we auto-registered it into the catalog, or at minimum had visibility into people creating these assets. The second part is what I heard from you: as these get created, you can easily register them without doing it manually. How much automation are you bringing to that first part? That's where I experienced some of my pain early on, because I know there are still platforms out there (I'm trying not to throw shade!) that still have that problem, especially in a highly distributed architecture.
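The compliance-as-code pattern Nik describes from a previous employer (every newly provisioned asset, whether an S3 bucket or a Snowflake environment, is automatically registered in the catalog so the enterprise has visibility) could be sketched as an event hook. This is a hypothetical, in-memory illustration: the `ProvisioningEvent` shape, the `Catalog` class, and `on_asset_created` are assumptions made for the example, not the interface of any real catalog or cloud service, and it does not describe Scania's setup.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProvisioningEvent:
    """Hypothetical event emitted whenever someone provisions a data asset."""
    platform: str    # e.g. "s3" or "snowflake"
    asset_id: str    # bucket name, database name, etc.
    created_by: str


@dataclass
class Catalog:
    """In-memory stand-in for an enterprise catalog (illustrative only)."""
    entries: dict = field(default_factory=dict)

    def register(self, event: ProvisioningEvent) -> str:
        key = f"{event.platform}:{event.asset_id}"
        # setdefault keeps registration idempotent if an event is delivered twice.
        self.entries.setdefault(key, {"created_by": event.created_by,
                                      "published": False})
        return key

    def publish(self, key: str) -> None:
        # Second step: the producer promotes a registered asset so others can use it.
        self.entries[key]["published"] = True


def on_asset_created(catalog: Catalog, event: ProvisioningEvent) -> str:
    """Hook wired to provisioning events: nothing is created without being registered."""
    return catalog.register(event)
```

The two steps mirror the two halves of Nik's question: registration gives the enterprise visibility automatically, while publishing stays a deliberate producer action.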
August Johnson:
Yeah, I think I got your question; I'll try to answer to the best of my ability. In terms of governance-as-code, we segmented things into two parts. There are global policies that operate at the data product level, which is exactly what I described earlier with the integration with our internal tools, and then there are local policies that the domains define themselves, which are regular row- and column-level access policies and masking policies. But what you described, I believe, isn't something we have thought about yet.
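To make the local policies August mentions more concrete, here is a small, engine-agnostic sketch of a row-level filter plus a column mask. In practice such policies would be defined in the access layer (for example, in Dremio) rather than in application code; the roles (`admin`, `finance`), the market-based row filter, and the `vin` column are all hypothetical choices for illustration.

```python
def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]


def apply_local_policies(rows: list[dict], user_roles: set[str],
                         user_market: str) -> list[dict]:
    """Apply a row filter and a column mask; input rows are not modified."""
    result = []
    for row in rows:
        # Row-level policy (hypothetical): non-admins only see their own market.
        if "admin" not in user_roles and row["market"] != user_market:
            continue
        out = dict(row)
        # Column masking policy (hypothetical): only "finance" sees full VINs.
        if "finance" not in user_roles:
            out["vin"] = mask_value(out["vin"])
        result.append(out)
    return result


if __name__ == "__main__":
    rows = [
        {"market": "BR", "vin": "ABCDEFGH", "units": 3},
        {"market": "SE", "vin": "IJKLMNOP", "units": 5},
    ]
    print(apply_local_policies(rows, {"analyst"}, "BR"))
    # [{'market': 'BR', 'vin': '****EFGH', 'units': 3}]
```

Because these rules depend on each domain's data and users, it makes sense that the domains own them, while the global, product-level policies stay with the platform team.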
Nik Acheson:
Cool. There's also a question in here about Dremio. I know you mentioned you're early in your journey, but let me add to it: what metrics are you using today? More specifically, I think the heart of the question is: how are you driving improvements from those metrics and capturing those wins to drive continuous adoption, growth, and maturity? In these early days, what are you tracking, and how are you making sure you're growing appropriately? I think this also goes back to your definition of ‘done’ and being able to move and mature the pattern, like any startup, right?
August Johnson:
Yeah, exactly. I think the main thing we're tracking right now is domain adoption. Before, we also tracked the speed at which we introduced new features into the platform, but now that we've productionized things, the focus is entirely on domain adoption and the rate at which we achieve it. Right now, we help onboard three or four domains per month, setting aside roughly three hours per domain to get the full flow in place with Dremio, Witboost, and data product creation. So for us, it's completely focused on adoption rate at the moment, I would say, and then also…
Nik Acheson:
I was going to say, I have one quick follow-up there, because I know a couple of our prospects have been trying, for a couple of quarters now, to justify bringing in Dremio, for example, and Witboost in another case. Where they're stuck is justifying the business case. I'm curious what that process looked like early on for Scania: here's where we want to go, but here's the impact of it. We're all in this macroeconomic environment where you have to consider switching costs and so forth, so how did you present that, and how did you build a business case for changing these patterns and investing in a team?
August Johnson:
Yeah, there are a couple of things. Paolo, our lead architect, did some really good investigations together with our lead data engineer into the cost and, of course, the potential of having a query federation layer, a unified access layer. What we presented to the business was basically: what would the alternative be if we didn't have something like this? We have Snowflake, we have domains on Azure, we have domains in AWS, we have a Cloud Lake in AWS, and then we have the other teams on SAP. What would the alternative be, given our ambition to make analytics more available to end consumers? What would it look like for someone at Scania Brazil, for example, who'd like to create BI reports on data scattered across multiple platforms? Where would they find it? How would they find the documentation? How would they understand who owns it? And finally, how would they access and start working with the data? That's the main point we saw in adopting the Dremio plus Witboost setup. There were some bonuses as well: if you adopt Dremio with your data in AWS, the costs look very good compared to alternative approaches. So that is essentially how we approached it.
Nik Acheson:
Did you have to put metrics against that? Or was it more storytelling, and you were able to drive adoption?
August Johnson:
I would say it was 80% storytelling.
Nik Acheson:
Awesome. I always tell people that you can't discount the storytelling. You always need the numbers behind you, but the story is what drives the change. And people are harder to change than technology.
August Johnson:
Yup!
Nik Acheson:
Go ahead, Paolo. I know you have a question.
Paolo Platter:
Yeah. Earlier you talked about being open and vendor-agnostic, about open standards. What was the reasoning behind this choice? You talked about Iceberg, and we have all seen what is happening around Iceberg.
August Johnson:
Definitely. Standardizing on open formats was arguably one of the key and most impactful decisions we took. We did see the trend going that way, and we've now been reminded that it was a pretty good investment, given Snowflake with the Polaris catalog, all the partnerships, and so on that's going on. At the end of the day, I think the main driver was that we value the separation between storage and compute. We didn't want producers' costs to scale with the number of consumers they have. That is available by default in, for example, Snowflake, if you're using their native tables. But if we were to introduce a unified access layer such as Dremio, we needed to adopt open formats for that to continue to be the case. By doing that, there isn't much effort needed on the domain side, and we achieve separation between storage and compute at the same time.
Nik Acheson:
I'm curious, there are a lot of cool lessons learned here. Maybe if I flip it to the other side: what were the hard lessons learned? What would you do differently, even though you're in the early days?
August Johnson:
Yes, I would say the hardest lesson learned, or the biggest challenge we stumbled across, is one I alluded to a bit earlier, and it's a bit difficult to describe. Essentially, when we started marketing data mesh within Scania, a bit over a year ago, a lot of expectations started to build, and in conjunction we were collaborating with the key domains I talked about earlier. However, when we did workshops with them and introduced the early versions of our implementation, things weren't exactly as they expected. Some of them lost interest and dropped off, and we even had some instances where they called all of this a development project that would fizzle away over time. So that was a big lesson learned on our side: be sure to have the patterns in place, walk the talk, and manage expectations. It's good that the domains are positive, even though they might have differing impressions of what data mesh is, but you must be transparent about how things are going and when they can expect to be onboarded, or to be part of the mesh, so to speak.
Nik Acheson:
So I was talking last week to a friend who has, we'll just say, a massively legacy, home-built platform at a small startup that hasn't been updated much. The platform wasn't working for them. They got pushed to look at, we'll just say, an alternative platform that won't fit their future state, but they've been through the POC, and they have one of the coolest problems to have: the business is ready to go. So what they're trying to do is justify effectively slowing down for a month or two to prove out, we'll just say, a better pattern with Dremio. How would you advise them? They've already solved the hard problem of getting business adoption; now they need to argue that if they slow down a little bit, they can intentionally go much faster and farther. I'm curious about your perspective there.
August Johnson:
I mean, we've been in a similar situation as well. What we did was keep things very honest and have Paul, who's our lead architect, describe what the situation was and why we needed a bit more time. We did get quite a bit of sympathy for it, and they understood that we should do things the right way, in a way that will help us in the long term, rather than go with something that is good in the short term but could be burdensome further down the line, so to speak.
Nik Acheson:
Yeah, my quickest feedback to him was that storytelling matters, and to talk about what's possible. You can solve the problems you have today, but how do you unlock the possibilities of tomorrow? And bring that back to them, even on the AI/ML side: is the engine you're going to need later different from the engine you need now?
August Johnson:
Yes, and what we do is exemplify a lot of that by basically telling a story: imagine a BI consumer in market A who is just looking for this data product, in this case with data mesh, X, Y, and Z, a unified access layer, that kind of thing. But we keep it simple, and we simplify things based on who we're talking to.
Nik Acheson:
I know I've been dominating. Paolo, do you have some questions over there? Sorry, and if there are any other questions, feel free to use the chat box, everybody.
Paolo Platter:
Yeah, you're talking about change management and stakeholder management. Now that you are entering a new phase, after managing expectations, delivering the platform, and enabling domain onboarding, how is the relationship changing between IT and the business, and all the other stakeholders involved in the data mesh?
August Johnson:
Yeah, it's a great question. As I mentioned in the beginning, it was a lot about balancing expectations, not over-promising, and going through the rollercoaster of them being disappointed, but happy further down the line, and so on. Now it's pretty much proactively reaching out to the different domains we've worked with and built relationships with historically, and now that we have these set patterns in place, they're pretty happy with what they see, and the response is pretty positive. So right now, the focus is to continue doing that while iteratively improving the platform. That's how the ways of working are right now.
Paolo Platter:
And how do you set the roadmap of this platform? Are you creating some product-thinking for the platform itself?
August Johnson:
Yeah, so in the roadmap, we have shifted the focus from features to be implemented to separate lanes for domain adoption, alongside certain things that we'll add on over time. And since some of these domains work in increments, as they're called (every 10 weeks they do a lot of heavy planning, and then they have a set of goals they need to achieve), we're adapting to that as well, so that we can help plan the onboarding and have a roadmap with these different domains to be onboarded.
Nik Acheson:
Now, as part of the domains, how did you do prioritization, even within the domains? Because in my experience, it classically goes one of two ways. On one side, I just tell people: if you don't know where to start, start in finance! There are very clear people who own that, there have to be facts in there, and it brings that maturity. And, by the way, everyone knows the one or two people who own and run all the underlying data for that domain, and then you can deal with the bounded context and continue to mature out into other domains. The other way is getting leadership on board early and then clearly tracking some of these business deliveries, which is where the storytelling comes from too: normally it would take you six to nine months to set these domains up, waiting on the central team, so if we go after this initiative with this new pattern, we can deliver it nine months faster, and here's the impact on the business. Usually you start in one and start popping over to the other to drive more expansion and alignment. So I'm curious: from the early days to now, where are you in that journey, and what advice would you share with others?
August Johnson:
Yeah, in terms of how we selected the domains, we set it up so that there are a couple of primary domains, such as sales and marketing, finance, and connected vehicles, and we tried to contact some of the more impactful subdomains within those areas. Those were the domains we worked with early on. Thereafter, we started helping them understand what data mesh is, so that the word could get out, and then worked with them and introduced them to what we have now. So I would say the selection was very much based on impact, the number of consumers, good use cases that could exemplify a lot of what we want to achieve, and so on.
Nik Acheson:
So if I’ve heard that right, there were the functional domains, which sometimes could be organizational or functional, but then there are the microdomains underneath that, such as a connected vehicle––
August Johnson:
Yeah.
Nik Acheson:
––that could cut across, with bounded contexts, into multiple other domains. So, back to domain flexibility and even domain architecture, what did that process look like for you, and are you still in it?
August Johnson:
Yeah, we try to keep the domain map and the subdomains within it fairly straightforward and understandable, because it's easy to get tunnel vision on the theory. At the TRATON level, meaning the holding company for Scania, MAN, Navistar, and so on, they set a global domain map that should be common between the brands. We made use of that to define the primary business domains, and then when we talked to different groups within R&D or within sales and marketing, we positioned them as a subdomain within that box, if you will.
Paolo Platter:
Maybe another question, an internal one: how is the budget working in this data mesh initiative? Is IT funding the data product implementation, or is it more on the business side? Without going into too much detail, how is the budget working there?
August Johnson:
I was about to say you'd have to ask my manager but––
Nik Acheson:
I was going to say budget approval!
August Johnson:
––he's the one to answer that. But essentially, IT funded the first year, and thereafter, over time, we'd shift some of the costs onto the domains in terms of Dremio, Witboost, and then, of course…
Nik Acheson:
You did pretty well there. My history is always that you have to fund two years to drive full adoption.
August Johnson:
Yeah, yeah.
Nik Acheson:
Yeah.
Paolo Platter:
So the strategy is to initially centralize funding to reach the flywheel effect of the platform, and then let the business go.
August Johnson:
The carrot and the stick, so to speak.
Nik Acheson:
Cool. I know we only have a couple more minutes. August, we've asked a whole lot of questions, and I do want to make sure you have some space too. In terms of what we're seeing, trends or otherwise, is there anything we can help you with on your journey? Obviously, you have access to us anytime, but is there something you'd want to highlight or dive into with us?
August Johnson:
I don't know, one thing I was curious about, Nik: you mentioned you've helped other companies adopt the data mesh paradigm. Based on that, what would you say were the key differences between what you've done with them and our approach, and are there any learnings we at Scania could take from that work?
Nik Acheson:
Yeah, I think the one thing I was trying to pull out a little bit earlier is business case development. I have a fun story I've told a bunch of times; I won't get into the full story, we don't have time, but the TL;DR is that business users just looked at me and said, I don't care. Whether you're centralizing or decentralizing, whether it's mesh, Waterfall, or Agile, you can use all these words all day long; I need to grow my business, so how are you going to help me do that? I think that's always a massively hard problem that we geeks get pulled into, and where I always try to advise folks is that the technology, the data, all that stuff is tertiary. I even had a leader once tell me: I see the data, I get it, it's probably a bad idea, but I don't like how the data makes me feel, so we're going to go do this. Like, almost stop showing the data. So it's about bringing them along on that journey. The second part is then, with that partnership, building the business case. I was working with one team in the UK a few months ago that was all in on the platform and saw the impact, and they said, I just can't get the business to come with me. And I said, that's because you're talking in your language. I can guarantee that in the business you're in, you've probably had some material breach or otherwise, because I've read about it, but imagine if you'd had these patterns in place beforehand: what would the impact have been on your response? Or take the acquisition that another one of our prospects just made: how does this pattern help you immediately, on day one, answer what our total sales are across both companies, or how our product is performing across both companies? You've got to speak in a way that massively lowers that barrier, ideally with metrics tied to it, but that's the language you have to speak. And I think that's the only one.
It's cool that you had that alignment from the top, so you could go down, drive, and start building. But I'd say my biggest lesson learned is always to start business-back.
August Johnson:
Cool.
Nik Acheson:
Paolo, I don't know what your experience has been… you've been with a bunch of customers now too, especially building a startup and trying to justify trying these new technologies and patterns.
Paolo Platter:
No, no, I totally agree, and I've followed Scania's strategy and adoption for a long time, so I'm pretty aligned with August's journey. We have another question from Gregor, if we have time, regarding the budget: are you planning to charge data product consumers, or will data products have their own pricing in the future? Are you planning for that?
August Johnson:
Yeah, we're adding the ability for domains to add a bit of a markup on consumption if they want to, so that's one thing. And on our side, there will be certain costs tied to using the services, but it's done in a way that doesn't make it too costly, if you will.
Nik Acheson:
So nice.
August Johnson:
––That's the essence. What was that?
Nik Acheson:
––That's a very nice chargeback model.
August Johnson:
Exactly.
Nik Acheson:
Yeah, I think the tricky part is, to your point, not charging back on things that are materially going to move the needle for the business. I think that's the hard part, and that's the smart thing you guys are doing with that one year. I'd guess you might move to two at some point, but I hope you get one! Your consumers will tell you the value of the platform through usage and so forth, and I think the key is using that data to continue maturing on your journey and finding those pivot points to say, hey, I think we might need more data, or I think we might want to drive some adoption in these areas. And to the point, in some areas you might not charge back, to drive more maturity around the business and impact.
August Johnson:
Yeah.
Paolo Platter:
Also, according to FinOps principles, a good intermediate step would be the showback approach: showing back the cost to educate people that data costs someone, because right now it is a cost that is completely hidden in IT. So educate people, do showback, and then move towards a chargeback model.
Nik Acheson:
Well, especially showing what it would have cost. Like, don't go back to the old pattern, because we saved you 80%.
August Johnson:
We have gotten a bit of, say, revenge lobbying from other domains that have adopted certain platforms, given the potential savings from adopting Dremio as a producer and so on, when we've done the comparisons between alternatives. I'll keep it at that. But there has been, yeah.
Closing
Alex Merced:
Hey guys, I just wanted to jump in and say this has been a fantastic conversation. We're getting near the end of our time, and I want to give you guys a huge thank you for having this conversation this week on Gnarly Data Waves. For everyone who's listening, you can re-watch this episode on Spotify and on YouTube; it will be posted in the next 24 to 48 hours. I encourage you all to add August, Paolo, and Nik on LinkedIn, so you can follow what they're doing, ask them any further questions, and see even more of the exciting things to come. Again, one more thank you to Nik, August, and Paolo for being here today. This wraps up the episode, so make sure you vote in the poll that popped up. Thank you so much for being here this week on Gnarly Data Waves. Stay tuned: there'll be more episodes of Gnarly Data Waves covering all sorts of great data lakehouse topics, brought directly to you.
Paolo Platter:
Thank you so much.
August Johnson:
Thanks so much.
Nik Acheson:
Yeah, thank you guys, bye.