May 2, 2024
From Zero Users to One Million Insights a Day in Six Months for Maersk with Dremio
Maersk is a global leader in container shipping, logistics, and energy, with an extensive network of offices in 116 countries, over 900 vessels, hundreds of warehouses, and a modern fleet of aircraft. Learn how Maersk has built their next-generation data platform for unified analytics on Dremio and made Dremio the go-to place for end-user SQL queries in only six months. We will cover common data platform challenges in the shipping and logistics industry and learn about key use cases that Maersk is delivering to empower their developers and end users to deliver agile and cost-effective solutions.
Sign up to watch all Subsurface 2024 sessions
Transcript
Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.
Mark Sear:
Okay, that’s cool. So this is an odd first slide, but I only discovered earlier on today that one of the ladies at the desk, it’s her 30th birthday today, so if you get a chance, just drop around and wish her a happy birthday because actually, I was going to do a raffle to see who you thought was 30, but no, that’s not the comedy part. Actually I shouldn’t be saying that, but anyway, I’ll cover a bit more on that later because it’s quite important. This is me. My name is Mark Sear. I work for a company called Maersk. Most people think that Maersk is what you’ll see on the next slide, which is just trucking and things like that and containers, but we’re a lot more than that. I’ll come on to that in a second. A little bit about me. My hobbies are surviving, really, hence the picture. I like living on mountains and just generally living in the countryside for something to do. My ambition is to retire to Italy over the next couple of years because I think Italy is probably the finest country in the world. Finest food, craziest women, and pretty nice place to go mountain biking. Maersk, yeah, this is what people think we are. We’re way more than this. If you see these containers, and those of you that are here tomorrow, I’ll explain a bit more about them. This is probably what most of you have seen. If you’ve ever seen any programs about drug importation or anything, for sure you’ve seen them on there because they’re in every port, and for some unknown reason they always focus in on Maersk when they’re talking about importation of stuff. We do way more than that. We do trucks. We do warehousing. We’ve even got our own cargo airline, which is rather nice.
The Journey
Let’s talk about the journey of how we went from literally zero users to at the moment running about 1.6 million queries a day through our Dremio system. That’s where Dremio is. That’s where our headquarters are. I think that’s where they are, in Denmark. Anyway, that’s where it says it is. It cost a lot more than $1,347 and a lot longer than 13 hours to get us there on the journey, but I’ll take you through what we did, what went right, what went wrong, and then I’ll answer any questions completely openly, and hopefully whoever’s doing this online can block the Maersk people out when we do the honest answers, but maybe not.
How did we start? How did we get going? How did we implement? How big was the product? Really simple. We had a small team. I’m not going to lie about it. We did not have a big team doing this. In my team at the moment, we’ve got 10 people. They’re completely amazing. You figure that out of those 10 people, we’ve got only two people dedicated to Dremio, and they’re not even fully dedicated to Dremio, but they’re supporting an awful lot of people. For me, that’s kind of impressive because one of the things that we’re looking for in Maersk, particularly, is low total cost of ownership, of course, given the state of the economic cycle, given the state of where the world is, but we want, above all, to bring value to the data, and if you’ve got to start putting 150 people on that, it becomes a bit of a punchy experience. We didn’t go into this with no other infrastructure, and we still have other data infrastructure, but that’s the size of our team, and I’m pretty proud of that, actually, as an achievement. Really nice set of people, and we’ll come back to that later.
What did we do? Well, we started work, and we built something. We literally said, “What’s the best way to deal with it? Let’s start constructing,” and I think it would be fair to say that the first few weeks was kind of like build it, and they absolutely won’t come. They will sit and do something else, anything in preference to using the product, and why was that? Well, right now, at this time of the life cycle, we didn’t know. We didn’t really have any idea, because it looked to perform well. It looked like it had all the characteristics of everything that should be successful. Really nice architecture, nice design, scalable, et cetera, et cetera, but still, there were no queries going through it, literally, on some days, literally none.
Six months in, Stardate, six months in, bleak prospects, basically, still not getting that adoption. It was really kind of sad, in a strange way, and so you end up in a situation where you’re beginning to think, “What have we done wrong?” We’ve got plenty of data in there. You can run queries. They run fast. They’re kind of cost effective, and amongst all of that, we’d literally followed all the instructions, everything that should be done, what Dremio told us, what all the books said was the best thing to do, best practice, absolutely everything. That girl actually freaks me out. It looks just like my daughter, and that is the sort of thing she would do when she was younger. She’s 30 now, but like that. I guess it was kind of almost like Murphy’s Law, right? What can go wrong would go wrong, and we were sitting there wondering to ourselves, “What is it we can do to change the dynamic of what we’re doing?” We start to have meetings, and we meet with ourselves, first of all, because, of course, like all technical people, we want to see what we think about it. The business can … They’re obviously not smart enough or whatever to use the product. It has to be something wrong, technically, and I’m not the deepest techie at all. The deepest technical people on my team, they are so super, super technical. It’s quite incredible. I won’t mention Graham’s name, but he’s really technical, but he couldn’t get in and out of that door because he couldn’t work out if it was a push or pull. He’d have to take it off the hinges. He’s that genius guy, but you understand that, and the sort of meetings we were having disappeared down those type of rabbit warrens, really.
What do we have to do to try and make things successful? We had to do this. We had to almost, quite literally, start from scratch. We had built ourselves an infrastructure that worked, but wasn’t totally scalable. Above everything else, it wasn’t reliable. I think the rules of engagement with BI, with queries, with AI, everything, nothing has changed ever in computing. People want reliability. If you don’t get reliability, it doesn’t matter what all your extra features are. That isn’t a misspelling, incidentally. I used user deliberately. There was no plural at that stage because people switch off very, very quickly if things don’t work in the way that they want and, in fact, the way that they should demand in 2023 when we were doing this.
The Big Picture: Enterprise and Operational
What did we end up switching to? This is the big picture, and I’ll go a little bit deeper into this. Not too deep because I recognize a whole bunch of people here don’t want to know all that stuff anyway. Big picture, sources, some external, some internal, classic across all of our businesses, warehouses, aircraft, ships, trucks, internal systems, ports. We own a whole bunch of ports globally and operate those ports. A pretty massive business with, oddly, not huge amounts of data but extremely complex data. If you imagine, obviously, a shipping container is a pretty big thing, 40-foot or 20-foot shipping container, but you look at what’s inside those containers, and not every container is filled with goods from one company, so a container can be split. You might, for example, have a container that’s got four pallets from customer A, three from customer B, et cetera, et cetera, to fill out that whole thing. It’s really complex. Then you’ve got all the documentation that goes with it, customs documentation, import documentation, et cetera, et cetera, really, really complex data. We’re taking all of that, putting it into some data stores, operational reporting, layering the query engine across that, and of course, we’ve got our own data lake, which is called Maestro, which I believe is a conductor for an orchestra or something. I’m not really au fait with classical music, so forgive me on that one, but that’s what I believe it is. We do all of our enterprise reporting there via what we call the group data and analytics guys at GDA, fantastic guys, based largely in Bangalore, which this week has hit temperatures of 42 centigrade, extraordinarily hot.
Then clients sitting on top of that, the processing guy, the query engine, AI/ML, dashboards, reports and apps. If we go down a bit deeper, what’s it look like in there? Make no bones about it, we stole most of this from Dremio, this slide. You can see we’ve got the classic three layers, bronze, silver, gold, that push things out into different dashboards. Basically our dashboarding now, we’re trying to move towards Superset, because to be blunt, it’s really performant. It’s really cost-effective. Power BI is a fantastic tool, but you have to pay the Microsoft tariff, so if we could, we would reduce our costs on that. I would say this is a relatively simple, relatively, shall we call it, classic implementation of the product. Scales supremely well. If we go down to the next layer, you start to see how we organized it internally, and there’s a whole bunch of stuff on there that we can work through, but I won’t bother. The bottom line is we’re running within Kubernetes, so we can scale up, scale down, make sure we keep our costs right, et cetera.
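To make the bronze, silver, gold layering concrete, here is a minimal sketch in plain Python. It is not Maersk's or Dremio's implementation: the record fields, sample values, and function names are all invented for illustration, and in practice each layer would be tables or views behind the query engine rather than Python lists.

```python
from collections import defaultdict

# Bronze: raw records as they arrive (hypothetical fields and values).
bronze = [
    {"container": "C1", "customer": " a ", "pallets": "4"},
    {"container": "C1", "customer": "B", "pallets": "3"},
    {"container": "C2", "customer": "A", "pallets": "bad"},  # unparseable row
    {"container": "C2", "customer": "b", "pallets": "5"},
]


def to_silver(rows):
    """Silver: cleaned and typed rows; rows that fail validation are dropped."""
    cleaned = []
    for row in rows:
        try:
            pallets = int(row["pallets"])
        except ValueError:
            continue  # skip rows whose pallet count is not a number
        cleaned.append({
            "container": row["container"],
            "customer": row["customer"].strip().upper(),
            "pallets": pallets,
        })
    return cleaned


def to_gold(rows):
    """Gold: a reporting-ready aggregate, here total pallets per customer."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["customer"]] += row["pallets"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The point of the layering is that dashboards only ever read the gold output, so messy source rows (like the unparseable one above) are dealt with once, upstream, rather than in every report.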
Does it work? We have to say that yes, it works actually extremely well. If I push across here to our daily usage stats, we’ve got 3,000 direct users, and what do I classify as a direct user? A direct user is somebody who either sits there and types a little bit of SQL, or they run a report that runs itself directly against SQL, 3,000 of those guys. We’ve got 5,000 downstream users, or we’ll have in the next couple of weeks, 5,000 downstream users, and those are people that use Power BI, for example, utilizing the Dremio feature that allows you to use Power BI, and it issues queries in the background to bring your data back, so you don’t have to use Azure Analysis Services, which potentially costs a few dollars to use that product. From a total cost of ownership and cost-saving perspective, it’s a really neat feature that we’ve used to get rid of most of our AAS product.
Daily Usage Stats
We think this will rise to somewhere in the region of 25,000 throughout 2024. That’s daily users, and one of the reasons for that is we’re moving a whole bunch of SAP reporting now into this environment. SAP are deprecating one of their products, and we’re going to replace it with Dremio and an interface, and that will bring a vast number of users. Now, these people aren’t typically running dozens of reports a day. They’re running maybe one or two simple reports, what’s happened on my cost center, what’s not happened on my cost center, et cetera, so simple reports, but a heck of a lot of them, and we find that Dremio is really, really good at this sort of mix of workloads. Nobody waits very long for a query. Nobody waits too long. If you’re interactive, you get your answers back very quickly, and it’s a sweet mechanism for us. So as I say, we’re up to about 1.6 million reports now. That will rise very significantly as the SAP users come on board, but this is one of the key things. Our downtime is now 0.03%, so about three hours in the last 10 months, 99.97% uptime. Now, if you contrast that with what we had with version one before we had to blow it up and start again, we were getting really several hours of downtime every week.
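As a rough check on the uptime figures above, a downtime percentage converts to hours as follows. This is an illustrative sketch only: it assumes 30-day months, and the function names are made up for this example rather than taken from anything Maersk runs.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day months, for a rough estimate


def downtime_hours(downtime_percent: float, months: float) -> float:
    """Convert a downtime percentage over a period into hours of downtime."""
    total_hours = months * HOURS_PER_MONTH
    return total_hours * downtime_percent / 100.0


def uptime_percent(downtime_hrs: float, months: float) -> float:
    """Convert hours of downtime over a period into an uptime percentage."""
    total_hours = months * HOURS_PER_MONTH
    return 100.0 * (1.0 - downtime_hrs / total_hours)


if __name__ == "__main__":
    print(round(downtime_hours(0.03, 10), 2))  # hours of downtime at 0.03%
    print(round(uptime_percent(3, 10), 2))     # uptime if downtime was 3 hours
```

Under those assumptions, 0.03% of ten months comes to a little over two hours, and three hours of downtime corresponds to roughly 99.96% uptime, so the spoken figures agree with each other as round numbers.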
So here’s one of the big gotchas that we found, which is if you let people go onto your dev environment and they start running all of their workloads, their production workloads, guess what? They never want to move to production because it becomes difficult for them to do so. So one of the reasons we’re looking at moving potentially to the cloud environment is it will make that transition easier and smoother, so they can move from dev to test to prod and not get stuck in one place. So that’s a quite interesting lesson for us there and something that I would just advise you to look out for. So we’ve had the chaos. We’ve moved quite nicely, I think, from that chaos to a really nice production state. As I say, we’ve still only got two people on this, I would say semi-full-time, which is pretty amazing if you consider what’s going through this. Certainly we would have more people maybe on Oracle or something else there.
Do I perceive it as a risk? Well, one of them is on holiday this week, so we’ve got one person. So it is a bit of a risk at this moment in time, but overall, not too much of a risk at this stage of the game. Why? Because we’ve still got other mechanisms for dealing with stuff if it were to go down, but so far it hasn’t, and I’m checking my phone, it still hasn’t, so it’s great. We’re going to get to the end of this without it going down. So what’s the situation like now? You’ve seen it’s been a bit chaotic. We’ve built out this environment that can cope and scale and work very, very well and achieve what we want in terms of cost-effectiveness. Of course, we are always looking to save more money. Why not? But we’ve avoided being locked into paying to use our own data, and to me that’s critical, which is why we’ve standardized with Dremio as far as we possibly can on Iceberg. There are other data formats out there, but they typically lock you into paying to access your own data. Even if it’s an open format, if you want to get access to a lot of the core features, like partitioning, et cetera, you can only do that by using a proprietary tool, which you then have to pay for, and for me, if I have to pay, it’s just my personal opinion, not Maersk, just my opinion, if you have to pay to access your own data, I kind of really wonder about that. It just doesn’t make a lot of sense to me, to be honest.
Where Are We Now?
So where are we now? We’re in the Zen Buddhist phase of this, and this is no doubt in two minutes’ time my phone will ring, there’ll be a problem, but it is actually quite a relaxed place to be, because I know that the team that I’ve got, and I’m very lucky, the team that I’ve got are super smart people, super dedicated people, and if any of them are watching this, then I extend my massive thanks to you guys, because you’re the guys that made this happen, and that’s really, really important, but now we feel quite relaxed about it.
Well, what changed? Was it just the tech, was it just the people, really what changed? I think this is one of the things that changed. You’ve got to be prepared to communicate, as it says here, even when it’s uncomfortable or uneasy, one of the best ways to heal is simply getting everything out. You’ve got to be prepared to listen to your users and say, “Tell me why you’re not adopting this incredible technology,” and sometimes it can be something simple where somebody just says, “Well, they haven’t got a driver for X, Y, Z.” “Oh, well, we never knew that,” “Well, you never asked,” “But you never told us,” “Well, why are we supposed to tell you what you should know?” It’s about that communication thing, and I think that’s the single biggest, I guess it’s one of the single biggest things in life, really, is communication. As I said at the beginning, to go from zero users to over a million queries a day, it’s not rocket science, communication, giving people what they want.
Product Architecture
It’s also about the product and architecture, of course it is. Our second shot of the architecture was way robuster than the first shot. The first one was kind of a, I guess we went from a successful proof-of-concept to production and figured that we didn’t have to do anything on the way. My advice to you, if you’re in the proof-of-concept phase at this moment in time, is when you go to production, be prepared to tear down, throw away everything, start again, build from fresh, and get Dremio to do as much as you can for you because then you’ve got, it’s not a phrase I like, but you have a throat to choke if it goes wrong. If you’re doing it yourself, it’s always difficult to blame somebody else.
Build the right products on top of it, but the most important thing for me, again, I’ll come back to it here, is people, and I put this young lady up again because imagine the dedication of somebody, it’s her 30th birthday, pretty big birthday, I think we could agree, but she’s here dealing with people like me on a day when she could be doing something else but she didn’t want to let her boss down, her colleagues down. That’s a sort of uncommon dedication that I think you need to build in a team in order to truly achieve success because you’ve got super smart people, super clever people, they are worth less if they are unreliable than someone who isn’t quite as smart, but they’re super reliable and super dedicated, and again, as I said, my team, I’m so lucky there and so lucky actually with many of the people, pretty much everybody that I work with at Maersk. As a company, we have a really good ethos and it works really well together.
Final slide is this really, I’ve just told you, none of these things are stunningly unique. Those of us over the age of 30, I myself am mid-30s now, maybe in my head, we all know this, right? Do the common thing but do it uncommonly well. In our organization, we say do less but go all the way, literally deliver on every promise that you are making, go out of the way to get to the real ending of it, and that would be my singular message for me, how to go, I told you I’d tell you how to go from one user or zero users to a million queries a day, that’s how you do it. Do the common thing but do it uncommonly well and surround yourself with people that are smarter than you are. If you do that, you too can have a million queries and be Zen-like in your ability to do that.