May 3, 2024

AI and Insight Foundations: Data Access and Quality Deliver Innovation

Rahim Bhojani, SVP of Engineering at Dremio, and Mark Sear, Director of AI, Data and Integration at Maersk, will discuss the impact of data access and quality on innovation for data scientists and analysts. They’ll highlight the challenges in AI development and how Dremio’s lakehouse, alongside a vibrant ecosystem of AI and analytics tools, enhances data utilization. Mark will offer insights into specific tools they’ve built as well as ecosystem tools they leverage to drive efficiency and innovation across a broad spectrum of data consumers.

Topics Covered

AI & Data Science


Rahim Bhojani:

Good morning, or, to borrow from her, good welcome, live from here in New York. I will talk to you about AI and insights and what AI will do to the analytics industry. And then we’ll also talk to Mr. Mark Sear, who I will introduce in a second. We’ll have a little bit of a fireside chat, and we’ll wrap up. So let’s talk about what we’re going to discuss now. Why is AI important? What will that do to accessing, finding, querying, validating your data? Where is Dremio on that journey today? What are our data foundations? And where are we going with that?

Generative AI

So let’s dive in. McKinsey recently came out with a report that said generative AI could add about $7.9 trillion to the global economy. There’s no denying AI is one of the most disruptive things that’s happened this decade. It is going to be a tremendous productivity enabler. As an example, my dev teams today use a copilot experience within their IDEs to speed up their menial tasks. It has become part of their workflow. What does this mean for you as an analyst, as a data practitioner? Let’s talk about that. Recently, about two weeks ago, I was in Europe. In five days, I met with 11 customers. Almost all of them said the same thing. Time to insight is still too slow. If you’re in a decentralized organization, then getting reports could be faster, but that leads to governance, data copy, and data proliferation issues. If you’re in a centralized organization, then you’re caught in this data do-loop. In the end, it takes up to six weeks, even a couple of months, to get a standard report, and maybe up to three months to get data ready to serve AI models in production.

How Does Dremio Help You?

How does Dremio help you? First, let’s talk about a few customer examples. We have many, many customers who, as we survey, as they implement Dremio, they end up doing their data and AI projects five to 10 times faster. As an example, Amazon’s finance analytics team used Dremio to implement their projects 10 times faster than some other software that they tried. S&P Global, you heard from Tian yesterday in the keynote, and Data, they used Dremio to speed up their projects seven times. This is now a race, and everybody has to be part of it. How does Dremio help? There are three themes as part of our strategy and vision. The first theme is data foundation, things that are already available today. The second theme is gen AI experience, meaning we’re building something like a copilot, hint, hint, within Dremio’s interface. Finally, AI applications, really doubling down on making public and private models, and all their capabilities, available within the Dremio and partner ecosystem.

Let’s start first by framing some of the current challenges one more time. As we survey our customers, we hear from them that 80% of the time is spent finding the right data set and creating the right models. What would a typical workflow look like? You have your business analyst, they want something done, they log a ticket, which goes to a data engineer. That data engineer will take about a week, sometimes more, to create that model. Then they’ll produce a report. That could take up to a month. That goes back to that business analyst, who then gains some insight, but then has more questions. Guess what? They have to log another ticket. It goes back to the data engineer, that person may need to add another table, add a column, add a calculation, and you’re caught in this constant loop. This actually reminds me of being an intern 20 years ago. I worked as a Crystal Reports developer. Sometimes I wonder if things have really changed. Now, the self-service analytics changes that have happened with client tools like Tableau and Power BI help alleviate some of this. But as we discussed earlier, that leads to data proliferation, copying, and governance problems.

Data Foundations With Dremio

At Dremio, we want to give you the best of both worlds. Here’s how. We will talk about how you find data, how you access it, how you query it, how you validate it within the Dremio platform. You saw some of those demos yesterday. Let’s talk about how we do this. Dremio has a very rich ecosystem of sources that we support, from data lakes to databases, be it on-prem or in the cloud. The key that powers this is our domain-specific semantic layer. You as a consumer, be it in a centralized or a decentralized team, can produce these data products in one platform without any proliferation and use the myriad of client tools to then do self-service all in one place. This eliminates data silos and helps your user base solve their problems themselves. Gen AI only grows this pie of data, and I will talk about how. NLP also grows the diversity of the users that come to the platform. You don’t need to know SQL anymore.

Second, we talk about finding data. In Dremio today, we have a keyword search interface already built in. You have the ability to add descriptions and labels to a data set. This attaches semantic meaning. And then finally, lineage. You always want to know who did what and where the data came from. Imagine being the new person at your job, and you don’t know how to get started. These capabilities help you get started faster. Now, querying data is key. Sure, you can find it, but you want lightning-fast speed, which is what Dremio is known for. In our rich SQL editor, you get auto-complete. And if you don’t want to use the UI, you can use our REST interface to build applications on top.

And finally, let’s talk about validating data. You heard from our awesome developer advocate, Alex, yesterday. He demoed Git for Data. In this example, you have a main branch. In the past, you would have to copy that, do your ingest, and your experimentation. Somebody would have to come in and say, I authorized this. I audited this data. This is good. And you merge it back. With Git for Data, you simply create a branch. You do your experimentation in isolation. When you’re ready, you merge it back. We take care of all the hard work. And guess what? All this is automatable. 
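As a rough illustration of the branch, experiment, and merge flow described above, here is a toy in-memory sketch in Python. It is not Dremio’s API or SQL syntax: the `ToyCatalog` class, its methods, and the branch and table names are all hypothetical, and a real versioned catalog shares unchanged data between branches rather than copying it.

```python
from copy import deepcopy


class ToyCatalog:
    """Toy stand-in for a versioned data catalog (hypothetical, not Dremio's API)."""

    def __init__(self):
        # Each branch name maps to its own state: table name -> list of rows.
        self.branches = {"main": {}}

    def create_branch(self, name, source="main"):
        # This toy copies the source state; a real system would avoid the copy.
        self.branches[name] = deepcopy(self.branches[source])

    def append(self, branch, table, rows):
        # Writes only touch the named branch, so other branches stay isolated.
        self.branches[branch].setdefault(table, []).extend(rows)

    def merge(self, source, target="main"):
        # Simplest possible merge: the target adopts the source's state.
        self.branches[target] = deepcopy(self.branches[source])


catalog = ToyCatalog()
catalog.append("main", "sales", [{"region": "West", "amount": 100}])

# Experiment in isolation on a branch.
catalog.create_branch("ingest_experiment")
catalog.append("ingest_experiment", "sales", [{"region": "East", "amount": 250}])
rows_on_main_before_merge = len(catalog.branches["main"]["sales"])

# When the data has been checked, merge it back.
catalog.merge("ingest_experiment")
rows_on_main_after_merge = len(catalog.branches["main"]["sales"])
```

The point of the sketch is only the shape of the workflow: nothing on main changes until the merge, so validation can happen on the branch without copying the data set by hand.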

Where We’re Going Next

Now, let’s talk about where we’re going next. Gen AI experience is built right into the platform to give you that much more productivity. And we’ll talk about it in the same theme: accessing, finding, querying, and validating data. Some of these things are already available. For example, creating wikis and labels automatically. We have done LLM integration. And our system can scan your schema and any existing wikis and labels, and then produce descriptions for tables and views, for columns and labels. Why is this important? Where is this leading? When you have the semantic information, you can then have an AI-assisted data discovery process, something like semantic search. In my prior life, one of my founders at Tableau used to say, the human brain is not designed to think tables, but columns. That is really applicable here. If you want to look for summer sales, or you want to filter by the West region, the system should surface what’s relevant or not. And that’s what we’re working on.

Additionally, if you double-click into that data set or column, you get automatic insight into what’s there. We also launched Text-to-SQL about a year ago. And we’ve been iterating on it as we get feedback from our users. With this capability, if you can write natural language, we produce the SQL for you. Be it joins, or just finding views, or looking at columns. It’s available in multiple languages, and you don’t need to know SQL. This is really, really cool, because it changes the diversity of users coming to our system. You do not need to know SQL.

LLM Ecosystem Vision

Now let’s focus on what our longer-term vision is for AI, AI applications. What does this mean? To us, in our vision statement, we believe we need to serve our customers where they are, be it in the cloud, be it on-prem. We want to respect your privacy, meaning AI on your terms. Let me share what some of these capabilities could look like. IDC estimates that 80% of data in an enterprise is unstructured, something like that. You need a lot of specialized skill to extract insight out of this. Can you imagine a world where you write a simple AI extract function right within the Dremio interface, and you get the columns and rows that you need? This is very close.

Additionally, think about other AI functions. In another previous life, I used to do sentiment analysis by extracting information in the artist formerly known as Twitter. We would set up a Python server, load the right libraries, write the right scripts. It would be a 10-step process to get the result. Now, again, write a SQL function, feed it the data, you get the result right there. Similarly, something like masking. Today you have to write a Python or Java UDF, iterate through the data, and mask what you don’t want to show. Simple function right there at your fingertips. Stay tuned on when all these capabilities will be available. But let’s now transition to one of my favorite Dremio champions. He is the head of AI data and analytics at Maersk, Mr. Mark Sear.
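To make the masking example concrete, the hand-written approach described above (iterate through the data and mask what you don’t want to show) might look something like the following in plain Python. This is a minimal sketch under stated assumptions: the email-only masking rule, the function names, and the sample rows are hypothetical, and it does not represent any existing or planned Dremio function.

```python
import re

# Deliberately simple email pattern for illustration; not a full validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def mask_email(value):
    """Keep the first character of the local part and the domain; hide the rest."""
    def _mask(match):
        local, domain = match.group(0).split("@", 1)
        return local[0] + "***@" + domain
    return EMAIL_PATTERN.sub(_mask, value)


def mask_rows(rows, columns_to_mask):
    """Return masked copies of the rows, leaving the input untouched."""
    masked = []
    for row in rows:
        clean = dict(row)
        for column in columns_to_mask:
            if isinstance(clean.get(column), str):
                clean[column] = mask_email(clean[column])
        masked.append(clean)
    return masked


# Hypothetical sample data.
rows = [
    {"customer": "Acme", "contact": "jane.doe@example.com"},
    {"customer": "Globex", "contact": "sam@example.org"},
]
masked_rows = mask_rows(rows, ["contact"])
```

Even this small version needs a pattern, a per-value rule, and a loop over rows and columns, which is the boilerplate a single built-in function call would replace.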

Mark Sear:

Hi, well, thank you. I’ll start by saying what a pleasure it is to be here in a bunker somewhere in the middle of New York, transmitting globally to literally several people, and I do hope my mom is watching. If she isn’t, Yahoo sucks. You go to a different home when I get back.

Okay. So we’re going to talk here about how we went from zero users to a million insights a day in six months within Maersk, and I’m going to start off by talking about me for just a couple of seconds. This is my favorite subject. What am I? My name is Mark, obviously. I’m an AI and data enthusiast. I just happen to be super lucky to work in the field of data and AI. I’ve had a long career, and I’ve loved every minute of it, and I just hope everybody who’s watching this, genuinely, I hope you have such fun in your career. I love cycling, I love wild camping, and I am, for my sins, a Chelsea fan. Those of you in the UK, you can start in the comments now about how terrible our season is, but we’re on our way back for sure. 

What Does Maersk Do?

So what does Maersk do? It’s a name. I’m sure most of you have got it in your mind. You’re thinking, Maersk, Maersk, I’ve heard that somewhere. I’ve seen it somewhere. This is probably what you think Maersk does. You think that we own big metal containers and move them around the world, and to a certain extent, you’re absolutely right. Well, what Maersk actually is, is we’re a company that does end-to-end logistics, everything to do with logistics. So sure, we’re metal boxes, containers. We also have aircraft, we have trucks, we own ports, and we’ve got warehouses. And additionally, we’ve got, of course, 700, notice the word here, vessels. They’re not ships, they’re not boats, they are vessels. So it’s a very different company than you might think. 

Now, some of these vessels are absolutely huge, and I’ve just put this in just for fun. The largest vessels are 18,000 TEUs, that’s 20-foot equivalent units, each box being 20-foot long. Question for you, have a quick think about it, you’ve got a few seconds here. How many ping pong balls could one of those boats move if they move? I’ve said boats. I’m going to be crucified for that later, but don’t worry about it. Why did I choose ping pong balls? Just because it just seemed an interesting thing for me to calculate. There are 8 billion people in the world, approximately. One of those vessels in one journey could move 18 billion ping pong balls. If you don’t believe me, that’s the calculation there. It works. The numbers are accurate. 
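The slide with the calculation is not captured in this transcript, so here is a back-of-the-envelope reconstruction in Python. Only the 18,000 TEU figure comes from the talk. The internal volume of a 20-foot container (about 33 cubic metres), the 40 mm ball diameter, and the 0.64 random packing fraction are assumptions added here, so treat the result as an order-of-magnitude check rather than the speaker’s own working.

```python
import math

TEU_CAPACITY = 18_000            # from the talk: largest vessels carry 18,000 TEUs
TEU_INTERNAL_VOLUME_M3 = 33.0    # assumption: approximate usable volume of a 20-foot container
BALL_DIAMETER_M = 0.040          # assumption: standard 40 mm table tennis ball
RANDOM_PACKING_FRACTION = 0.64   # assumption: typical density of randomly packed spheres

# Volume of one ball, from the sphere formula V = 4/3 * pi * r^3.
ball_volume_m3 = (4 / 3) * math.pi * (BALL_DIAMETER_M / 2) ** 3

# Upper bound: divide container volume by ball volume, ignoring the gaps between balls.
balls_by_volume = TEU_CAPACITY * TEU_INTERNAL_VOLUME_M3 / ball_volume_m3

# More conservative estimate: account for the air between randomly packed spheres.
balls_with_packing = balls_by_volume * RANDOM_PACKING_FRACTION

print(f"Ignoring gaps:       {balls_by_volume / 1e9:.1f} billion")
print(f"With random packing: {balls_with_packing / 1e9:.1f} billion")
```

Under these assumptions, the pure volume ratio comes out at a little under 18 billion, in line with the figure quoted in the talk, while allowing for packing gaps brings it down to roughly 11 billion. Either way, it is more than one ball for every person on Earth.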

Why Data is so Important

So why is data so important for Maersk? Well, it really does come down to that, our global scale and complexity. We have 100,000 employees, as I said, 700 vessels deployed, 65 terminals in 36 countries. We operate in 130 countries. Very few companies have got that global reach in any business. We do. We have 100,000 plus customers, 3 billion business events on, 7 million square feet of warehousing, and in addition, and we’re very proud of this, we are moving our business to be net zero climate neutral by 2040. What that all adds up to is data is vital. It’s vital for us. It’s vital for our customers, and it’s not just us, of course. It’s the world when it comes to achieving that net zero aim. 

Transforming Data at Maersk

So what is Maersk moving to? How has this all happened? Our business, in common with many others, is transforming. If you went back a few years, we were literally vessels. We were — that’s what we did, that and some terminals. Now as you can see, we’re talking about integrating end-to-end supply chains. That means doing everything from cradle to grave, if you like that phrase, or soup to nuts, whichever phrase you want to use, for data. It’s about integrating those supply chains. Six million container moves will generate hundreds of millions of logistics events. What’s a logistics event? It could be as simple as moving a container from the factory gate to the gate of the port. But what it’s about is informing our customers what is happening with their supply chain. It’s about that really tight, embedded, integrated supply chain partnership. 

We need an intelligent data ecosystem to complement our intelligent connected ecosystem. IoT data from ships. Think about what would have happened 25 years ago, somebody would have said via a telephone call probably, yep, your container is on its way from China to the UK. Next thing you would know would be nothing for weeks and weeks and weeks. Now we’re in a position where shortly we will be able to inform customers where their goods are constantly. It’s an amazing transformation for them, and it’s an amazing transformation for us. 

So we’re doing that. Also, what do we use data for? We use it to supply the three most important things that a CEO wants. First one, pretty simple. Make more money. Help me make more money. Help me bring value to our shareholders and to our customers, and data is absolutely critical there. The second thing they want to do, CEO, and in this case, a CFO as well, is save me money. Help me reduce my costs, and this third one will definitely resonate if you work in banking and areas like that. The third thing all CEOs want to know is keep me out of jail, okay? Make sure my compliance works. Make sure my numbers add up. All those types of things, and data is clearly critical in all of those areas. 

What is Dremio Powering

So Dremio is rapidly becoming the data platform on which we can build both traditional business products and also new, innovative solutions that really do power the next generation of supply chains, supply chains that will be powered themselves by integrated logistics. Sort of things we’re doing. Operational reporting. At the moment, about 500,000 operational reports a day. That is rising, and then every now and again, it drops because we have guys that go in and optimize and make sure that things are working properly, deprecate reports that aren’t useful anymore, et cetera, but call that steady state at this moment in time. We’re using it to sit behind Power BI and drive Power BI queries, serve those queries up. Saving us money. That’s playing into the saving money aspect that our accountants love. We’re also starting to see people now power machine learning models with well-curated, easy-to-get-hold-of data from Dremio, and a bit later, I’m just going to talk about something which is just very different. Everybody’s got accounts. Everybody’s got other types of reports, but I’m just going to talk about something called the minimum viable company.

Does it mean that Dremio is perfect? I’m going to tell you now. No, it isn’t, but we’re using it as a base platform on top of which we can build our own tools and products, and then as they come out with a product, maybe it overlaps, we can choose to use that product or not use that product on an a la carte basis. So for example, Rahim talked about Gen AI to generate SQL. Well, in actual fact, we built our own. We don’t use the Dremio product right now. We built our own. It suited our needs, but if we want to, the nature of Dremio and the nature of what we’ve built is such we can just pivot, switch when it suits us and when it suits the other people in our organization, primarily, of course, the business. The key to us is platform flexibility and as close to zero lock-in as possible, really. Same thing with Iceberg. As far as data goes, we definitely like to keep it open.

Right now, as I said, we are using it for powering business ideas. We collaborate very closely with our colleagues in what we call GDA, Global Data and Analytics Team, and of course, development teams globally. New solutions. We’re using it to save money. I just spoke about that. Just driving down the cost of delivery of the data that we do have. Is it perfect? Nope. Already said that. There are challenges. We’ve gone from 20 users at the end of April 2023. At the end of April 2024, a few days ago, I checked: we have 3,000 users. By the end of May 2025, we expect 8,000 users, plus what we call downstream users. That’s to say people that are using products, not direct query access. Things built on top of the Dremio data access, that semantic layer that’s so useful for us. We expect tens of thousands of those users to be doing that.

What do we want to do? We want to give people access to the data so that they can create the value. My team can’t do that. I have a very tiny team. They’re a very wonderful team. Big shout out to all the guys in my team. You’re probably not watching, but I’ll cut this out and send it to you as a clip anyway. Big shout out to you guys. You’re doing an amazing job and there are so few of you, but you rock. Absolutely rock. 

Sample Use Case

I’ll give you a quick sample use case, which is supposing there was a cyber attack and pretty much all of our systems went down. How would we be able to service our customers? Well, what we’ve built in Dremio is what we call a minimum viable company. I’m not going to go into too many details because clearly some of this is proprietary, but it shows you the flexibility of a good data platform. We basically sent a couple of my guys out, business requirements gathering, working out what source systems, what key fields, what key pieces of data really were needed to keep the goods flowing in the event of a critical business failure. They then mapped those to the data lake and they created what we call ontology data sets or golden data sets, 48 of those on top of which the business can operate in the event of some major problem occurring. It’s an absolutely fabulous illustration of how a flexible data platform can be used to do many things that you never ever thought that you would want to do with it or even could do with it. It’s really quite an amazing thing. I will do a shout-out. I’m going to mention two names here. Julian and Graham, in particular, you did that. Peter, well, we know your idea was using Dremio in the first place, so again, those three guys, they’re the guys that have done that. I do nothing. I always say to my guys that this is the sort of leader I am. I’m a caring guy. If this works, I will be super successful and people will laud me. If it fails, you are taking the blame for this. That’s the sort of guy I am.

What is the value of the minimum viable company? Well, we love our customers. We really do. These are the people that provide everything for us, from jeans to this shirt, whatever it is I’m wearing here, to microphones, everything. We love our customers. Without our customers, we couldn’t carry out our lives. The value is kind of incalculable from that point of view, right? The customers entrust their businesses to us. It’s on us to make sure that we serve them. That’s the sort of value that we’re trying to drive. 

Lessons Learned

Lessons that we’ve learned on the way. This is always a difficult one because it always sounds a little bit twee, you can say, but I’m going to tell you the truth here. There’s nothing new in this area. I think these rules have been the same from when I started in tech many, many years ago with 16K of memory on a mainframe. Younger people won’t even know what that is, 16K. You’re probably sitting there saying, “He means gigabytes.” No, I mean K. That’s what it used to be like. You’ve got to communicate. You’ve got to work with people. You’ve also got to be patient. Change doesn’t happen overnight. Just like Rahim was saying, AI is going to be the biggest thing in the world ever, in my opinion, but it’s not going to happen in two weeks. It might take two years. It might even take three years, but it’s coming. Nothing is new there. 

Business-wise, more of the same. We’re going to keep delivering quality data at the right time so that other people are empowered. We want to deliver a rock-solid platform that supports the mission of other techies and, of course, the business to deliver those products. Personally, who knows what happens next? I was asked to answer what happens next. I thought I’d answer it personally as well as from a business perspective. I guess I’m a dreamer. I’ve always been a dreamer. I’ve lived my life as a bit of a dreamer, and there’s a little bit of madness thrown in there as well. I’m going to stick to that. If you’re young and you’re watching this, a great guy once said, “You’re only given a little bit of spark of madness. You mustn’t lose it.” That’s my advice to you. Don’t lose it. We are now going to carry out some hardcore logistics, quite literally moving chairs. Actually, this probably isn’t logistics. It’s more furniture removals. Rahim is back.

Rahim Bhojani:

Yes, we’ll have a quick chat. I mean, how could I not ask the logistics guy to do just a little bit of logistics?

Mark Sear:

I think you saw my expertise there.

Rahim Bhojani:

All right. I’ve got some questions to ask you, but I don’t like any of these, so we’re going to ad-lib a little bit.

Mark Sear:

You know I rehearsed those for two weeks.

How Did You Get Here?

Rahim Bhojani:

That’s okay. You’re really natural. We’ll be okay. All right. So, like myself, I’m sure a lot of the audience really connected with you right now, inspired by your background. First question, how did you get here?

Mark Sear:

How did I get here? I think the first and most important thing is I was born lucky. I’ve always believed that. My son, believe it or not, is called Lucky. That’s his name. I think luck takes you a long way. That and having one or two truly memorable and inspiring bosses. People that trusted me, would let me go for it, and would let that spark of insanity turn into something wonderful, or if it didn’t, would at least back me for long enough for me to know it was wrong.

Rahim Bhojani:

Yeah. One of my bosses used to say, “Have no boundaries.” I think that’s really relevant. In that theme, what are some things that people are really not thinking about that they should be right now, as it pertains to analytics and data?

What Are Individuals Missing with Analytics and Data?

Mark Sear:

I think there are things that people are thinking too much about, and things that they’re not thinking enough about. I think, for example, if you take Gen AI, people are model obsessed. It’s this model versus that model. It’s this model. The thing is, if you get obsessed with that, then what happens is a model will change. I, like you, subscribe to numerous emails, and every day one of them is telling me there’s a different best model. If I spent my life just plugging and unplugging those models, I’d never get any further. I think what people are doing is what techies have always done, which is focus on the tech and not on the people side of things, the business side of things. That’s what I hope people don’t do, but I can see people beginning to do it already. Focusing on the tech, not on the people.

Culture Change for a Large Organization

Rahim Bhojani:

In a lot of ways, what you describe is a real culture change. Given your history, given your journey, how do you impart that in a large organization like Maersk?

Mark Sear:

I think the only way you can impart that is to impart it to your local group of people and hope that your team can then become evangelists for that opinion, for that approach, and go out and find other people if you like. I suppose you could say the Mormons got it right. They go and knock on the doors and do that type of thing there, and I think that applies to what we want to do here as well.

Empowering a Global Business

Rahim Bhojani:

Practice what you preach. Yeah. Let’s talk about … I spoke about the foundation of Dremio. You’ve actually taken that and built on top of it to service the needs of your business. You touched upon that slightly. Let’s double-click on some of that. A small team like yours, how have they been able to empower a global business that way?

Mark Sear:

Well, I think there are two aspects to that. First of all, we are still nascent on that journey. Not the whole enterprise is there. We’ve got some really fantastic ambassadors. Shout out to Igor in London, in particular. Super, super ambassador. Yes. I know he’s on your case all the time. Don’t worry, Igor. He’s working on it. Yeah, yeah.

Rahim Bhojani:

I like Igor. I like people who are direct.

Mark Sear:

Super smart ambassadors, super direct. We empower those, they’ll drive that forward. I think it comes down to quality people, number one. You have to have quality. You have to have people that are passionate. Passion is something I think that’s really hard to find. When you find that passion, and I’m very lucky with my team, it shows. We’ve got a guy in Singapore, Ben, super passionate guy. When he speaks, you can feel that passion come out of his body. We’ve got Dipmala in India, same thing, passion. That’s the most important thing for me. Passion. Create evangelists, and the evangelists go on and create further evangelists. I know it’s going to take time. It’s never going to be overnight.

Identifying Who you Want to Work With

Rahim Bhojani:

I’m sure you’ve hired and interviewed hundreds and thousands of people. How do you identify this passion? How do you identify who you want to work with?

Mark Sear:

There’s an HR lady in the room. She’s twitching at the moment, but I’m going to tell her why. I tend to do somewhat, shall we say, non-traditional interviews. I’m much more interested in the person and how they will fit into a team than what they know technically. My assumption is that you’re sitting in front of me because you already know something technically. It’s about what will that person bring to that team? How do they gel? You can have a super smart, genius person, but if they come into a team and break it because they’re the wrong personality type, very difficult. I’m looking for very specific things, very specific things in people that you learn by just talking to them about life in general.

Dremio Enabling Self-Service Experimentation

Rahim Bhojani:

Now, bringing it back to Dremio, obviously, if you find that self-starter, you find that person with a spark, like you said, how does a platform like this enable self-service experimentation?

Mark Sear:

I think it’s about removing barriers and creating a frictionless environment. It’s about that developer experience, almost I would use the word entrepreneurial experience, if that makes sense. Because I think if you’re going to be successful in a big company, you need to almost think like an entrepreneur. You’re looking for people that can see an opportunity, see that business opportunity, and then they think like an entrepreneur. What do entrepreneurs do? There are two types of entrepreneurs. Entrepreneurs in general reduce friction. They go out of their way to do it. If we can make the experience frictionless for them, they’re going to have a great experience. They’re going to develop their products faster, and crucially, they’re going to come back to us with more ideas, and that’s what we want.

Rahim Bhojani:

Right. Just like your minimum viable company.

Mark Sear:

Like the minimum viable company. Yeah. Work with the people. Don’t tell them what they need. Ask them what they want.

Rahim Bhojani:

And then one of your principles from your CEO, don’t get me fired, how does a platform like Dremio give you governance and auditing capability? How do you make sure the right things are happening?

Mark Sear:

First of all, I should point out at this point in time that our CEO is not in danger of getting fired or going to prison. It was not my intention to impute that. So Dremio gives us a really rock-solid base for making sure that everything that happens on it is auditable. It’s secure, first of all. We can do row- and column-level permissioning, really, really important.

We’re running zero-trust as far as we can, zero-trust. It’s super good at that. We know who can see what data, when they can see it. It’s correctly permissioned. So that’s part number one. That’s table stakes. If you can’t be secure, it’s really not for us. Secondly, it performs really well, and we can audit what people are doing. We can see where those performance problems are, and in technical terms, create a reflection or something like that, and empower people to do that themselves to give us that speed, stability, et cetera. By doing that, then you empower people to deliver their solutions, and that, in turn, will help generate more revenue and hopefully keep our CEO in a job for many, many years to come.
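As a rough illustration of what row- and column-level permissioning means in practice, here is a toy Python sketch. It is not how Dremio implements or configures access control: the role names, the `POLICIES` structure, the `apply_policy` function, and the sample shipment rows are all hypothetical.

```python
# Hypothetical policies: which columns a role may see, and which rows (by region).
POLICIES = {
    "finance_analyst": {
        "columns": {"shipment_id", "region", "invoice_amount"},
        "regions": {"EU", "APAC"},
    },
    "ops_viewer": {
        "columns": {"shipment_id", "region"},
        "regions": {"EU"},
    },
}


def apply_policy(rows, role):
    """Return only the rows and columns the given role is allowed to see."""
    policy = POLICIES.get(role)
    if policy is None:
        # Zero-trust default: a role with no explicit policy sees nothing.
        return []
    visible = []
    for row in rows:
        if row["region"] in policy["regions"]:  # row-level filter
            # Column-level filter: drop every column not explicitly allowed.
            visible.append({k: v for k, v in row.items() if k in policy["columns"]})
    return visible


# Hypothetical sample data.
shipments = [
    {"shipment_id": 1, "region": "EU", "invoice_amount": 1200.0},
    {"shipment_id": 2, "region": "APAC", "invoice_amount": 800.0},
    {"shipment_id": 3, "region": "US", "invoice_amount": 450.0},
]

finance_view = apply_policy(shipments, "finance_analyst")
ops_view = apply_policy(shipments, "ops_viewer")
unknown_view = apply_policy(shipments, "contractor")
```

The same query against the same table returns different results per role, and a role nobody has granted anything to gets an empty result rather than an error or the full table.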

Advice for Someone New into the Industry

Rahim Bhojani:

Let’s end with, what would you tell a new person trying to break into the industry? What would your advice be to them?

Mark Sear:

My advice to them would be, be flexible. Don’t box yourself in and say, “I’m a programmer, I’m a product manager, I’m this or I’m that.” Super flexible, and get really, really close to the business. Learn what a business does. Learn what those business people want and need to be successful, and then go all in on helping them deliver. Just generally communicate with people. If you’ve got a problem — Ask for help. Ask for help. Don’t just sit there and spend seven hours working on a problem.

Rahim Bhojani:

Thank you, Mark.

Mark Sear:

Thank you for having me.

Rahim Bhojani:

Absolute pleasure. Great to have you here.

Mark Sear:

Thank you very much.

How Dremio Will Win the Race

Rahim Bhojani:

All right. This brings a close to this keynote. I want to wrap up by summarizing how Dremio is going to help you win this race. But I want to draw an analogy first. Steve Jobs talked about the iPhone in 2007. I think what’s happening in AI and generative AI is that much more instrumental than what the iPhone did for productivity. It’s going to be that much more of an enabler, and Dremio is going to be front and center with it. Thank you very much for having me today.