May 2, 2024

Optimizing Cost, User Experience, and Performance: Vanguard’s Dremio Implementation Journey

This talk presents Vanguard’s strategic implementation of Dremio, emphasizing enhancements in cost-efficiency, user experience, and system performance. We’ll explore three critical use cases, illustrating how Dremio’s integration revolutionizes our data management landscape. As Vanguard’s Product Manager, I’ll share firsthand experiences and key learnings, showcasing Dremio’s role in driving significant improvements in operational efficiency, user satisfaction, and data processing speed, setting a new benchmark in data analytics.

Topics Covered

Dremio Use Cases
Lakehouse Analytics
Performance and Cost Optimization

Sign up to watch all Subsurface 2024 sessions

Transcript

Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Read Mahoney:

All right, it’s a fireside chat with Vanguard, so we’re not doing the skit up here. But welcome. If you’re joining us virtually, it’s great to have you. We got the crew in New York, so we’re streaming this live from where we are. And I’d really want to welcome Hitesh from Vanguard. And I got it close, hopefully. But thanks so much for being here. We’re going to have Hitesh talk about their experience in terms of what they’ve been building towards from a data platform perspective. Some of that will involve Dremio, some of it won’t. They’re also going to talk about what they’re doing with AI and governance. And then we’re going to have a little bit where we talk about how they’re thinking about success and measuring success, which I know a lot of analytics leaders and data leaders are under pressure for right now, which is there’s sort of a big cost to what’s going on. So we’re lowering that. But it’s how are you valuing other than cost as well in terms of moving forward. So if you don’t mind just introducing yourself and telling the audience what you’re up to at Vanguard.

Vanguard and Dremio

Hitesh Dundi:

Sure. I can do that. So hi, everyone. This is Hitesh Dundi from Vanguard. So for people who do not know Vanguard, Vanguard is one of the largest asset managers in the world. We have about $8 trillion worth of assets under management. And the fact that we want to give an opportunity towards greater investment success to everyone out there, be it our investors, our retail investors, and our institutional investors is something that is at the forefront of what Vanguard does and what we do. And we are actively looking at the cost optimizations and giving back to our investors every year. Every way that we take a standard.

And now, so when it comes to Dremio, what brought us to Dremio is that when– I think it was about two years ago when we started thinking about our infrastructure in general. And we were like, OK, what are the things that are really bothering our end users? Be it our all data users, be it data analysts, data scientists, machine learning engineers, or even the business users who want to be proficient with data and literate with data and have that insight. What we found is that right now, we are facing challenges when it comes to performance with our existing systems. And there are cost considerations that we– because a few of the platforms that we use are rendering us costly. And at the end of the day, we do want to give a superior user experience to our end users, everyone out there. So that prompted us in the journey towards identifying different vendors and going out and seeing, OK, what is the best experience that we want to give out? And so we did a lot of evaluations on everyone on the market and landed with Dremio, which we are really happy about so far.

Read Mahoney:

That’s awesome. You talked about the initial challenge where you had performance costs, and then it sounds like there was also a little bit like you’re trying to get more people involved overall. Where did that push come from? Was that an internal to your data team, the data team that you’re on, or was that coming from a specific line of business or department? Or was it something that was, we’ll say, really top-down driven that says, hey, we need to treat– I think it was the TD Securities gentleman, Karl, earlier, who was talking about, hey, we need to treat data as an asset. What sort of drove the impetus for change overall? And then I’ll have one more follow-on to that.

Hitesh Dundi:

So when we think about the data as an asset, it did come from the leadership. And at the same time, I do want to talk about what Sendur initially talked about in the keynote, where he mentioned that enterprise customers, large enterprise customers like Vanguard, what are the kind of issues that we are facing? It is mostly on integration across our different data sources, and also having the cost considerations and creating that semantic layer on top of our existing data sources, which actually made us go towards a Dremio-centric way of doing things. That’s what we are trying to achieve.

Read Mahoney:

Yeah. I mean, was there a specific– I run the marketing team at Dremio. But I also– we drink our own champagne. So we have a Dremio lake house that we use to do all of our own analytics. And when I think about that is, I can put pressure back onto our central analytics team. And I could also– but that could also be something that team comes up with by themselves and says, hey, look, we want to provide this as a capability. What was sort of the interchange in terms of making that happen for you guys at Vanguard?

Hitesh Dundi:

So when it comes to specific details on what actually contributed towards that, I would like to go back to the fact that data strategy has multiple components to it, like data being the main component. So how reliable our existing data is, and how well are we accessing that data? And what are the platforms that we are using for it? And on top of that, there are governance considerations, security considerations. And the entire gamut of things, and bringing value on top of that, on top of all the investments that we make as an organization, and bringing out value from any investment, it’s actually tricky. And it comes down from different asks from different lines of businesses. There are different– Vanguard, as you know, is a huge enterprise. And we have multiple lines of businesses having their own data teams working together. And then all these data teams, together, the kind of challenges that they are facing are universal across Vanguard, be it the cost aspect, the performance, and the user experience. And we want to streamline that as an organization. And that’s the reason why we went ahead with Dremio.

Read Mahoney:

Got it. So those data teams that are internal for all those different departments and line of business are essentially your customers. Yes. And you were listening to them, sort of like a GM or a product manager would do, and say, what are your biggest challenges? And one of the things you kept running into was, hey, we keep getting bottlenecked, or we’re not able to do what we need to do, or, hey, it’s too expensive, I think, was one of the things you started with. That was sort of what drove the need for change and the evaluation you guys did.

Hitesh Dundi:

Absolutely.

Providing Value

Read Mahoney:

Yeah. So one of the things is that you talked about value being difficult. How are you guys looking at value and what you’ve been able to accomplish so far, and sort of what’s next for you in terms of where you’re going with your modernization efforts and trying to move faster at Vanguard?

Hitesh Dundi:

So value is a very tricky subject. I think everyone agrees that in the data world. So what’s the value that this data team is bringing in? So is it the optimization that they are doing? And what’s putting a dollar figure on top of optimizations, or even the efficiencies that we bring in with respect to the time that we are seeing for individual analysis to go through from weeks to hours or months to days? I think we have dedicated value teams who are actually thinking about this kind of measurement and putting that into place.

Read Mahoney:

So I mean, it sounds like for a lot of the teams you’re supporting, building a dashboard or maybe what they’re doing now with AI, which we’ll talk about here in a second, was taking them a really long period of time. Now it’s not taking as much time, let’s call it– what would you say, hours or weeks and months of what it did take?

Hitesh Dundi:

So our initial success with Dremio is that– so we are pretty new to Dremio in general. And we believe that we still have a long way to go. And the kind of optimizations on the features that we want to extract out of Dremio is huge. And we want to do that. And so the initial feedback that I am receiving from the data teams is that– so earlier, the task that a data analyst would routinely depend on a data engineer for and wait on for weeks together can now happen in days or even in hours. So that in itself is huge. And that is what almost every data team in Vanguard is excited about. I think if you look at most of their laptops, they do have a Gnarly sticker on top of it.

Read Mahoney:

Yeah. I mean, I would imagine, in their case, being able to cut that down is really meaningful. And like we talked earlier, too, and you were just saying, it’s really hard to figure out what’s that worth. Like an analyst can now get something done in a matter of hours or a day that used to take a month. Like, is that worth a million dollars? Is that worth $10? It totally depends on what decisions they’re making. But it’s required for anyone who wants to have a more data-driven culture.

Hitesh Dundi:

And also the fact that this level of performance improvement is something really crucial for the kind of use cases that we are looking at. So there are certain use cases which require this kind of performance, be it on the fraud side or on the risk side. So these use cases, we cannot wait for more– like, let them just go through the entire process and analyze the data and send out a report after two weeks when the fraud or risk is happening right now. So we need that kind of performance. And we are going to rely on tools like Dremio to accomplish that.

Thinking About Governance

Read Mahoney:

As you push to more self-service– and you mentioned the fraud and risk teams. Those are, I think, great examples. How do you think about governance? Or how has governance had to evolve? Because in the world where everything does have to come back to the central team, while the bottleneck is tough, the advantage around that– or at least the seeming advantage– is control. You’re like, well, everything’s got to touch this group. And I know what that group’s doing and so forth. How have you guys had to evolve thinking about governance when you’re moving into a more sort of self-service and decentralized model?

Hitesh Dundi:

So governance is an active discussion that’s happening at Vanguard. And also, governance is at the forefront of everything that we do. So I think, as Vanguard, we are very risk-averse when it comes to data. Because the responsibility that our investors are placing on us is humongous. And we take that responsibility really seriously. And the kind of effort that we put in our governance and risk practice is significant. And we do not want any leak of data, or even, for that matter, anything to go out of Vanguard. And we are placing a lot of emphasis on that. And before even data comes into Dremio, we are making sure that we have all the internal processes and checks in place on all the data products that we are putting in Dremio. And making sure that these are being accessed only by the persons that these data sets are intended to be used by. And also, the fact that the features that Dremio has, mostly on the row level and column level security, is something that is of real interest to us when it comes to governance. And making sure of who has access to what information, and that they are the only ones who are accessing it.

Read Mahoney:

Sounds like you guys have your, you essentially have your own governance system. And then you’re inheriting, or were inheriting in this case, the rules or requirements for that person or for that attribute that you guys are assigning to Dremio. Is that essentially how you set it up?

Hitesh Dundi:

I would say yes and no to that. So earlier, our approach towards governance is more restricting everything. Right now, what we are trying to go ahead with is kind of a hybrid approach. So can we leverage the access provisions within Dremio? Can we do that? So that is something we are experimenting with. And it depends on the results of those experiments that we will actually take a call on.

Read Mahoney:

Yeah, I would say because we work across multiple systems, and it sounds like one of the main use cases that you have is you’re bridging across different data sources. You’re able to improve access to the different data consumers, and you’re doing that through virtualization. Yeah, sometimes we’ll inherit that from existing governance systems, or governance providers like Privacera, or homegrown systems, or people will do it directly with us. But it really just depends on what their overall data topology and architecture looks like in terms of what’s right for them.

AI at Vanguard

So in terms of moving on to what’s happening with AI at Vanguard, we talked about more, it seemed a little bit more, we’ll say business unit, analyst-driven in terms of getting to the data and reports oriented initially. How are you guys evolving that with all the needs coming from the business now in terms of AI, and we’ll say a lot of the hype, and some of the reality around gen AI as well? What’s that balance looking like for you, and how are you guys thinking about evolving?

Hitesh Dundi:

I think when it comes to generative AI, it is pretty universal, and everyone from boardrooms to coffee tables, they are actually talking about generative AI. That’s no different at Vanguard as well. As I said in the previous governance question, it’s more like taking a very risk-averse approach towards generative AI, and putting emphasis on what the capability can do and cannot do, and knowing the risks that it poses, and also how we are going to mitigate all those risks within the confines of the infrastructure that we have, and also making the best of it is something that we are actively looking at.

Read Mahoney:

Yeah. It sounds like you guys are in a lot of, we’ll say the testing and evaluation phase as it relates to generative AI.

Hitesh Dundi:

Isn’t everyone?

Read Mahoney:

I think it’s not. Some companies are definitely in production, where they have models working, they’re typically in production. Okay. I see what you’re saying. It’s not like a dev test model necessarily, it can be a production model, but it’s still evolving in terms of the performance of that, and what you expect it to do, and how do you improve that, and so forth, as you’re using those technologies. What about just in terms of machine learning as well? How has that been evolving at the same time, or do you have a lot of data scientists too, now able to access data more quickly, and put that into their model building, or feature engineering processes?

Utilizing Machine Learning

Hitesh Dundi:

That is something of interest to us, but right now the emphasis that we are putting is more on data analysts, and the business users. Data literacy is at the forefront of everything that we are doing currently, and the business users and data analysts are at the core of our Dremio strategy, at least. Slowly, we do want to enable this for our machine learning and data science teams as well, because we want them to leverage the capabilities and the performance improvements that we are getting out of Dremio, and then slowly enhancing their entire model building life cycle.

Read Mahoney:

Are you thinking of that more as a citizen data science? Are you thinking of this more in terms of meeting the analysts themselves, doing AI for BI, or actually building their own models that are then hosted, and running, and are in production? Or is it, “Hey, no, this is for the actual data scientists coming in.” How are you guys thinking about that? Because there’s two camps here. There’s a group using all the open source tools, and then there’s the group using a bunch of, let’s say, the AutoML tools. Sometimes it’s both, but you sort of get into this world, and it’s like, “Well, who are you really serving?” I think it’s really important to understand the customer in those cases.

Hitesh Dundi:

We are trying to carve out the niche for Dremio, mostly on the data analyst and the citizen data scientist segment. Depending on the success that we see with data analysts and citizen data scientists, that is when we would actually go towards an advanced data scientist, or even the machine learning engineer.

Read Mahoney:

Yeah. You guys are almost attacking department by department. Attacking is maybe the wrong word, but you’re trying to modernize or develop the way that the company can move forward in that manner, and then also by persona.

Hitesh Dundi:

Persona is something that’s a good term to use, because we are exclusively focused on the data analyst and the business user persona right now. Data scientist and machine learning engineer is something we certainly see that Dremio can provide advantage to the model building life cycle, but right now our emphasis is on these two personas.

What are Gaps That Still Exist?

Read Mahoney:

Okay. Yeah. On the business user, what are gaps that still exist? What are you hoping for, whether it’s from us or from the market in general, in terms of what takes the business user to be more productive going, we’ll say shifting left, using Sendur’s talk from this morning? How do they get closer to the data if they don’t write SQL?

Hitesh Dundi:

Well, if they can say something and it is delivered in a report, that is the world that I want for the business users, because I think that’s the world that even the business users would want. It’s just that with the advent of generative AI and the assisted tools, would we really want them to be literate about SQL and other tools when gen AI can do things? It’s an active debate, but yet we as a company, we are actually putting emphasis on… We have a big data literacy project, wherein people actively are trying to make senior leaders within the organization more literate about data and everything that’s happening with data.

Data Literacy

Read Mahoney:

Well, let’s talk about that, because TD Securities brought that data literacy term up earlier today. You said data literacy is hard. What does your program consist of, and then how do you know if you’re making progress on a program like that?

Hitesh Dundi:

Well, I think I’m not a really good person to answer that question, but I’ll try to answer it as best as I can, because when it comes to data literacy, our prime focus is to at least make the leadership aware of different data terms and the technologies out there, and then make them proficient in at least the knowledge of it, the verbose, the verbiage of it, rather than going in deep. I think they are taking a tiered approach, but I’m not completely sure about that, because I’m not involved in that.

Read Mahoney:

So in this case, it’s mainly education.

Hitesh Dundi:

Yes.

Read Mahoney:

And the education is around… It sounds like common terminology, things associated with statistics, likely, and then as they would understand that, then as they’re reviewing reports or information’s brought to them, they have a background in terms of how to do that. Yeah. Okay, cool. So I guess the question that’s going around is, people building data cultures, what does it mean to be more literate? I think some of these programs have actual training, like certifications built in. Some of them mean you actually are doing more with data individually.

Hitesh Dundi:

So we do have those internally developed trainings. They involve all these components. It’s just that I’m not privy to the complete details of it, so that’s the reason why I’m not commenting about it.

Tool Sets and User Experience

Read Mahoney:

Okay. So I’m going to take one thing back to Dremio real quick. So we talked about text-to-SQL this morning from a Gen AI perspective. We talked about labeling and creating of wikis from a Gen AI perspective, and part of that’s obviously to help the analyst, because if you use Gen AI to help the analyst, could they go do all that themselves? Sure. Are we cutting the time down? Yes. Could we… Do we still give them the SQL to edit? Yes. So when they look at what Gen AI gives them and they’re like, “No, I want to change that,” or, “I don’t like the way that’s written,” they can go edit it. Do you see, as you’re starting to move even more to the business user, do you see using tool sets and the user experience within something like Dremio, or do you see that coming from tool sets that would connect to us, let’s say like Tableau? One of the gentlemen earlier talked about Superset, Apache Superset. Or do you see developing that yourself, in terms of bringing that and starting to work on that group?

Hitesh Dundi:

So this is actually a tricky question, because nowadays when you look at different products out there, every product is throwing in a Gen AI component to it. So Tableau has a Gen AI component to it, Dremio has one, and we internally have Gen AI for different tasks. So how are we going to marry all these components together is something that we haven’t put a nail to it, because firstly, we haven’t experimented with the Gen AI components of all tools together and married them together to actually see what is the experience like. So at the end of the day, we would always think about what is the best experience that we are delivering to our end users, be it data analysts or business users, and what is best for them is something that is of interest to us. And if an experience includes Dremio’s Gen AI, and not other products’ Gen AI, we will pick that. So we will make that decision based off of the user experience that we are delivering end to end.

Read Mahoney:

Yeah. I mean, is that an active project or an active priority for you? Or is that something where it’s like, yeah, we need to solve this, but we’re sort of, you know– we’ll call it the Wild West is good enough, as long as it’s governed. Still governed, but people can choose the tools they want or deal with it the way they want to. But we’re actively trying to create a layer for that audience and move that through.

Hitesh Dundi:

So I think I would answer that question in this way. So we are taking baby steps towards that future. Right now, what we are trying to do is, OK, there is this– the entire data analytics lifecycle. Let us see what are the self-service components out there and enable them so that data analyst is– data analyst can do self-serve. And now, once they are done with that, OK, how can we improve that experience one after another? So what are the different features? Yeah, of course, Gen AI plays a vital role in that. So where do we fit Gen AI into this experience is something that we will discuss.

What’s Important to Vanguard?

Read Mahoney:

I’m more asking like relative priority, meaning like, what are you– you guys have put Dremio in. People are able to move faster. You’re expanding into some other departments. But what’s really important for you now? And is this one of those things? Or is it actually not one of those things, and you’re focused on more core elements?

Hitesh Dundi:

So I would, again, go back to the initial comment that I’ve made. So it’s about user experience, cost, and performance. So user experience– and we believe that generative AI has a greater role to play in the user experience component of it. So it’s definitely in– it is something that we are looking towards.

Read Mahoney:

Got it. Yeah. So what are some technologies that you guys are looking at now overall that might be just out in the overall Lakehouse environment that you’re working on from a user experience or performance type of environment that you’re trying to add in? Or are you guys really just baking what you have and just expanding that usage internally?

Hitesh Dundi:

So I think more of the latter. So we are baking what we have and then expanding the usage, at the same time striving to make sure the entire user experience is streamlined.

Challenges

Read Mahoney:

Yeah. So what are– as you’re trying to expand usage, what’s hard about that? What challenges are you running into in terms of you’ve built the platform, you know the users that are using it are getting value, but how do you actually– it’s sort of like you’re trying to acquire customers, right? What challenges do you run into trying to make that happen?

Hitesh Dundi:

So the number one challenge that I would think is on the cost aspect of it. So it’s mostly on– so when we are thinking about costs, so what’s the best argument that we are putting forward? So there are different perceptions of different tools out there, and how are we– and even the transition cost, it actually matters. So we are thinking about that in a holistic way, and then approaching that.

Read Mahoney:

So your argument to the data teams that would use the platform and the other lines of business is basically, hey, we’re doing– maybe you guys do chargebacks, I’m not sure how it works, but you’re basically saying, look, you have this cost structure today, here’s your switching cost, and once you do that, this is what your payback period sort of looks like before your costs start to drop, and that actually affects that line of business’s budget in a positive way, and that’s sort of how the– we’ll call it the internal sale happens.

Hitesh Dundi:

Absolutely.

Read Mahoney:

That’s true, that’s amazing.

Hitesh Dundi:

It looks like we can have you do the sales pitch internally.

Read Mahoney:

I feel like it’s like marketing internally at Vanguard is like the requirement for adoption. It’s really putting the business case together. It’s something that we’ve done at Dremio. I’m not sure how many people in the room have experienced our calculator, but we built a TCO calculator exactly for this reason, is we have a lot of customers, like Sendur talked about, where cost is really important, and we’re just like, well, let’s just lay it out for them in terms of what we see on average for both sort of labor and price performance and software costs. You package those things together, and you say, this is what it looks like in option A, and this is what it looks like in option B, and then it’s like, if option B looks better, how fast can I get to it? What’s the payback period is a common way that people will look at it, and then they’ll look out at, hey, what’s the ROI over time, too? I’m not saying they don’t do that, but like, hey, will this pay back in three months with existing budget kind of, or will it pay back in six months is always really interesting to them. Time to value.

Hitesh Dundi:

Yep, no wonder you’re the CMO.

Read Mahoney:

Thanks, man.
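
The option A versus option B comparison and the payback-period question raised near the end of the conversation come down to simple arithmetic. The following is a minimal sketch of that calculation, not Dremio's TCO calculator and not Vanguard's method; the function name and all dollar figures are hypothetical and chosen only for illustration.

```python
from typing import Optional


def payback_period_months(switching_cost: float,
                          current_monthly_cost: float,
                          new_monthly_cost: float) -> Optional[float]:
    """Months until cumulative monthly savings cover a one-time switching cost.

    Returns None when the new option is not cheaper per month,
    because the switching cost is then never recovered.
    """
    monthly_savings = current_monthly_cost - new_monthly_cost
    if monthly_savings <= 0:
        return None
    return switching_cost / monthly_savings


# Hypothetical figures for illustration only (none of these come from the talk).
option_a_monthly = 100_000.0  # option A: current labor + compute + software cost
option_b_monthly = 60_000.0   # option B: projected cost on the new platform
switching_cost = 120_000.0    # one-time migration and retraining cost

months = payback_period_months(switching_cost, option_a_monthly, option_b_monthly)
print(months)
```

A result under roughly three to six months corresponds to the "will this pay back with existing budget" framing in the conversation; a `None` result means option B never recovers its switching cost on running costs alone.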
