Evolve 2018: Pitch Your Tech in 5 Minutes

Transcript

Kelly Stirman:

Thanks for having me on [inaudible 00:20:22]. So my name is Kelly Stirman, I am the VP of Strategy for a company called Dremio. That is not a unicorn dolphin, that's called a narwhal. So, they only gave me five minutes to talk. What are you going to do in five minutes? So I thought, what are some different things? You could listen to a song. You know, most songs are less than five minutes. Maybe some of you could solve a Rubik's cube in less than five minutes. Anyone? Rubik's cube in less than five minutes? Oh, somebody actually. I didn't think anyone would say yes. 50 push-ups in under five minutes. Any hands for that? Yes. A few more people that can do that. We can also meditate in about five minutes. We would probably all be better off if we did a little bit more of that every day. And you could make a sandwich. And this got me thinking.

Sandwiches, you know, they're delicious. A good sandwich is a really amazing experience that can be life changing in some cases. And the more I thought about sandwiches, the more it made me realize that, you know, sandwiches are a lot like data in the enterprise. So bear with me. Data in the enterprise. This is Katz's Deli in New York, if you've never been there. How is it like data in the enterprise? How are sandwiches like data in the enterprise? Well, if you were going to go make a sandwich, you would start by going to get great ingredients at your favorite deli. And what's the experience like? You go, you take a number, and there's one guy that you're waiting on. You're on this side of the counter, and that one person behind the counter is your data engineers. Because those of you on the other side are your data scientists, and they're your BI users.

Who are waiting on the data engineer for the data they need to do their jobs. Has anyone here waited on IT for data you need to do your job? It seems to be a core experience that everyone-

Dave Hitz:

Oh come on, you're not paying attention.

Kelly Stirman:

Suffers through on a daily basis. I know I'm part of that. Raise my hand. So why does this happen? We'll talk about that in just a moment, but let's think about the experience. The BI users, the data scientists, they're all waiting. They're not doing their jobs. Right? They're waiting and dependent on IT. And the experience of the data engineer is that they're overwhelmed, right? They have a bunch of people in line, and they can't service and fulfill those needs as quickly and effectively as they'd like to. And the ratio of numbers we're talking about here, at most companies, is that for every data engineer, you have about 100 data consumers. So your BI users, your data scientists, your data analysts. There's about 100 of those for every one of your data engineers.

And why is this such a challenge? Well, this is what your data engineer is doing. So you have data being created in different technologies. So your Oracle databases, your SAP systems, maybe some of your newer technologies like MongoDB and Elasticsearch, or Hadoop. And that data is being moved by your data engineers into some kind of data lake. Maybe that's in the cloud, maybe that's on prem. Then that data gets moved into a data mart, then cubes, then extracts, and aggregation tables are built to fulfill these different needs of the different data consumers. So you have lots of copies, it's slow, it's complicated, it's fragile. What Dremio is all about is a completely different approach, where Dremio runs between the existing BI tools and data science platforms that you're already using, and all the different data sources you have. Whether they're NoSQL, relational, your data lake, whether they're in the cloud or on prem, Dremio runs right in the middle. And it gives a self-service experience for the data consumer to do everything they need on their own without being so dependent on IT.

And it takes care of the really hard problems that your data engineers are struggling with today in terms of accelerating the data, transforming data for different purposes and different needs, and governing and securing access for different user groups. So what is Dremio? It's a data engineering platform that helps you get more value from your data faster. It makes your data engineers more productive, and it makes your data consumers more self-sufficient. Now let me just tell you about one customer, TransUnion. So TransUnion is a consumer credit reporting bureau that aggregates data on about a billion consumers worldwide. They have about 65,000 customers in 30 countries. What do their data engineers do? Well, they process billions of updates a month, from 90,000 sources, on 30 petabytes of data. And they build innovative products, like Prama, that put that data in the hands of their customers.

TransUnion uses Dremio to accelerate the analytics and visualization of that data for their customers, and they use Dremio to make their data engineers more productive, so they can get more value from their data faster. So, there are a few of us here, thank you, from Dremio in the audience. My colleague Scott is in the back, and we are actually pretty easy to recognize. You'll see us walking around, so. Thank you very much.

Dave Hitz:

Thank you very much. Do you wear that in real customer meetings? I just have to know.

Kelly Stirman:

Does my horn intimidate you Dave? Sometimes we do wear these to customer meetings, yeah.

Speaker 3:

So I have a question, I did a little bit of research. What's the deployment model? I think you guys are saying that you're open source, so what is the deployment model? What does an IT shop need to do in order to deploy your software?

Kelly Stirman:

Yeah, so correct. Dremio is open source. It's a distributed system that you would run on, I mean, that's something you could try on your laptop. But it's something that in production you would deploy on tens or hundreds or potentially thousands of nodes. If you have a significant investment in a Hadoop cluster, you can run Dremio as a native YARN application in that Hadoop cluster, but it's not dependent on Hadoop. So one of our first customers is running Dremio on top of a mix of Elasticsearch and SQL Server, with Tableau and Power BI as the tools of their data consumers. So our deployment model is you run it as a cluster, and we have different types of nodes that scale out for data volumes and number of users. And it's an elastic product that you can run in the cloud, on prem, or wherever you like.

Dino:

So from the open source perspective, how do you want your customers to engage with you for the creation of connectors or adapters for systems that you don't currently support?

Kelly Stirman:

Great question. So the truth is, for every company, your data's in lots of different technologies, because for 30 years, the answer from every vendor has been, just put your data in our silo, and we'll solve all your problems. So you have a mix of relational databases, NoSQL, Hadoop, different kinds of things. We support the most popular data sources, but there are things that we don't support today. So we've made Dremio open source so that everyone can use it, but also so that a community can build different capabilities into the product, including connectivity to different sources. And we've already seen that with a tier one investment bank that's in the process of open sourcing a connector to kdb+, which is a specialized time series database that's been available to financial services for a number of decades.

Dave Hitz:

So I steal from the audience. If you're open source, how do you make any money?

Kelly Stirman:

Good question. Working on that. No, so our model is we have a community edition that has all the features and functionality that most people need to fall in love with the product. And we have an enterprise edition that has some key capabilities around security and management, and connectivity to a few high end sources like Teradata and DB2. So we sell a subscription, an annual subscription that we license per node, that allows us to monetize the product. And that subscription includes support and access to the enterprise edition of Dremio.

Dave Hitz:

And you structured it so that people can use it a little without the premium, but generally speaking, enterprise ware, people would want to get the subscription, hopefully?

Kelly Stirman:

We have startups where maybe security is not top priority for them, where they're running Dremio on 100 plus nodes, on our community edition. But most Fortune 1000 companies will view role-based access control and integration with LDAP and Kerberos and things like that as essential for a real deployment. So yes, you fall in love with the community edition, and make it as big as you want, but when you want to go into production, you're going to use the enterprise.

Dave Hitz:

Or if you help startups grow up fast, then they hit a certain point and the board of directors starts asking certain questions. And any more questions?

Dino:

Yeah, so one other question I had is, I know how long it takes to ramp up on a new tool, just as any. Right? It's in [inaudible 00:28:54]. What would you say is the time to become really proficient? Because I saw how you drag in different sources and you start dissecting the data, but you also need to know the other sources and where the data is coming from.

Kelly Stirman:

Yeah, so there's sort of two types of use of the product. One is the data consumer's use of the product. And that is something where we've looked at Google Docs basically, and said, what is the experience of Google Docs? Did I take training on Google Docs? No, there's a search bar, I click, I can collaborate with other people, and I have native integrations to the most popular tools like Tableau and Power BI and Qlik. Right, so with a single click of a button, you can launch that tool connected to a data set, you can write no code and do everything in a browser, and most people are off and running in the first few hours.

Then you have the data engineer's experience with the product, which is a mix of RESTful APIs, and foundationally, everything in Dremio is based on standard ANSI SQL. So to the extent you have people who know SQL, they can use SQL to build and manage and perform their data engineering tasks, and then orchestrate that with the same kind of tools they're using to orchestrate the rest of their infrastructure. And SQL is a really critical skill, right, and the beauty is that it's been around for 30 plus years, and every tool on the planet supports it, and that's why we made it the center of our technology and our integration strategy. There are a lot of newer technologies, where your data is being created, that don't support SQL. Things like S3. Things like ADLS on Azure. Things like MongoDB and Elasticsearch, and large pieces of Hadoop. They're really just not compatible with SQL.

Well, with Dremio, everything is compatible with SQL. Everything is on a level SQL playing field, and we make it incredibly fast automatically in the background. And that's really the core value for a data engineering group: we can leverage our SQL skills and integrations and tools that are already deployed in the enterprise. It doesn't matter where we put our data, we get it really fast for all of our different analytical workloads.
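To make the "level SQL playing field" idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not Dremio's implementation or API: the sample data, the `build_sql_view` and `total_per_customer` helpers, and the table names are all hypothetical. It uses the standard-library `sqlite3` module to put JSON documents (standing in for a document store) and CSV rows (standing in for a file in object storage) behind one SQL interface. Unlike a real query engine, this toy copies the data into an in-memory database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for two non-SQL sources.
ORDERS_JSON = json.dumps([
    {"order_id": 1, "customer_id": 10, "total": 25.0},
    {"order_id": 2, "customer_id": 10, "total": 15.5},
    {"order_id": 3, "customer_id": 11, "total": 40.0},
])
CUSTOMERS_CSV = "customer_id,name\n10,Ada\n11,Grace\n"


def build_sql_view(orders_json, customers_csv):
    """Load both sources into an in-memory SQLite database (a toy stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)"
    )
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")

    # JSON documents become rows in the orders table.
    orders = [
        (doc["order_id"], doc["customer_id"], doc["total"])
        for doc in json.loads(orders_json)
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

    # CSV lines become rows in the customers table.
    customers = [
        (int(row["customer_id"]), row["name"])
        for row in csv.DictReader(io.StringIO(customers_csv))
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    return conn


def total_per_customer(conn):
    """One standard SQL query that joins data from both original sources."""
    query = """
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_sql_view(ORDERS_JSON, CUSTOMERS_CSV)
    for name, total in total_per_customer(connection):
        print(name, total)
```

The point of the sketch is that the person writing the query only needs SQL; where the rows originally lived is hidden behind the table names.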

Dave Hitz:

I'm kind of curious, oh, did you have another question?

Speaker 4:

No, go ahead.

Dave Hitz:

I was just kind of curious. Here you are, high level IT guy and you run into someone like this, and he gives you his elevator pitch. Like where's the next stop in your organization? Do you send him straight to the data engineers? Or like how would he navigate through, when you go, "Oh, I know who you should talk to next." How would that work?

Speaker 4:

We have a data analytics team as part of our network engineering organization that I would tell you to go talk to immediately. Or talk more broadly to our operations organizations and just understand, there's a lot of disparate tools, we've just been through a merger a couple of years ago. We're still trying to bring the company together. So we're trying to bring all this stuff together, so there's a lot of sources of data. So talking to ops and talking to the analytics teams is where I would start.

Dave Hitz:

Yeah, that's my experience as well.

Speaker 3:

I mean, we'll bring the analytics team, one of the leads. You said SQL, so bring a strong SQL guy. Probably bring in security, just because we're talking data. And I would bring the infrastructure because I have to think about the cost. Where it's going to run and tens of thousands of nodes and so on.

Dave Hitz:

Is that your experience typically? The last startup was security, so I figured it would be the CISO. And he's like no, it's the app [inaudible 00:32:08] or whose app we're securing. And so I'm wondering if there's another path in that you see sometimes.

Kelly Stirman:

Well, there's two paths for us. One is that core function in the company that's responsible for ETL and data warehouse, and the Hadoop infrastructure and the BI servers and that kind of data services team.

Dave Hitz:

So whoever's building the internal business intelligence?

Kelly Stirman:

The central function that's powering all the different data consumer groups. Then we also meet the data consumer teams, who say, "You know what? I am tired of waiting. I want to do things myself. I don't want to stand in line in the data bread line, holding my number, waiting to be called. I want to go do these things myself," whether that's a data science team working in Python and R or SAS or something like that, or the Tableau group that is tired of waiting on extracts to be built. And they want to go and build data sets themselves, and get them in a really fast way using whatever tool they like.

Dave Hitz:

Got it. So we do-

Kelly Stirman:

[crosstalk 00:32:55] And in that model, it's kind of like a pharmaceutical ad, where it's like, ask your doctor about. Because they're not going to buy Dremio for the desktop.

Dave Hitz:

Right, right, right. But they would have to be a consumer-

Kelly Stirman:

[crosstalk 00:33:03] Then they go to IT and say-

Dave Hitz:

Who wishes to be a business intelligence group, say, would do that. Well the experiment worked last time. Let's see if we can get an audience question. It has to be super short and someone either loud enough or close enough. Anyone have a question?

Kelly Stirman:

There's one over here.

Dave Hitz:

Yeah?

Speaker 5:

So, this is off of your last question. So I don't need a data engineer, I don't need analysts. I can just depend on my customers for [inaudible 00:33:27]?

Dave Hitz:

I don't need a data engineer, I don't need an analyst. Oh, that's interesting. And the person you sell it to is the data engineer you're about to fire.

Kelly Stirman:

No, I think you're going to need data engineers for many years to come. The point though is, they're overwhelmed, and they're putting out the next fire every day instead of thinking about the bigger problems and being more sort of strategic. And so this is about solving for the requests that the data consumers could fulfill themselves, and solving for acceleration of data and transformations of data in a way that's much more scalable, easier to manage, and easier to govern. So there's still lots of work for them to do. The way I think of ETL is, you kind of have the long haul ETL and the last mile ETL. And Dremio is going to really help with the last mile ETL, but you probably still have a big chunk of long haul ETL that you need your data engineers to focus on.

Dave Hitz:

You know, it's funny. Because when I think about questions like this, and who you're going to put out of work. There's certain jobs, like in the sandwich store, if you really replace the sandwich maker, like you're cutting down on jobs. It feels to me, like in terms of the digital transformation data revolution that's going on, I don't hear very many people going, "Oh, well I've just got too many data engineers and they're sitting around idle." That's not it, people are like, I can't find a skill, I'm trying. You know what I mean? And so it feels to me like we're at the point where if you can accelerate this stuff, it's all good. You know, for now. 10 years from now, who knows. But it feels to me that you're probably finding people that are just like desperate to get more done because they're busy.

Kelly Stirman:

Yeah, you're still going to have your Tableau users who are asking the interesting questions, and telling us stories with the data. This is about helping them be more independent and self sufficient.

Dave Hitz:

Yeah, absolutely.

Speaker 6:

What's up with the hat?

Kelly Stirman:

So this is Gnarly, spelled with a g. This is a narwhal, this is our company's logo, and it's the only real unicorn. So that's what that's all about.

Dave Hitz:

Oh, you want to be a real unicorn. Duh. Okay. Thank you so much.

Kelly Stirman:

Thank you everyone.