InsideAnalysis Interview With Kelly Stirman

Transcript

Eric Kavanagh:

All right, ladies and gentlemen. Welcome back, once again it is time for Inside Analysis, the show that talks all about the information economy. Things are changing out there. It's a very exciting time quite frankly to be an entrepreneur, to be in the world of business. So many interesting things happening. We have this whole confluence of innovation happening in the world of technology and also in the world of data, of information. Used to be we would just talk about managing data. Now, you can really manage information, all kinds of different data. We've talked about a lot of these different subjects on the show over the past few months or so and today we have an excellent guest lined up for you, a company called Dremio and Kelly Stirman doing some very, very interesting things.

So we're gonna talk about that. Feel free to tweet with the hashtag of Inside Analysis or send an email to yours truly, info@insideanalysis.com. But we're gonna talk about what's really been happening in the world of data and data management. And obviously there's this whole movement of business intelligence that we've talked about before and data warehousing. For those of you who have not heard of the concept, really data warehousing arose in arguably the '80s and the '90s when a lot of large organizations realized that they really couldn't query their operational systems. So like SAP, for example, you have your enterprise resource planning solutions that manage everything from procurement to delivery and all this really important business stuff, transactional systems, getting stuff done. But it was hard to query them and really get sort of a strategic view of what's happening.

So we had this whole movement spin out called data warehousing where we would pull, in batches, data from all these different systems, load it into a warehouse and then build cubes on top of that. Those cubes are just basically data marts or mini version slices if you will of the data. And we did that for a whole variety of reasons. One, because like I said, you couldn't query those big operational transactional systems. But two, also because quite frankly back in the day, and again this is in the '80s and the '90s, you think about the fact that computer processors were pretty slow back then relatively speaking. Storage was relatively expensive. The network speeds were pretty low. And so we had all these different constraints that allowed only certain kinds of solutions to be designed that were affordable, and when I say affordable, I mean in the millions of dollars typically. So Fortune 2000 companies, probably all of them these days have some kind of data warehouse out there.

But what has happened in the last few years is that this whole confluence of change has disrupted the industry. Of course, a lot of that stuff comes out of Silicon Valley. A lot of that stuff comes from the explosion of mobile. For example, the explosion of social media. There's so much data out there now, so-called big data. And what really happened is quite fascinating. The data volumes and varieties and velocity were so great and so different that the early stage companies like Yahoo, for example, like Google, like Facebook, like LinkedIn, Twitter even. A lot of these companies looked around as they were designing their solutions, their vision, trying to put it all together. And they realized that off-the-shelf technology was not gonna get the job done. So traditional database technology from IBM, for example, with their Db2 or from Oracle, it just couldn't cut the mustard. The scale was too great. The pressure was too significant. And so literally, they rolled their own.

So these Silicon Valley behemoths today, which are just rocking and rolling and fundamentally changing how we do business in the world today, they went and they created their own technologies. So in the past, we've talked about this Hadoop movement, which is really quite significant. It was the data phase of the open source movement basically. And open source has fundamentally transformed the development, the design and the production of software, of enterprise software. Of course, Linus Torvalds deserves a lot of the credit way back when, and this is, gosh, 30 plus years ago now. He was tired of, I'm sure, dealing with the Microsoft operating system always changing and some very interesting things happened. IBM, for example, dumped something like a billion dollars of investment into Linux, the open source operating system. And they did that primarily because every so often, every six months to a year, Microsoft would pull the rug out from underneath IBM and all the other app developers by changing the operating system.

If you ever go and look to download some software, it's not so much the case anymore because things are changing, but you might remember this, that you have two versions of Mac, for example, one version of Linux and like 18 versions of Microsoft, all those different operating systems. Well, from an application development perspective, that's a real pain in the rear. It's very difficult to stay on top of changes in the OS and so what happened is IBM comes along and dumps a billion dollars of investment into Linux to prop it up and make it enterprise caliber and they did. It worked like a freaking charm, it was just amazing how well that worked. And someone told me in fact, a guy from Talend was telling me a while back, another good open source company, apparently IBM made its money back in something like a year. So it took them one year after they had propped up Linux and made it a truly enterprise grade operating system, within one year, they got that money back.

Well, now there's been this whole wave of innovation around open source and really it's ... Open source has now infused itself into the zeitgeist, if you will, into the whole thought process around designing software and that has done just amazing, tremendous things. So we've had this whole ecosystem around Hadoop, which again was at Yahoo and it's what the folks at Yahoo came up with in order to be able to index the web. If you think about the size of the web, to be able to crawl the entire internet and reduce it down, they had this process called MapReduce where you map out where all these keywords are and you reduce it down to a little formula, that was what Hadoop amounted to basically and it's what Yahoo used to index the web.
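
[Editor's illustration] The map-and-reduce idea described above can be sketched in a few lines of single-process Python. This is not Hadoop's API and not Yahoo's indexing code. The function names `map_phase`, `shuffle`, and `reduce_phase` and the sample pages are hypothetical, chosen only to show the shape of the technique: emit (keyword, page) pairs, group them by keyword, then reduce each group to a compact result.

```python
from collections import defaultdict

# Hypothetical sample "web": page name -> page text (an assumption for illustration).
PAGES = {
    "page1": "open source data",
    "page2": "big data analytics",
    "page3": "open data",
}

def map_phase(page, text):
    """Map: emit one (keyword, page) pair per word on the page."""
    return [(word, page) for word in text.split()]

def shuffle(pairs):
    """Group emitted pairs by keyword, as a framework would do between phases."""
    groups = defaultdict(list)
    for keyword, page in pairs:
        groups[keyword].append(page)
    return groups

def reduce_phase(keyword, pages):
    """Reduce: collapse each keyword's pages into a sorted, de-duplicated list."""
    return keyword, sorted(set(pages))

def build_index(pages):
    """Run map, shuffle, and reduce in sequence to build a keyword -> pages index."""
    pairs = [pair for page, text in pages.items() for pair in map_phase(page, text)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

index = build_index(PAGES)
```

In a real cluster the map and reduce steps would run in parallel on many machines; here they run in one process so the data flow is easy to follow.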

Well, then when Yahoo open sourced that, you had companies like Cloudera and Hortonworks and later MapR and then this entire ecosystem built around what they had open sourced. So since then, we've had, for example, LinkedIn open source Kafka, which was the messaging system that runs LinkedIn. I mean, this stuff just wouldn't have happened 30 years ago, so it's just amazing what has been going on.

Well, getting back to the whole data warehousing movement, what we've now seen in the world of business intelligence and analytics, trying to understand your data, is open source has really super charged that environment as well. And that is where Dremio comes in and these folks, I have to say, have done a really impressive job of being able to deliver analytics without moving all the data around. Because that's the way it works in the data warehousing world, is you have to just basically forklift data from one place to another, usually on a daily basis. So Dremio has come along and done some really interesting things.

With that long intro, let me bring in Kelly Stirman from Dremio. Welcome to Inside Analysis.

Kelly Stirman:

Hey, thanks for having me. I'm really excited to be here.

Eric Kavanagh:

Yeah, so why don't you tell us a bit first about the idea for the company. I know I'm pretty familiar with what you guys are doing and the team that you've put together is just really impressive. You know, it is a small world out there in the world of data management and so forth and you guys put together one heck of a team. Where did the idea come from and how did it sort of form into what it is today?

Kelly Stirman:

Yeah, sure. So the two co-founders came from one of those Hadoop distribution companies called MapR. And I think one of the things they observed in working with companies all over the world who had some of the most demanding data management, data processing requirements is they would embrace the idea of Hadoop and moving data into this new environment for new kinds of processing.

But the nature of the relationship with the company they worked for, MapR, and I think this was true for all the Hadoop companies, was that sure, you were gonna buy some software but that the bigger relationship was around services and an enormous amount of consulting that was required to get these projects off the ground. And that's due in part to the fact that this is sort of a new world, where data is less structured, it's moving more quickly, it's a distributed processing challenge ... distributed computing challenge, which is just a new area for most companies. But also this is a technology, Hadoop, that is very raw and low level and requires software engineers to make it work and most companies don't have a whole bunch of software engineers sitting with nothing better to do.

So-

Eric Kavanagh:

Right.

Kelly Stirman:

So you would buy the licenses for the software, but you were also buying a lot of consulting. And so the two of them said, look, we're seeing the same pattern over and over again. People are moving data into these environments and ultimately what they wanna do is run their traditional BI tools on the data that's in their data layer, in their Hadoop cluster. And right now, the only way for them to do that is by writing their own software and paying a whole bunch of money in services. Why don't we start a company that solves this problem for them and gives them an easy to use, high performance, open source solution to this pattern that they were seeing over and over and over again.

And so there were two things that happened for them to start the company. One was to leave MapR and the second was to start a new open source project called Apache Arrow. And we can talk about Arrow a little bit later, but that's sort of the two moments that began the company for the two co-founders. And now we basically have a team of folks who've been working in NoSQL and in Hadoop for a better part of a decade, who've come together to take this company and the amazing idea that is Dremio and make it a commercial success.

Eric Kavanagh:

Yeah and you mention NoSQL, that's something we should describe for our audience. And it's actually a pretty interesting story, so SQL, it's spelled S-Q-L, stands for structured query language and it's basically the de facto language of database and of talking to databases, of pulling data together to run queries, to understand what's going on. So when you run a query, when you launch a query, 99 times out of 100, you're using some kind of a SQL engine.

And back to our storyline about how these companies created their own technologies, well, I recall actually, you might find this amusing Kelly, I recall talking to a Dr. Michael Stonebraker back in 2006 when I was working at the Data Warehousing Institute. And I wasn't familiar with who he was because I was only new into business, but I found out from my analyst friends, oh, he's a luminary, he's the guy who invented Postgres, basically he designed it years ago and he's a real visionary. And he was telling me ... He was working for Vertica at the time and he was talking all about one size fits all. And he said that the industry in general for database had, just for a variety of reasons, settled on a relational model, which is a very useful model for things like analysis. It's a very useful model in lots of different ways. But there are certain things it doesn't do very well.

And I recall him saying that their approach of course was columnar, meaning a columnar database where it's not really in rows but columns. And that's important for a couple reasons, one of which is compression. If you have a columnar database, you can compress that data much more effectively. That allows you to move it across networks and do other things with it and that was very important. And I remember asking, "Well, why do you think this happened? Why did relational become the de facto standard?" And he had a pretty fun answer. He said that basically it was a challenge of sales and marketing, plus of course engineering. And he said it was just too complicated for sales and marketing people to go out and explain different database types, which I thought was kind of funny.
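
[Editor's illustration] The compression point can be made concrete with a small Python sketch. It uses run-length encoding, one common technique in columnar stores, as a stand-in. It is not a description of how Vertica or any specific product compresses data, and the table, `ROWS`, and `run_length_encode` are hypothetical names introduced for this example.

```python
from itertools import groupby

# Hypothetical table: 1,000 orders as (order_id, country), sorted by country.
ROWS = [(i, "US" if i < 500 else "DE") for i in range(1000)]

def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) runs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# Row layout: values stored record by record (id, country, id, country, ...).
row_layout = [value for row in ROWS for value in row]

# Column layout: all values of one column stored together.
country_column = [country for _, country in ROWS]

row_runs = run_length_encode(row_layout)         # neighbors never repeat
column_runs = run_length_encode(country_column)  # long runs of equal values
```

Stored row by row, each country value sits between two different ids, so nothing collapses. Stored as a column, the same thousand country values reduce to two runs.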

But you bring up a good point that NoSQL became a whole movement and then of course, after that, we had all these SQL on Hadoop type things coming along, so it's like what goes around comes around, right? But what you guys have done by creating this platform for leveraging all these different kinds of information systems, is you're really kind of bridging that gap, right?

Kelly Stirman:

Yeah, absolutely. What's happened with NoSQL, it's really in the area that you began this whole conversation which is the operational systems that companies use to run their business, their e-commerce, the mobile applications, social applications, the things where you are interacting over the web with some company, those are applications that are increasingly being built on non-relational technology. And the reason, there are many reasons, but some of the reasons are just these are technologies that scale more easily, that are more efficient in their use of resources, and probably most importantly are faster for software engineers to build applications and add new features.

So the operational side of data, where data is born, so to speak, has evolved to be a mix of relational and NoSQL technologies. But the world of analytics is dominated by the relational model and SQL, and so the reality that many companies face today is the data that they need to analyze to make sense of their business is in a mix of different technologies and many different formats and much, much larger than any single server can accommodate. So they have no option except to think about newer technologies that deal with this variability of data structures, that can run on a mix of computers in their own data center or in the cloud and give them this kind of flexibility to accommodate the diversity of data but still allow them to run their analytical workloads.

And Dremio is designed to bridge these worlds, to make it so that the data can stay where the data is being created and allow traditional tools, like Tableau, like MicroStrategy, like Qlik to interact with the data no matter what the underlying technology is and no matter how large it is. And to do so in a model that's self-service, so that consumers of data can do things for themselves instead of waiting in the data bread line, waiting for their number to be called and for IT to be there to help them with what they're trying to do, make it so they can do things on their own. So it's a very exciting idea and product that's solving a major, major pain point that virtually every company has.
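
[Editor's illustration] The general idea of querying data where it lives, rather than copying it into a warehouse first, can be sketched in Python. This is not Dremio's implementation or API. The in-memory SQLite table stands in for a relational operational system, the JSON strings stand in for a document store, and every name here (`customers`, `ORDER_DOCS`, `revenue_by_customer`) is an assumption made for the example.

```python
import json
import sqlite3

# Source 1 (relational): an in-memory SQLite table standing in for an operational database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2 (non-relational): JSON documents standing in for a document store.
ORDER_DOCS = [
    '{"customer_id": 1, "total": 120.0}',
    '{"customer_id": 2, "total": 75.5}',
    '{"customer_id": 1, "total": 30.0}',
]

def revenue_by_customer(connection, order_docs):
    """Join the two sources at query time, leaving each one where it lives."""
    totals = {}
    for doc in order_docs:
        order = json.loads(doc)
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["total"]
    rows = connection.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, totals.get(cid, 0.0)) for cid, name in rows]

result = revenue_by_customer(db, ORDER_DOCS)
```

A production engine would push work down to each source and expose the combined result through SQL so that existing BI tools can connect to it. The sketch only shows the join happening without an intermediate copy.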

Eric Kavanagh:

You know, you bring up such an excellent point there, and I think a lot of business people out there understand exactly what you're talking about when you refer to going to IT to get help with something. Obviously, the IT teams are gonna be critical for forever, it seems to me. That's never gonna go away. You're never gonna have IT go away as a role or responsibility in an organization, right? But any time you have to go to IT, well, that's a time intensive process. Those folks tend to be very busy and so we're seeing this whole movement around self-service. And in order to do self-service right, you have to have some kind of technology like what you guys have, right?

Kelly Stirman:

Yeah. You need a software solution to enable self-service, otherwise you have to turn your analysts into software engineers and most companies are not in a position to do that.

Eric Kavanagh:

Well, and of course, one of the challenges too, especially if you're starting to deal with issues like some of these open source technologies, parallel processing is a huge key to the success of these technologies, right? Parallel processing is what allows the web scale capability and durability of these solutions. And parallel processing is some pretty heavy duty stuff, and let's face it, a lot of the big organizations out there, they have the money to lure the best developers. So to your point, if you try to build this thing on your own, man, it's possible, but it's very difficult. And it's also probably not the wisest thing to do because frankly, it is difficult and if those people leave, finding someone who can come in and replace them is also difficult, right?

Kelly Stirman:

Yeah. It's challenging, number one, but number two, it's probably not the core of your business, right? If you're Facebook or LinkedIn or Google or somebody like that, then data and software is the heart of your business. But if you are a bank or a manufacturer or a pharmaceutical company or a government entity, your business is not writing software. And-

Eric Kavanagh:

That's right.

Kelly Stirman:

Probably you don't want to be in the business of writing software.

Eric Kavanagh:

That's right. No, that's such a good point. And I think we are moving into an era, and we'll go to break in about 30 seconds, but we're moving into the era of specialization, it seems to me. And we'll talk about this in the second segment, but to Kelly's point here, if you're not a software development company, you really shouldn't be moving in the direction of becoming a software development company, unless that's where you wanna go because the experts are gonna be the ones to get it done right and what you really wanna do is leverage the power of the experts.

So we'll be right back, folks. You're listening to Inside Analysis.

Speaker 1:

Apps today are built on a wide range of back ends, from traditional databases like Postgres to MongoDB and Elasticsearch to file systems like S3. When it comes to analytics, the diversity and scale of these formats makes delivering data science and BI workloads very challenging. Building data pipelines seems like a never ending job, as each new analytical job requires designing from scratch.

There's a new open source project called Dremio that is designed to simplify analytics on all these sources. It's also designed to handle some of the hard work, like scaling performance of analytical jobs. Dremio is the team behind Apache Arrow, a new standard for in-memory columnar data analytics. Arrow has been adopted across dozens of projects like pandas to improve the performance of analytical workloads on CPUs and GPUs. It's free and open source, designed for everyone from your laptop to clusters of over 1,000 nodes. Check out Dremio today at Dremio.com/insideanalysis.

Speaker 1:

Welcome back to Inside Analysis. Here's your host, Eric Kavanagh.

Eric Kavanagh:

All right, folks, take us to the future is right. One of my favorite quotes is by a guy named William Gibson, a futurist as he calls himself, where he says, "The future is here already, it's just not evenly distributed." I really love that concept. Of course, the corollary's also true, the past is all around us as well. And we're really in this interesting time now where the disparity between the innovators and the laggards I think is growing almost exponentially and that's because of the scale of innovation that we see these days.

So we're talking with Kelly Stirman, he's the chief marketing officer and VP over at Dremio, a very, very interesting company that I think is emblematic of this change at this major inflection point. So what we're seeing is, it used to be kind of a world around warehousing where you had to do a whole variety of things, you had to think very carefully about what questions your analysts were going to ask because that all would drive how you modeled your data. And the data modeling is really designed for efficiency, it's designed to enable efficiency of number crunching, such that you can answer the questions that your analysts want answered quickly or as quickly as possible. You know, speed is so important in terms of doing analysis because if you run a query and sit around and wait 20 minutes, well, the thought process of a human being is just not going to endure that kind of delay.

And so now with these different approaches of building data lakes as opposed to data warehouses, you don't have to design the schema upfront. You don't have to do so much modeling upfront. You can worry about that as you pull data out on the other side. And what Dremio has done that I find very interesting is, they have recognized the reality of behavior in large organizations, things like people like to use their tools. So we talked about Tableau. There are other tools like Qlik, for example. There's a whole bunch of new tools too coming out for doing data visualization and data analysis and so forth. And the bottom line is that once you learn how to use a tool like that, you really don't wanna have to change everything once some new solution comes along.
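
[Editor's illustration] The point about not designing the schema upfront is often called schema-on-read, and a short Python sketch can show what it means in practice. The raw records, the `read_with_schema` helper, and the `SALES_SCHEMA` mapping are all hypothetical. They illustrate the pattern only and do not represent any particular data lake product.

```python
import json

# Hypothetical raw records landed in a "lake" with no schema enforced at write time.
RAW_LINES = [
    '{"user": "ana", "amount": "19.99", "region": "EU"}',
    '{"user": "bo", "amount": 5}',
    '{"user": "cy", "amount": "12.50", "channel": "mobile"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, cast types, default missing values."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }

# One analyst's schema for one question; another query could use a different one.
SALES_SCHEMA = {"user": (str, ""), "amount": (float, 0.0), "region": (str, "unknown")}

sales = list(read_with_schema(RAW_LINES, SALES_SCHEMA))
total = sum(row["amount"] for row in sales)
```

The records were written with inconsistent types and fields. The structure is imposed only when someone asks a question, which is the trade-off the warehouse approach makes in the opposite direction.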

And one of the cool things that Dremio has done, it seems to me, and I'll throw this over to you, Kelly, is recognize that reality of human behavior and cater to it, right? So you have designed a solution that enables people to keep using the tools they want, just have the access to more data and have access that's faster, right?

Kelly Stirman:

Yeah, it's a lesson that we've seen in many types of technology over the years, is if you ignore basic human behavior, you may not have much adoption in your technology. And the reality is people have their jobs and tools that they use to do their job and over time you become more comfortable with those tools and more skilled, and the idea of completely changing those tools is a pretty big undertaking on a person by person basis.

And so if you went to a company and said, "Hey, I have this great new way of doing something but it means all 10,000 of your employees need to stop using Product X and use Product Y instead." That's a big undertaking for most companies.

Eric Kavanagh:

Yeah.

Kelly Stirman:

And so ... You mentioned some of those tools that what we call data consumers use every day. That's everything from something like Microsoft Excel, which until the past few years was the most installed application on the planet. I think about three years ago Facebook passed them. But over a billion people have access to Microsoft Excel. But there are also tools like Tableau and Power BI and Qlik and traditional BI tools like MicroStrategy and Cognos and Business Objects and many, many other technologies out there that are in the workflow of people every day.

And our strategy with Dremio is to simplify and accelerate how those data consumers use their tools to get access to the data that they need to do their jobs. And traditionally, the only option that those data consumers have had is to go to IT and ask IT to do that work for them. And as I mentioned briefly before, we're making it so they can do those things for themselves without being so dependent on IT to do things for them. And that's important. Once upon a time, when you went to a building and got in an elevator, you couldn't push the buttons. There was somebody who operated the elevator for you and hopefully they weren't on a smoking break. And that's a little bit what it's like today when it comes to data and IT. And now you sort of take it for granted that you can just get on the elevator and push whatever button you like and go there directly under your own control. And we want something similar when it comes to data.

Eric Kavanagh:

Yeah, that's a really good analogy. And the cool thing too is that if you have to ask someone for access to this data set or that data set, if you have to go to IT or talk to your development team, obviously that happens all the time. It certainly happens with DevOps, right, with development operations which has really fundamentally transformed how technology and software can be designed for the operations of businesses, a lot of cool stuff happening in that realm these days for DevOps.

But for the average business analyst, you wanna be able to solve your problems yourself. If you're trying to understand some aspect of your business, whether it's how many widgets you're selling in this region or even something, let's just focus on some different area, like streamlining business processes. If you're really trying to understand what's going on, you wanna be able to have an interaction with your data and play with different ideas, test hypotheses and you want that to be a real fluid process. And if you have a self-service environment that is enabled by these kinds of technologies, you can do that, whereas if you don't have that, you're just gonna do other stuff, right?

Kelly Stirman:

Yeah, you're gonna move on to the next thing that you can do at the speed of thought. People have very short attention spans. And that threshold at which people stop being able to hold onto their thought and continue down some line of reasoning, that threshold is typically measured in seconds. It's certainly not tens of minutes. Maybe you're okay for a minute or two, but mostly we expect things to be in seconds.

And part of how we got to be this way, frankly, is in our personal lives when we are away from work and away from the office, if you have a question about the world, you can ask Google and get back an answer instantly. And if you need to solve some problem, there's probably an app that lets you do it in a couple of clicks on your smartphone. And then you get to work and you're expecting the same kind of experience. But of course, it's not like that, right?

And so part of the challenge here is this disparity in expectations between what we experience outside of work and when we get back into the office and how do we make that experience in the office more like what we have in our personal lives.

Eric Kavanagh:

Yeah, isn't that funny that consumer technology in many cases is driving enterprise technology these days, right? I think that you made an excellent point that people have expectations now because of what they get from Google, because of what they get from other consumer applications, and that's now really put the pressure on enterprise software companies to ratchet up the attention and ratchet up the heat, right?

Kelly Stirman:

Absolutely. I think every IT department feels this acutely because their ... The employees in the company that they work for are the ones complaining and expressing their frustration. And so IT departments are looking for software alternatives that help them make that experience better for data consumers within the company.

And I have to say, just personally, from my 20 plus years of working in software, that there is incredible interest in what we're doing with Dremio because it removes some of that pressure from IT and lets the data consumer feel more empowered. So it's good for both parties and it preserves the governance and security controls that are so essential today. We're not suggesting that we should work around those controls. In fact, we need to make them so seamless that nobody tries to take matters into their own hands and work around them. And that's another key part of the strategy here is seamless integration with controls that are so easy and nice in terms of the experience that you never think about wanting to go work around them.

Eric Kavanagh:

Right. And I'd like to get back into this age of specialization, too, right. If we're in the information economy now, and that means a lot of different things, but I think part of what's happening is that we're really in the age of collaboration and the age of execution. And what I mean by that is, if you can do, especially any data-related job, if you could do any data-related job, any programming job from anywhere, that means you're no longer just competing with people in your geography. You're competing with people all around the world. And especially in a country like the United States where, let's face it, the economy's pretty strong, there's a lot of pressure. This is why you see so much outsourcing, that's why you see so much development being done in places like India and other places around the world.

And my point is that you have to really specialize and know what it is that you do well and focus on that and then leverage all the additional help you can get from software. And of course, open source is great stuff too, but it's ... I think the ideal scenario is to get a package that leverages open source and that enables you to very rapidly get to work and do something with your data, right?

Kelly Stirman:

Yeah, I think that one of the advantages of open source is that you are benefiting from the work of many, many others in a way that's much more expansive than you could ever be on your own. And that's in every area from just usability of the product to security of the product to reliability. When we were thinking about how to bring this idea to market, we never gave a second consideration to whether this is something that should be open source. It allowed us to get to market more quickly because we could take existing building blocks that were already in open source and use them in Dremio. But it also allowed us to take all the great things we've added in Dremio and make that something that everyone could take advantage of and we all benefit from that.

Eric Kavanagh:

Mm-hmm (affirmative). Yeah, that whole standing on the shoulders of giants is such a huge deal these days, right? And I think that's why the open source movement is so compelling. I was just thinking to myself, you had told me before the show, you guys just opened a new office in Austin, Texas. Congratulations on that. I'm a big fan of Austin, I lived there on and off for many years. And one of my good buddies in the industry, Mike Hoskins, is the chief technology officer for a company called Actian that's based out of Austin. And I remember him giving a briefing to myself and Dr. Robin Bloor a few years ago and he said that they realized a few years back that the days of building your own integrated development environment and shopping it out and selling it to customers are over.

But his point was that open source has fundamentally transformed the whole process and the whole mindset of creating software. And I think for the better, right? Because now you have these de facto standards out there for how you can operate and because there's so much attention, because you have such a strong ecosystem around these open source communities now, that means you know ... To a certain extent, you're future proofing your technology, right?

Kelly Stirman:

Yeah, I think that's exactly right. Future proofing is a good word. Even when there's not an official standard, there may be implicit standards. That consensus is a kind of standard as well and open source is where you find the consensus because it's a reflection of what a broad group of users wants and thinks is best. It's very democratic in that sense.

Eric Kavanagh:

Yup. Yeah, it's amazing. I'm just fascinated by how much development of software has changed and how many more developers there are. Developers used to be pretty scarce, now I have to think that today there are probably 10 times as many developers working jobs, gainfully employed than there were, let's say, 10 years ago. What do you think?

Kelly Stirman:

I don't know the exact number. It has grown massively. My kids are learning JavaScript and Python in elementary school and I certainly wasn't when I was a kid. It's irresistible, right? If you were entering the workforce now and had thought about maybe being a theater major or medieval literature and you just look at the job opportunities of those worthy endeavors compared to what's out there for a software engineer, I mean it's just staggering how much more there is available if you have those skills. And as you mentioned before, it doesn't matter whether you come from one place on the planet or another, there's such a demand that the world is your oyster if you have those skills. And it's certainly something, you know, for my kids, I've encouraged them to consider as they think about what they wanna do with their lives as a great way to make a living because there's just so much out there and I don't see it stopping.

In terms of numbers, when I was at MongoDB for four plus years just a few years ago, the primary user of that technology is a software engineer and the estimates that I saw were between 10 and 20 million software engineers worldwide.

Eric Kavanagh:

Wow.

Kelly Stirman:

Which is a huge number. But one thing to step back and consider, now that I'm at Dremio and I think about not just software engineers but the greater number of data consumers, is that there are at least 10 times as many data consumers in the world. And these are people who, because they're not software engineers, they can't simply go into the systems and infrastructure that software engineers use to get the data that they need to do their jobs. This is back to them waiting in line, hoping to have one of the one tenth as many software engineers who can help them get what they need to do their job. And that's the kind of ... That ratio you see between the number of consumers and the number of people who can help them is like being at the deli counter. The number of people who are waiting for their number to be called versus the number that are behind the counter-

Eric Kavanagh:

That's right.

Kelly Stirman:

Trying to fulfill those orders.

Eric Kavanagh:

That's exactly right. We really have to optimize and balance to be able to handle all these needs. All right, folks, we're gonna be right back. You're listening to Inside Analysis, don't touch that dial. Stand by, we're talking about the information economy.

Eric Kavanagh:

Alrighty, folks, back here on Inside Analysis, talking all about the information economy with Kelly Stirman of a company called Dremio. You should look them up online. You guys have a pretty cool logo too. Tell us about the logo.

Kelly Stirman:

Sure. So, the logo is ... Of course, the name Dremio but a friendly narwhal to the side that you might've mistaken for a unicorn, dolphin, or a swordfish. I get all kinds of things. I think narwhal is one of those creatures not everyone is familiar with.

Eric Kavanagh:

Yeah, what was the inspiration for that?

Kelly Stirman:

Well, a narwhal is the only true unicorn in nature.

Eric Kavanagh:

Oh cool.

Kelly Stirman:

So we thought that made a lot of sense, but also narwhals are ... They're mammals so they're cousins in a way and they're not endangered, so we don't have to worry about our logo going out of business, so to speak, before we do.

Eric Kavanagh:

That's pretty funny.

Kelly Stirman:

So we ... And probably the most important of all is that it wasn't taken by somebody else.

Eric Kavanagh:

Right. No, that's important.

Kelly Stirman:

I love the narwhal, it's a great animal that I didn't really know that much about but now I guess I'm a ... After lecturing my kindergartner's class on the history of the narwhal, I feel like a little bit of an expert, I suppose.

Eric Kavanagh:

Good for you. And I'd like to also kinda dig into this open source topic again for a whole variety of reasons, one of which is because it really has opened up whole new possibilities about how to design software. And the commitment that you get from large organizations building out these technologies, like we talked about standing on the shoulders of giants. Of course, I first studied the Apache Software Foundation back in, I guess it was 2005, in late 2005. I had just moved up north and there was of course this big storm down here in New Orleans called Katrina that many of you may recall. And I was doing a bunch of research into open source, primarily because I had an ax to grind. I realized that Louisiana has its history of chicanery in the world of politics and the senators down here had asked for a quarter of a trillion dollars, $250 billion to rebuild. And I thought to myself, "We have got to get transparency into that spending."

And so I kind of went on a bit of a soapbox mission. And part and parcel to that, I was researching open source software. So, of course, you have the Apache Software Foundation, their first project was the Apache web server, which by 2005 was hosting, I think, over 50% of websites. And I just thought, "What a wonderful movement this is and how can this not change how business is done?" And now of course, there's this whole range of open source projects and one of the newer ones is Apache Arrow. And that's something that Dremio's involved with. Can you tell our audience what Apache Arrow is all about and what it does?

Kelly Stirman:

Yeah, it's amazing how open source has flourished and virtually every layer of the technology stack now has not only an open source alternative to traditional proprietary offering, but in many cases, the open source alternative is the standard or is best in class. And as I said, it's virtually every layer of the technology stack. But there are still areas that represent opportunities for new kinds of open source projects and Apache Arrow is a good example.

So to understand its role, you have to understand a little bit about the nature of computing, which is that just as if you were organizing your pantry at home or your garage or closet, sure, you could just dump everything in there, but if you take time to carefully organize things, you can optimize how you access things that are stored there. And the same is true for data and so for many years, you had an optimal way of storing data on disk for analytics and that's something you mentioned earlier called columnar data structures. And Teradata is an example of a product that does this. Vertica is another example and in open source, you have columnar ways of storing data on disk.

Before Arrow, there was no standard for organizing data in a columnar representation for in-memory computing. And so Arrow, first of all, there's really two things, but first of all is a way to organize data in a columnar format for in-memory computing, which represents huge performance advantages. But number two, and probably more importantly, Arrow is a standard shared across many different projects. And what that means is that everyone can share the same data structures in memory instead of having to, what's called serialize and de-serialize the data and make copies of the data, which is very, very resource intensive and can be the most dominant factor in terms of how long it takes to do data processing in memory.
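As an editorial illustration of the columnar idea described here, the following is a minimal sketch using only the Python standard library. It is not Arrow's API or its actual memory format; the sample records, the field names, and the `to_columns` helper are all hypothetical. It only shows why an aggregate over one field can scan a single packed buffer when data is laid out by column.

```python
from array import array

# Hypothetical sample records, laid out row by row (one dict per record).
rows = [
    {"order_id": 1, "amount": 20.0, "region": "east"},
    {"order_id": 2, "amount": 35.5, "region": "west"},
    {"order_id": 3, "amount": 12.25, "region": "east"},
]

def to_columns(records):
    """Pivot row-oriented records into one contiguous sequence per field."""
    return {
        "order_id": array("q", (r["order_id"] for r in records)),  # 64-bit ints
        "amount": array("d", (r["amount"] for r in records)),      # 64-bit floats
        "region": [r["region"] for r in records],                  # strings kept as a list
    }

columns = to_columns(rows)

# Row layout: every record is visited, even though only one field is needed.
total_from_rows = sum(r["amount"] for r in rows)

# Columnar layout: the aggregate scans a single packed buffer of doubles.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the difference is in how much unrelated data has to be touched to compute them.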

And let me just use an analogy to try and explain this. Once upon a time, when you went to Europe on vacation as an American, you might have this plan to do five countries in seven days or something like that. And you would go to Europe and when you went from Italy to France and France to Switzerland and Switzerland to Germany, et cetera, et cetera, at each border, you would need to plan for maybe a half hour or maybe several hours waiting in line for passport control and then going and exchanging your money and knowing that you were going to lose some money in the conversion.

And that really is what it's like without Arrow to do in-memory computing because one process wants to access the data and they hand it off to another process and there's a whole bunch of overhead to do that. Well, Arrow is like going to Europe with the euro. There is no border control, there is no conversion of currencies. You can just fly over the border unencumbered and focus on what you wanna do.

So Arrow is these two things: columnar for efficiency and speed and a standard so that you can be much, much more efficient between processes that are accessing the same data. And then it ends up being incredibly important for machine learning and artificial intelligence, where processing increasingly is moving to GPUs, where there is much, much less memory to use and the benefits of Arrow really, really stand out.
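To make the "border crossing" cost concrete, here is a small editorial sketch in standard-library Python, assuming a single process and using `pickle` and `memoryview` as stand-ins. It is not how Arrow itself is implemented or called; the `amounts` buffer is a hypothetical example. It contrasts handing data off by serializing and deserializing (which produces a copy) with handing off a view onto the same memory.

```python
import pickle
from array import array

# A column of 64-bit floats standing in for an in-memory columnar buffer.
amounts = array("d", [20.0, 35.5, 12.25])

# Handoff 1: serialize and deserialize. The consumer gets an independent copy.
copied = pickle.loads(pickle.dumps(amounts))

# Handoff 2: share the buffer. memoryview exposes the same bytes without copying.
shared = memoryview(amounts)

# Mutate the producer's buffer to see which consumer observes the change.
amounts[0] = 99.0

copy_sees_update = copied[0] == 99.0    # the copy is detached from the source
view_sees_update = shared[0] == 99.0    # the view reads the producer's memory
```

The copy never sees the update because it lives in separate memory, while the view does because nothing was duplicated. A shared in-memory standard lets different tools take the second path.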

Eric Kavanagh:

Yeah, that's a really good point. Machine learning is taking off these days. In fact, we have a whole program that we're running called "AI is the new BI," and it simply means artificial intelligence is pervasive these days. And to your point, those algorithms really love parallel architectures, right? To get those algorithms humming and doing their thing and getting value from them, you need to be embracing that kind of architecture, right?

Kelly Stirman:

Absolutely. There are many types of computing problems that are amenable to a divide and conquer approach, where you can parallelize things and solve the problem more efficiently by throwing resources at it. And a GPU is like a CPU with 1,000 times or 10,000 times as many cores. And so you can take a problem and put it on a GPU and solve the problem much, much more quickly. The challenge though is that there's much less RAM on the GPU and that's why Arrow is so useful, is because it takes up less space and allows different processes to access the same buffers on GPU RAM without going back to main memory in the system.
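The divide and conquer pattern mentioned here can be sketched in a few lines of standard-library Python. This is an editorial illustration only: it runs on CPU threads rather than a GPU, so it shows the shape of the approach and not a speedup, and the `split` and `parallel_sum` helpers are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def split(values, parts):
    """Divide: cut the input into roughly equal contiguous chunks."""
    size = -(-len(values) // parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def parallel_sum(values, workers=4):
    """Conquer each chunk on its own worker, then combine the partial sums."""
    if not values:
        return 0
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)

data = list(range(1, 1001))
total = parallel_sum(data)
```

Each worker only needs its own chunk, which is why the same pattern scales to hardware with many more cores.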

Eric Kavanagh:

Right. And what you're talking about again is optimization, right? You're talking about optimizing the data supply chain essentially such that more people can do more powerful analysis of their data. And the bottom line is that's what you need these days to stay up with the competition. If you're in the age of specialization, you need to really understand your specific value-add, what it is that you're bringing to the economy and you need to focus on that and understand it and there's competition everywhere these days, right? I think that's part of what is driving so much innovation, is the fact that enough people recognize the competition is everywhere. In some cases, it's in the form of a giant company called Amazon, which of course now is even in the grocery business. So, that pressure is what is forcing so much innovation to occur in data pipelines in the information economy, right?

Kelly Stirman:

Yes. And the need for speed, right? Whether you're doing fraud detection or self-driving cars or any number of these uses that artificial intelligence is solving, we still care a lot about time. And the faster you can solve these problems, the more benefit you can drive to people who stand to have their lives changed and benefited from this kind of processing.

Eric Kavanagh:

Yeah, this is just amazing stuff. Well, folks, we're talking to Kelly Stirman of Dremio. Maybe in our last 90 seconds or so, you guys have some pretty nice clients already. You wanna talk about some of your top tier clients working with you?

Kelly Stirman:

Yeah. So Dremio has two editions. It has a community edition that's free, that anyone can use and put into production, and that's available for download from our website. And then we have an enterprise edition that's designed for the demands of large Fortune 1000 companies, where you have capabilities around security and governance and administration that are not in the open source version.

And so we have companies that are large multinational banks, large manufacturers, large government entities who are using Dremio in production to make their data consumers and data engineers more productive. So companies like Intel and TransUnion and Daimler Mercedes Benz and a number of other companies that you would recognize. And many others that I of course can't mention, but it seems to be applicable to different geographies, different company sizes-

Eric Kavanagh:

That's great.

Kelly Stirman:

Different industries.

Eric Kavanagh:

That's great stuff. Yes indeed, folks. Well, we burned through an hour here. Hop online to Dremio.com to learn more about them. We'll be back next week, folks. Thanks again. You've been listening to Inside Analysis. Take care, bye bye.