Subsurface Summer 2020
Welcome with Billy Bosworth
Expand your technical knowledge and hear from your peers and industry experts about cloud data lake use cases and architectures at Subsurface™, where we explore what’s below the surface of the data lake. Hear firsthand from open source and technology leaders at companies about their experiences spearheading open source projects and building modern data lakes. Explore real-world use cases, from data warehousing and BI to data science and advanced analytics.
Billy Bosworth, CEO, Dremio
Billy joined Dremio as CEO in February 2020 after serving on its board of directors since June 2019. Prior to Dremio, he served on the board of directors at Tableau Software (NYSE:DATA) for nearly five years, through its acquisition by Salesforce.com (NYSE:CRM), one of the biggest software acquisitions in history. Billy also served as CEO of DataStax from 2011 to 2019, propelling the company’s growth from sub-$1M in revenue and 25 employees to more than $100 million in revenue. He is a frequent speaker on the topics of data autonomy, hybrid cloud, AI, and distributed workforces and was an active member of the Forbes Technology Council for many years.
Billy’s 25+ years of tech experience span nearly every part of the data industry in roles ranging from database administrator and software engineer, to book author, to general manager and CEO. He holds a Bachelor of Science in computer science from the University of Louisville where he was also a scholarship football player for the Cardinals.
Good morning, good afternoon, or good evening, everybody, wherever you're watching from in the world today. Welcome to Subsurface, the first-ever cloud data lake conference. We are privileged to bring this to you today, live from Santa Clara, and we enjoyed putting this event together. It's been a really amazing journey, and we're here for one reason. We're here to take a walk out into this world of data lakes, which is getting a lot of attention, a lot of press, and a lot of hope, but there's a little mystique about what's going on underneath the surface of the data lake. There are so many new things happening, emerging technologies and new innovations, and to really leverage the best that this world has to offer, we do have to go deep. At Dremio, we decided, let's get people together to talk about this. Let's bring together experts and enthusiasts and implementers and innovators.
Let's bring them all together in one place and have this discussion. Well, of course, with COVID you can't bring them to one place physically, so we're doing that virtually. When we began this idea, about six weeks ago, we said, "Well, how many people do we think we could get to come?" We thought, "Well, if we got 500 people who were really interested in this topic, that would be a great conversation." But a few days in, we realized we were going to go pretty far past 500, so we said, let's set maybe 1,000. Then we started sending out some early invitations and we blew through 1,000. So, we said, "Well, maybe we could get 2,000." Within just a few weeks, we were well over 2,000. This morning, sitting here today, with just five weeks of notice for a first-ever conference, we have over 5,000 people registered for this event.
That really speaks to the interest in what's going on here. There's so much exciting stuff happening, and we're proud to bring you a deeper look at it. Now, part of the reason why we have a conference, instead of just publishing white papers and doing this piecemeal, is because we want people to be connected. If you've registered for the event, you've hopefully gotten emails by now on how to join our Slack community. This is important, because there you can learn and connect and grow and really share with each other in good, meaningful ways. Also, as I mentioned, today's event is live, so there's a pretty good chance we're going to have some happy accidents, if you're a Bob Ross painting fan. You might see kids wandering into some backgrounds here and there, or hear some noises that we didn't expect. Go ahead and share it.
Let's have some fun today. If you want to share on social media, @subsurfaceconf is our Twitter handle, and the hashtag is #subsurface for all your social media channels. Let's have some fun, stay interactive, and let people know what's going on here. Now, what we're going to do today is really interesting, because we're going to get to travel through time a little bit. If you're not a fan of the Dark series, I'm sorry, you should be. This is a time machine from that series. Really, what we're going to look at is where we've been in terms of this idea of data analytics around data lakes, where we are today, and then where we're going in the very, very near future. Now, to do that, we've got some great people taking us on this journey. We've got creators and we've got innovators and we've got early adopters and we've got people who know best practices.
This is really a powerful and amazing list of people who are going to give us detailed insight into all the things that make this world so exciting. At the end of the day, when you lump it all together and ask, well, what's in common here? It's really about an open cloud data lake architecture. When done properly, this is an advantage to you in many different ways when you think about your analytics environment. First, we're going to get all the innovation and the power and the standards that come with open source technology: being able to innovate quickly, share that with a broad group of people, and have lots of people involved in community efforts. This is really fundamentally core to this architectural design.
It unlocks the data in really powerful and unique ways, because what's really important in this model is that the data itself gets unlocked and is free to be accessed by many different technologies, which means you can choose best of breed. No longer are you forced into one solution that may do one thing really well while the rest is average or subpar. This model allows you to implement best-of-breed technologies at your pace. When you model this correctly, you can insert different services and different layers at different times. Now, why this is possible today, and why this is so exciting, is because we can now see the separation of compute, data, and storage. We've heard a lot in the industry, in the past several years, about the separation of compute and storage, and that's important.
That is a very important part of what we're doing. But separating the data, architecturally speaking, is going to be critical to your capabilities. Tomer, in just a few minutes, is going to come up and tell us more about exactly what that looks like from an architectural perspective. But when you do that well, that's going to give you the capability of having some great performance and really good scale. Because you have this infinite pool of resources available to you in a pay-as-you-go model, in an on-demand model, in an elastic model for your compute resources and your storage, now you can achieve performance and scale that maybe you could never have achieved in the past. Finally, we're not just talking about rudimentary basics here.
We're not just talking about common flat file formats that are going to sit on the basic storage. We're talking about being able to do things that you would expect from a normal database, like inserting rows and mutating data, and being able to do a transaction against a table. These are all things that are made possible by this architecture and by these technologies. When you look at this holistically, this is pretty exciting. This is taking us into an era of capability and possibility that we have never been in before, but getting this architecture right is going to be fundamentally critical to everything that is going to happen thereafter.
So, that's why we're excited today. That's why we're excited to take a look under the surface of this amazing world of data lakes, and really start to understand, fundamentally, how to get the best out of today's technology, how to get the best out of the cloud infrastructures, how to get the best out of open source projects, how to get the best out of data analytics, and how to combine these things in such a way that the sum is greater than the individual parts. That's what we're shooting for. We're glad you're here with us for this journey. We think it's going to be a wonderful day. Let's take a walk out on the pier. Put your swimsuit on and let's dive in. With that, let's move on and hear from Tomer.