How Dremio and Tableau enable cloud data lake analytics at InCrowd Sports

In this webinar we explore how to accelerate query performance for BI and data science workloads; visualize data directly on the data lake; secure and govern the data lake; leverage multiple data sources and bring them to life with Tableau; and create a self-service semantic layer on the data lake.

Transcript

Jason Nadeau Hello everybody, this is Jason Nadeau with Dremio, and we are excited to have you with us today to hear about how Dremio and Tableau are working together to enable Cloud Data Analytics at InCrowd. We're going to start the webinar officially in about one minute to give folks some time to connect in. So we'll put on a little bit of Muzak here to entertain you while we wait to get started officially. Thanks.
Jason Nadeau All right, after that little musical interlude I think we're ready to go. So welcome everybody to our webinar today. We are going to talk with Tableau, we're going to talk with InCrowd about how we're enabling Cloud Data Analytics. And so today, I am excited ... Hold on a second here. Trying to get to the next slide. Let me get rid of this. Excited to have a couple speakers with us. So, my name is Jason Nadeau, I'm the VP of Strategy at Dremio. I also lead the marketing team here. And we've got a couple great guests that are going to kind of, you know, talk with us together about how we're really delivering a whole new level of analytics on data lake storage.
Jason Nadeau So first, we're excited to have Conor Knowles, he's a senior product marketing specialist with Tableau. Welcome Conor.
Conor Knowles Yeah, thanks Jason, I appreciate you having me on.
Jason Nadeau Yeah, you bet. Great to have you here. And, you know, our star of the show really is Ciaran Fisher. He is the CTO of InCrowd Sports. Welcome Ciaran.
Ciaran Fisher Hi Jason, thank you very much for having me.
Jason Nadeau Yeah, you bet. And Ciaran's coming in all the way from the UK here too, so we've got people all from different time zones, so great to have everybody together.
Jason Nadeau So a little bit of logistics. In terms of asking questions, the way to do this is to use the Q&A feature in the Zoom Webinar panel. So use that please. The chat is not nearly as effective, the Q&A allows us to actually understand and manage the questions. So please use the Q&A feature and ask your questions through that function as we go. And what we'll do is, you know, when we get to the end of the webinar, we'll start going through all the questions, and if there's any we can't answer, of course we'll follow up with folks once the webinar's done.
Jason Nadeau So let's get started. So we're going to do this in just three sections. So you know, up first, a bit on Dremio, our point of view, and we'll have Tableau do likewise. And then we'll really get into the meat, which is what Ciaran and InCrowd are getting up to with both Dremio and Tableau.
Jason Nadeau So our perspective here at Dremio is that really, the analytics stack in general, today, is built on some workarounds that have been around for, in many cases, decades, multiple decades. And you know, those workarounds have existed to improve performance of analytics but they came with a lot of costs around complexity, restricting data scope and flexibility. So if you look at this picture, it's not a pretty picture. And you know, at the center of it, really that first workaround was people trying to consolidate data, aggregate it together in expensive and proprietary data warehouses, data marts, and using a bunch of complex virtualization to try to connect all sorts of disparate data sources together.
Jason Nadeau But they didn't stop there. In order to get the data in, there is a lot of ETL or ELT, it's very brittle, it's very complicated. You know, for data engineers in particular, managing this and the change that continues to occur with all this, it's just really, really challenging. But even with that, the performance was ... Generally speaking, it's not enough for the data consumers that really want to get access to and explore that data. And so what you find is a bunch of other external acceleration technologies brought in. Cubes, extracts, aggregation tables. You know, they're copying the data and trying to make it faster. One of the other things you notice though of course, as you move up that pyramid, this triangle in the background, right? The amount of data that you actually get to work with if you're a data consumer is decreasing as you go. And you know, that's a byproduct of these different techniques and accelerations and whatnot. And that means that if you want to bring in some new data, you want to try some different analytics, you got to go back to the beginning. Grab another data set and go through the whole ETL thing, you know, create more cubes and all the rest. It's really ... Really slows things down.
Jason Nadeau And so [inaudible] trying to summarize the problems that we see, depending on who those data consumers are, right? If you're a data analyst or a scientist, you're spending a lot of time waiting for IT and you're having a lot of difficulty finding, getting access to the right data. And the amount of analytics you could actually do becomes constrained, right? And you know, it's certainly not performant when you're actually doing it. If you're a data engineer and architect, this is not a pleasant experience either. You're spending a lot of your time building those cubes, it's very difficult to modernize the data infrastructure as well because of all those connections and brittleness and copies of data. It's easy to lose track of data and that also leads to difficulty in governing the data that you have and securing the different copies and whatnot.
Jason Nadeau So it's a real problem, and you know the view that we have as a company and why we really exist is we can eliminate the workarounds with a better architecture. And so that's what Dremio really is. We call it the data lake engine. It's a new category. And it's all about a new ... An open data lake architecture that's purpose-built for interactive analytics. So now you see here a very simplified architecture connecting those different data consumers up at the top directly through to data lake storage. Not only data lake storage, I'll say that. Like you see other sources around but data lake storage is really becoming the primary location where people are putting their data.
Jason Nadeau And so if I was just to sort of summarize like, "What is Dremio?" We are an accelerated data lake query engine with a rich semantic layer. And so with that, we can unleash the power of your data lake. And if you ... People, absolutely you should go look around and see what other sorts of technologies exist around the data lake, and what you'll find is only Dremio delivers live interactivity on that data. Like really fast performance. We're also very efficient in the compute that we deliver, which lowers your cost, lowers your cloud cost. And we have a governed self-service model, so people can do this on their own, it doesn't take a lot of time and deep IT involvement and it's governed, right? People can have control of their data.
Jason Nadeau And so what do we see as outcomes? No more cubes, no more extracts, no more data warehouses really. Right? Like, I mean, those things will continue to exist for probably a long time, but that no longer is the future and you can start to modernize away from that. Put the new workloads directly on the data lake. And what do customers get? 100 times faster time-to-insight. Really driven by the performance, but also the fact that we can help connect people to these data sources without having to go through all these other workarounds. And literally the IT time and people time of building and reconnecting all of this stuff. And so if you can do 100 times faster time-to-insight, from beginning to when you actually get your answers, that means you can do 100 times more analytics as well, and that's really powerful, because that's probably where you're going to start finding those eureka moments and those, you know, really the golden nuggets when you can spend the rich time that you need to do the analytics that you want.
Jason Nadeau And then efficiency, 10X more efficient than other technologies that exist out there, and that's really important because cloud is not cheap, and organizations are buying that up more and more as they adopt the cloud more heavily. And last but not least, this is a really, really simple, easy to understand, easy to deploy and use architecture that's completely open. Not proprietary, it's not the traditional data warehouse way.
Jason Nadeau So that's sort of what Dremio is. Now how do customers use us? Really three big use cases with a bunch of sub-use cases inside. So I'm not going to talk about each one of these but just to give you a flavor. So business intelligence on data lake storage, and this is really what we're going to talk about here today. Data science on data lake storage. Another use case that people spend a fair bit of time on with Dremio. And data lake modernization. You know, absolutely people want to figure out how they can move to a data lake centric world. Yes, they've got either [inaudible] data lakes with Hadoop or they've got existing data warehouses and we can help them modernize and augment as well.
Jason Nadeau So for today, and apologies for the hanging T here on the left hand side, not sure how that happened, we're really going to focus on business intelligence on data lake storage. You know, and this is what Tableau and InCrowd are really driving, so we're going to get into a lot of this, but you think about accelerating dashboards, you think about ad hoc type queries as well. And of course behind the scenes, the data engineering that goes into this and standardizing on various types of semantics and governance as well.
Jason Nadeau So last slide, from our perspective here, is that there's a lot of customers that are using ... Joint customers that are using Dremio and Tableau together and that's super exciting. And you can see, here's just a sampling of some of those customers. And you can also see, just from a quick glance, that it covers a really wide range of industries as well. And that's also, you know, not surprising for people that are in the business intelligence world. I mean everybody is trying to get more access and more insight out of the data that they've got. It's obviously a very competitive world out there, and so we're in any event really excited and happy to be working with Tableau to make these customers successful in these data analytics initiatives.
Jason Nadeau So with that, I am going to pass it over to Conor Knowles. So excited to have Conor kind of run us through what's going on from a Tableau perspective. So with that, over to you Conor. Thank you.
Conor Knowles Perfect. Thank you Jason. Real quick, can you hear me okay Jason?
Jason Nadeau Sure can. And by the way, as you go, I'll keep control, so just let me know when you want me to advance slides and I'll click you along.
Conor Knowles Perfect, sounds great, thank you. So hey everybody, this is Conor Knowles here. Really appreciate you all joining us today, taking time out of your days. I have the honor of talking to you about Tableau in this three-pronged approach today. So I work in product marketing at Tableau, and I have been here a little over four years. And what I'm going to do for a couple minutes before we turn it over to Ciaran, is I'm going to give you some quick insight into Tableau, whether you're familiar with us or maybe we're entirely new to you.
Conor Knowles Our message has always been the same, since we were founded in 2003, and that is we help people see and understand data. And we do that through our analytics platform. It's an awesome interface where you're literally dragging and dropping your data to create compelling visualizations that tell a story of your data. Now, this mission statement primarily shows our focus at Tableau and that's people, right? We help people see and understand data. You notice we're not calling out data scientists, students, analysts, doctors. We help all of them, we help everyone and anyone, and we really believe that people are your greatest asset.
Conor Knowles Now I actually might be a pretty good example of this. Before Tableau, I worked in college football, or American football depending on where you might be tuning in from. I didn't have any discernible tech or computer skills, and when I started at Tableau I became an expert in all things Tableau. Actually before product marketing, I was in our customer consulting org, and I was the go-to resource for helping our customers use, learn, and adopt Tableau. So they're coming to this person that didn't think they could ever learn a software to learn Tableau, and I'm an expert in all things Tableau.
Conor Knowles So I've always thought my experience speaks well to Tableau, which is that anybody can use this, right? Regardless of skill set, regardless of background, and it's vital that people at your company, regardless of how big or small you might be, have access to analyzing data. And that's one of the things that Dremio really helps Tableau with, is getting good high performance access to that data. And I'm also really excited to see what Ciaran and the team at InCrowd has done with their use of Tableau and Dremio together. Because it's clear that they can see and understand their data now, and I'm also selfishly really happy to see a great use case in the world of sports, due to my background. We can go to the next slide there, Jason.
Conor Knowles So when we think about Tableau and what we deliver, we started out as a desktop analytics tool, but we're a lot more than that now. So we offer the ability to create your visualizations and share them wherever, however, and with whomever you like, whether that's deployed on premises or in the public cloud or managed entirely by Tableau. Now you partner that with Tableau Embedded, Tableau Mobile, Tableau Prep, we really pride ourselves in being able to offer you the breadth and depth of capabilities that you need for that end-to-end analytics experience. And we can look at the next slide here.
Conor Knowles So we ... On the next slide there, Jason. Thank you. We deliver an enterprise analytics platform that will empower and elevate your people. It'll increase the value they provide to your organization. And we also deliver a platform that meets the tough security, governance, and scalability requirements that an organization like yours might require. And these are the four things that we really believe Tableau does best. Now I'm going to focus on one of these as I kind of lead into the Tableau and Dremio story. So on the next slide Jason, you'll see ... What I want to do here is focus on one significant area, and that is leveraging technology.
Conor Knowles So I work specifically on Tableau's technology partner team. Now Tableau has always maintained that it is vital for us as a platform to adapt to your environment. You, the customer, your environment and needs with unmatched flexibility and choice. So we help unlock and extend the value that you've already invested in your data infrastructure, as we want to work seamlessly within your investments that you have today and with what you'll be changing to tomorrow. So that's why technology partners like Dremio are so important and so vital to Tableau. We'll go to the next slide here.
Conor Knowles So, Tableau, we excel in all things analytics. But we need to make sure that we integrate with best in class partners. Especially in areas that they excel in to make it easy for organizations like InCrowd to perform their analyses, and for their interaction with our technologies to be as seamless as possible, right? So performance issues with your data? Okay, great, we at Tableau want to be able to work and integrate with best in class experts like Dremio to help solve that pain point for you. We're not trying to reinvent the wheel, let's work with who already does it best. So I really love messaging to people, "Hey, here's how we work with other technologies and how those partners can work with Tableau to help you see and understand your data." So we'll go to the next slide there.
Conor Knowles And what I want to get a little more into is kind of why we're here today. So at Tableau, we're a really flexible analytics platform. We fit with what you have, right? When it comes to connecting to data, you can connect to your data easily. Maybe it's flat files, databases, big data, cloud-based data, or application data. Now, if we focus in on that, that could be a slippery slope, especially just getting inundated with data, right? And Dremio solves this pain point of having so much data everywhere, around you, you probably don't even know what to do with it, right? Understanding what to do with it to ensure that you're maximizing the potential of your data, and we'll touch more on that in a little bit.
Conor Knowles Tableau is also ... It's integrated into your environment. We're seeing customers who're starting to build on top of Tableau, so they can do this by building workflows and processes around Tableau, and by using our APIs to build on top of Tableau. And you'll see a really great example of this when Ciaran shows Tableau embedded into InCrowd's webpage. And we'll go to the next slide here.
Conor Knowles So if we get a little into I guess kind of why we're here, what the problem is, what are we solving, one of the most common situations and problems that I faced in customer consulting was customers talking about performance when using and maintaining all of these separate data warehouses, storages, and data lakes. Now data is exploding, and it's doing so at an exponential rate, so it's very common for data to be across many separate data silos and that can make it hard to connect to your data easily and without any lapse in performance, right? So customers, they need quick and easy access to their data, and with data in so many different places it's easy to get confused.
Conor Knowles Now, when Tableau connects real time to your data, it's only as fast as the database or warehouse. And quite frankly, some data lakes aren't great for querying and asking questions, they're better for just storage. So what happens when you have all these questions in Tableau? Well, enter Dremio. We can go to the next slide, look at the solution here. So with Dremio, that problem is negated. The ability to connect Tableau directly to Dremio, this saves Tableau users so much time by finding all their data in one location, not to mention the ability to see a rapid improvement in performance. So, this means more efficiency for you, it means more analysis in Tableau, and plain and simple it just takes away a lot of frustration that comes when dealing with massive amounts of data across separate locations. So that's the pain point that we're solving, and we'll go to the next slide before we hand over to Ciaran to see what InCrowd is ... How they're using us together.
Conor Knowles I am really excited to see those examples. You'll notice some of the dashboards that Ciaran shows are directly embedded into InCrowd's own portal, and that's thanks to this seamless integration with our APIs. And in addition, I'll just say make sure to notice the power of Tableau and Dremio working so well. As Ciaran's going through, you know, he's able to click into dashboards, you see how quick the response time is, he's iterating on them, receiving answers, insight he needs. So just keep it in mind, you know, this is something ... This is a good example of Tableau and Dremio working really well together. And to the extensibility aspect of things, you know, I love seeing people embed Tableau into their environment so that they can focus on building and bettering their product, and also make the experience as smooth and familiar as possible for their customers with that embedded experience.
Conor Knowles So thank you everybody for letting me talk a few minutes about Tableau. I'm going to sit back with you all and really excited to learn more about InCrowd.
Jason Nadeau Excellent. Thank you Conor. And with that, we are super excited to turn over the stage to Ciaran Fisher. He's the CTO at InCrowd, he's going to show us how they're actually using it in their environment and to help their customers. So with that Ciaran, over to you.
Ciaran Fisher Thank you very much Jason. So I just need to share my screen now. Okay.
Ciaran Fisher There we go. So yeah, thank you very much. So I'm Ciaran, I'm the CTO of InCrowd. And InCrowd is addressing business. So we're a fan focused organization that facilitates the sports marketing industry. And what that really means is we produce technology that enables the clubs to connect direct to fans and then to understand their fans to a greater extent, as well as personalize the fan experience. So as you can already tell from my accent, and as Jason said earlier, I'm based in the UK, so we're mostly a UK-based organization, although we do have some outposts in Australia and again Hong Kong as well. So a little bit international.
Ciaran Fisher And we originally started in football, so we're very, very strong in football in the UK. [inaudible] as well as 14 clubs. We work with clubs and leagues, so well across ... Actually rugby, probably 14 on the league side, as well as the RFL, we work with cricket and one of the really big names on there is Formula 1. So we do two apps for Formula 1, and for all of our clients we provide data and insights to them, to various extents. But originally, mobile apps is where we started and that is where we started to gather all this data from.
Ciaran Fisher So, ultimately why does data matter? Why are we here? Why are we collecting it? So, every single business, no matter who you are, is a data business. You are collecting data, be it account address books, calls you've made, whatever there is, you are collecting data. And without data, you can't make any informed decisions. You're just going by gut feel, and gut feel will get you so far, but it's good to have the data to back up those decisions. And in our particular case, the most important thing is that we want to enable clubs and leagues to understand their fans and deliver personalized activations in moments that matter, as well as provide a better fan experience, because ultimately, you are all fans of the club and you want to have a good interaction with them, be that receiving relevant offers, receiving discounts. For example, if the club knows you are a family, a family ticket might nudge you to come along to an event where you wouldn't normally be able to participate. So all of that is why this data matters and why it is important.
Ciaran Fisher So what do we do for our clients and what kind of data do we ingest? So as I touched upon, we have a variety of parts of the business that build apps, build data, and insight dashboards, which is why we're here today. So depending on whether you're a club or a league, you're going to have a completely different set of data that you're gathering, but there will be some fair commonalities amongst them. It'll just be different volumes, so as a league you're only selling a couple events a year, whereas with a club you're selling events every single week. So ticketing data, first on the slide there, is the absolute key. For clubs, ticketing data is both single match tickets and season tickets. Season ticket renewals are a hugely important thing for a club, say for example a local team here in Brighton, where I'm based. They'll probably have 17,000-18,000 to about 30,000 season ticket holders, so a lot of their effort goes into making sure those season ticket renewals come in. The data we ingest for all these clients into our data lake also includes things like click stream data. So as I've mentioned, we produce apps, those apps generate click data, and we use a tool called Snowplow to gather up all that data.
Ciaran Fisher We also bring in client data from Google Analytics, we also bring in things like email marketing, and what we're attempting to do by bringing together all this data is provide insight back to the clubs and the leagues. Yeah, another thing we also bring in is things like fantasy/gamification, so there's some very interesting data sets that you can get from that because you can basically measure engagement. So, during the week your fans are getting excited for the game, gamification/fantasy is a great way to keep them engaged during the sort of off-days. Social data, absolutely key to bring in. What is your impact outside of your sort of direct marketing that you normally do? Another key one, video streaming, so again clubs are looking for different revenue streams alongside ticketing. Video and video-on-demand is a fantastic source of, well, engagement for fans because fans want to have exclusive content, they want to feel close to their club, and also another revenue stream for clubs outside of ticketing. So things like subscription data for OTT are absolutely key, so for example how many people are watching, whether they're churning, all kinds of stuff like that.
Ciaran Fisher And I won't go through many more of these but participation data is another interesting one we do for a particular client, and that's basically how well is the sport doing and is it growing, because ultimately you need young people coming in playing those sports, being interested, and playing.
Ciaran Fisher So, a little bit of what we're kind of here for, and I'll likely call back to Jason's and Conor's points, is what we started with. So all those data sources you saw in that previous slide, we needed to put them somewhere. So originally we started with this wonderful spaghetti mess, so exactly as Jason mentioned, the old dreaded ETL scripts. So in effect, what we had is that old architecture. You have a lot of ETL scripts running that are relatively brittle, they're hard to build, it's hard to build new ones, and they need to store their data in various sources. So we used S3, Redshift, and PostgreSQL, both for storage and for analyzing data. And then again, the output was going up to MySQL, a bit of Excel, and then we built some custom dashboards. So the problem with all of this is data governance is very manual, it's siloed data, so it's very, very difficult to necessarily know where all the data is and where to get it. And we had to build those dashboards ourselves. So to echo Conor's point again on Tableau, we had to put a lot of effort into building dashboards and we were just reinventing the wheel, and we ... You don't need to do that now. Every wheel has been invented in tech, to some extent, it just depends on whether you like that particular wheel and you can use that.
Ciaran Fisher So our customers are also demanding access to their own data and from our side, again, we had slow batch jobs running. So it was relatively slow, not necessarily the best architecture. It did work, ultimately, so we did provide insights and we had quite nice custom dashboards for our clients, and the data team did a fantastic job of providing insight to them in various ways, whether that be through cube extracts, Excel, PowerPoint slides, all that kind of stuff we would get that data.
Ciaran Fisher So what happened next? Ultimately as a CTO, you are challenged by your CEO who says, "Right, this isn't good enough. How are we going to make this better?" So what we ... He set a few high level targets for us to meet, and myself, the data [inaudible] team and the backend team came up with a way of doing this. So number one was we have this portal, Bridge, which is where all of our clients engage with all the products and tools that we have. So number one is whatever tool we build needs to integrate into Bridge, needs to be an absolutely seamless experience for our clients, it needs to sit alongside their push notifications, email marketing, content management. It has to sit alongside that.
Ciaran Fisher The other one was realtime data, that's absolutely key. We need to provide insight as quickly as possible, enabling fans and ... Clubs and leagues to understand their fans, in realtime ultimately, but as near to realtime as you can get it, using as few batch jobs as possible to get that processing time down. And it needed to use our existing SSO, so again we built SSO systems that we use for various clubs and on our apps and we needed to integrate with that existing SSO and permission system. Again, what we're trying to build is something that is scalable, so it has to be a single platform solution for all clients. We cannot be doing bespoke builds for this client over here and this client over here, because you just can't scale. It works for say one or two, maybe three clients but once you get to 20, 30, 40, it's just not going to work.
Ciaran Fisher And our final one is that clients have to be able to self-serve. These dashboards needed to be interactive. They had to be able to derive insights from these dashboards by diving in, clicking through, and really sort of digging into that data.
Ciaran Fisher So as a sort of quick recap on some of the things you've heard before, why Dremio? So number one, the logo, that was what really sold it to the data team, they were absolutely chuffed because they love narwhals. So that was number one. Number two was all the stuff you kind of heard from Jason before, so the fact that it is flexible and fast, it uses in-memory and it has on-disk accelerations. Those accelerations are absolutely key in order to provide a tool that you can use, use quickly, and get insight from very, very quickly. Another one, especially in a post-GDPR world, over in Europe we had to have strong governance controls. What data do we have, how long have we had it for, where has it come from, and what is it? That is absolutely key and Dremio has some very, very good integrations enabling us to say, "This is the final output. Where did this data come from? Which tables? What was joined together? And which users can see that particular source of data?"
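The lineage questions described here ("where did this data come from, which tables, what was joined together") amount to walking a dependency graph from a final output back to its sources. The following is a minimal Python sketch of that idea only. It is not Dremio's API, and every dataset name in it is hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is built from.
# The names are illustrative only, not InCrowd's real tables.
LINEAGE = {
    "tableau.audience_builder": ["staging.fans_joined"],
    "staging.fans_joined": ["postgres.accounts", "redshift.clickstream_events"],
    "postgres.accounts": [],
    "redshift.clickstream_events": [],
}


def upstream_sources(dataset, lineage):
    """Return the root sources (datasets with no parents) that feed `dataset`."""
    seen = set()
    roots = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another join path
        seen.add(current)
        parents = lineage.get(current, [])
        if not parents:
            roots.add(current)  # nothing upstream, so this is an original source
        queue.extend(parents)
    return roots


print(sorted(upstream_sources("tableau.audience_builder", LINEAGE)))
```

A breadth-first walk is used here so that a source reached through several join paths is still reported once.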
Ciaran Fisher Another key one for us, and as a business we almost exclusively use open source technology, Tableau being an exception, is the fact that Dremio is open source, so it's very, very easy for us to understand how to host it, to dig into it, to understand it more and know if it has good backing. And the other one was the fact that it needed to be quick and easy to integrate. We needed to be able to get this quickly into our existing workflows and start hooking it up. So, SQL access is essential.
Ciaran Fisher So, this is our new architecture. So all we did is basically stole Jason's slide and chopped out all that horrible ETL stuff and replaced it with Dremio and Tableau. So those are the two technologies that you've previously heard about and we hooked everything up to Dremio, so Dremio did acceleration from our S3 data lake. We ingested every single data source that we had through it, and that included relational databases such as PostgreSQL, columnar databases such as Redshift which had a lot of our click stream data in it from Snowplow, and non-relational databases, so document-based. That one would [inaudible] which runs a lot of the backing systems that we have. And this went very well, so we got rid of ETL scripts, we got rid of a lot of batch jobs. We've got integration into Tableau, data governance is hugely improved. And absolutely the key one is that now analysts can build dashboards without developers. Not that I've got anything against developers, having once been a developer myself, but analysts need to be able to use their tools to develop ... To provide insight. They are analysts, they need to be able to analyze the data and present it back in a useful way.
Ciaran FisherAnd using Tableau basically means now that we don't need to go, "Right, okay, we need to go to a front-end developer, we need to get a data engineer, we need to get a backend engineer, we need to get them all in a room and we need to figure this out, we need to have a data pipeline planned out." All of that goes out the window. There's a conversation between the engineering team and the analyst, he says, "I want this pluck of data," he says, "I want this certain data." They say, "Right, there you go." [inaudible] Dremio, and off you go. Fantastic.
Ciaran FisherSo just a few slides to kind of give you an idea of some of the backend. So this slide is not something I've drawn, this has actually been generated from Dremio itself. So this is a view just showing you the various tables ... The various sources we've used, the various tables we've joined together, and then the final output which we've exposed to Tableau. This particular example is just showing one of the tables that brings up our audience builder, so bringing a whole bunch of different datasets together to provide a single customer view, enabling you to say, "Right, has this particular user played a predictor and what age are they? Where do they live? And how engaged are they with our content? Do they read the app weekly? Do they read it daily? Do they come to the website?" All that kind of stuff, to bring up a true single customer view.
Ciaran FisherAnother one, content usage, which leads into single customer view, is we have again, a lot of data coming in from app and web, we need to be able to join that, pull it in from the various click stream sources as well as the original source data, and give you some analytics on it. So again, this is not something I've drawn. Coming straight out of Dremio.
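To make the single customer view idea concrete, here is a minimal Python sketch. It is not InCrowd's actual dataset or Dremio's API: it uses the standard-library sqlite3 module as a stand-in for the query layer, and the table names, columns, and sample rows are all invented for illustration. It shows one joined row per user answering "have they played the predictor, how old are they, where do they live, how engaged are they".

```python
import sqlite3


def build_single_customer_view(conn):
    """Join profile, predictor, and click-stream rows into one row per user.

    All table and column names here are hypothetical; in the talk, the
    equivalent join is defined once in Dremio and exposed to Tableau.
    """
    conn.executescript("""
        CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, age INTEGER, city TEXT);
        CREATE TABLE predictor_plays (user_id INTEGER, played_at TEXT);
        CREATE TABLE page_views (user_id INTEGER, channel TEXT, viewed_at TEXT);

        INSERT INTO profiles VALUES (1, 29, 'Dublin'), (2, 63, 'Cork');
        INSERT INTO predictor_plays VALUES (1, '2020-03-01');
        INSERT INTO page_views VALUES
            (1, 'app', '2020-03-01'),
            (1, 'web', '2020-03-02'),
            (2, 'app', '2020-03-05');
    """)
    # One output row per user: demographics plus two engagement measures.
    return conn.execute("""
        SELECT p.user_id,
               p.age,
               p.city,
               (SELECT COUNT(*) FROM predictor_plays pp
                 WHERE pp.user_id = p.user_id) > 0 AS played_predictor,
               (SELECT COUNT(*) FROM page_views pv
                 WHERE pv.user_id = p.user_id) AS page_views
        FROM profiles p
        ORDER BY p.user_id
    """).fetchall()


if __name__ == "__main__":
    for row in build_single_customer_view(sqlite3.connect(":memory:")):
        print(row)
```

The point of the sketch is only the shape of the result: the analyst sees one flat table per customer and never has to know how many sources sit behind it.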
Ciaran FisherAnd now onto the hyped and slightly terrifying live demos. So ... Hope you can all see this.
Ciaran FisherThis is Tableau. So as mentioned and to [inaudible] requirements, this is our online portal Bridge that you can see in the bar. And this is all Tableau, so the examples I'm going to show you are all based on dummy data, because we have a lot of clients and we can't share the data on livestreams, so there might be a few slightly odd things around data. So this dashboard is basically to give you a quick overview of the various sources of data you have feeding into your data lake. So how many people are coming in and how many people are ID-ed across those various sources. So exactly here, they're showing you whether someone is ID-ed across a single source or whether they actually match up across a bunch of different sources. And an idea of your total number of people that you can contact, as well as ideas on how your databases are growing. And this particular example is more league based, with the whole teams supported. So who supports what team, should you be doing marketing towards particular teams to get them to come along to your final, semi-final, that kind of stuff.
Ciaran FisherAnd an important one, demographics. What's the age of the people coming? Are there still young people coming in? So an example, the average age is 63, so you're probably going to be wanting to do some marketing now in the sort of millennial area to try and get people coming in, maybe end up bringing their kids along and coming in.
Ciaran FisherSo as I work through these ... Okay, they will get slightly more sophisticated with slightly more interaction, so another sort of general reporting dashboard is this one. So this gives you an idea of app usage. So for example, how many downloads do we have? How do they compare to the previous year? Do we need to get marketing involved in getting those numbers up? Can we see any particular impact from a result of our marketing? For example, how we've got a spike here saying, "Great, that was a great Christmas promotion," well in fact great summer promotion saying, "Hey, download the app." How do we compare to that previous month, and all that. So again, we're trying to bring that insight out to the user in a nice, easy to understand way, just by interacting with these dashboards. So number of users, sessions, user duration, how popular is your content, how long people are reading it for. That gives you an idea of user duration. How many sessions, which is basically how sticky are you, people checking in daily, weekly, monthly. All that kind of stuff.
Ciaran FisherAnd an important one for sales is, "Great, we've got an app. How to make some money from it?" One way is tickets. So we've added a ticket link on the app, how many people are coming through to it? Have we signposted it in enough ways? Is it easy enough to find? So here you can see, yep, we've got some nice high bars. We've got a lot of click throughs and hopefully those have transitioned into sales. And on a different dashboard, we agree with that information as well, so you'll be able to map the data sources together and say, "Yes, this click through is working, at the drop of this X, and we've got this many sales, so everything's good."
Ciaran FisherNext one is, again, more towards the marketing part. So this slightly complicated looking dashboard is, again, some dummy data but something that ... An ability to understand the fan usage. So when is the best time to contact your fans? This gives you a good breakdown of when are they using a particular part of this app, in this particular case, which is our predictor. So what is the highest throughput time? So really, if you want to have highly engaged fans and send them messages, send them offers, get them excited, Sunday afternoon is going to be your winner. Actually, well ... Either one will work. But don't send it on a Wednesday. There's no one there, there's no one checking in. So again, we're trying to provide really simple, in effect, interactive infographics back to the clubs and leagues in order to understand this. So as a club, in particular clubs are often very, very overworked. You have a surprisingly small staff doing a lot of different jobs. So whatever we produce needs to be impactful and useful, straight up bars.
Ciaran FisherAnd as I kind of mentioned, we're now bringing in some sort of filtering. So again, this is live filtering coming in, so you can go and mess around, see what's the previous year, all that kind of stuff. And the final one I want to talk through is the ticketing one. So this is our example of sort of most interactive dashboard. And exactly as Conor said earlier, this is all running straight from Dremio. So in order to find out how well our index [inaudible] 16 to 24 year olds. Where are people coming from? So Dublin, everyone's coming in. This particular example, again, is some demo data for more of a league, because we've got teams supported, so that wouldn't be really relevant for a club but it can be modified to fit a club. And all those different particular filters, how do they all match up to each other? So if we go back to everybody ... Let's go, I don't know, 25 to 34, and of that are they male? Yeah. You can see exactly as we're interacting with this, it's updating in realtime, and you can see what the revenue ... You can change that to revenue in ticket sales, for all those different people are ... And most importantly, where are they coming from?
Ciaran FisherSo again, as a ... More for a league than a club, you want to understand where are your fans, where should we put the final? Should it be in a particular area of the country, because we've got a huge concentration of fans there, or should it be somewhere with good rail links to these various places? So we're providing data dashboards in order to help you understand where your fan base is coming from.
Ciaran FisherThere we go. And over here. So ... [inaudible] I'll just finish this. So number one, the demo's worked, which is always good. So I'll just end on this summary here in order to present.
Ciaran FisherSo as you sort of saw, I've gone through all of these different things, you've seen that in that interactive demo in realtime. You can go through and gain some insight by analyzing that data on your own. You don't need to be a data engineer, you don't need a degree in mathematics. Anybody can go and do this. You can just go on, play around with the filters and explore the data. And it's your data, just presented back in a nice, novel, and interesting way. And that is underpinned by the fantastic technology from Dremio and Tableau. And ultimately, do we meet our requirements? Absolutely we did, we smashed it out of the park. We've met our CEO's requirements and everybody's happy.
Ciaran FisherTableau and Dremio, as mentioned, are a fantastic combination of tools, and I'm not sure if this phrase translates but we are, "Eating our own dog food." We use these tools internally for our own reporting. So it isn't just what we provide to our customers, and then we use some of our tools off the shelf, we feel good enough around these tools that we will use them internally to report back to ourselves, to the board, and everybody else, as well as for our customers to use it. And a very important point is the amount of time that this has saved us. So whereas before, as I mentioned, we needed backend developers, front-end developers, web developers, teams and teams of people to basically generate a new dashboard. Now it takes us a day, depending on how overworked our analysts are. So they can go in, give them a brief, they can play around with the data sources straight in Tableau, coming directly out of Dremio, and knock something up and have it appear on Bridge in order for the client then to have a look. And ultimately the most important point is we now have a system that's scalable. We need to support rapid growth, otherwise we don't succeed as a company, and this system does that.
Ciaran FisherAnd with that, I'll hand it back to Dremio.
Jason NadeauFantastic. Thank you Ciaran. That was a great run through of how you guys are using both Dremio and Tableau for your customers. Some really cool visualizations there and I can see why your own customers are really excited about the insights that they're able to get using you as a platform. So thank you for sharing that with us, and in terms of the kind of closing words here and then we'll head over to the Q&A. If anybody on the webinar wants to learn more, for sure you can hit the Dremio website. We've got a few different places that are really interesting to go check out. So one is go try the software. You can go to the deploy part of our website and get the software and deploy it on-prem, if you have an on-prem data lake environment, or of course in the cloud as well, which is where most people are starting to develop their data lakes, on AWS S3 like you're seeing here with InCrowd, but Microsoft's ADLS as well.
Jason NadeauIf you want to learn more about Dremio and how it works, we actually have an online university, so check that out. It's like a selection of courses there that'll get you up to speed, and if you've got other questions, there's an entire community that exists, so you can go and see and talk to a bunch of different folks in lots of different industries that are using Dremio for, you know, different use cases like we see here. And of course from a Tableau point of view, there's a free trial also. So absolutely go get that and check it out. We hope you'll run them together, you know, Dremio and Tableau, to do some of the cool things that you're seeing InCrowd do as well.
Jason NadeauSo we are done, the official webinar portion, and at this point let's head into Q&A. So I'm going to start with a question ... A couple questions have come up for Dremio, so I'll take those. Really about what's the difference between what Dremio is doing on the data lake versus other Apache products or, you know, projects I should say and things like Presto and whatnot. And I would say it comes down to three things, and we saw this more in the earlier slides there too. So the first is performance. So at the baseline, Dremio's going to be about three times faster and that's because of the Arrow-based engine that we have, it's all columnar, in-memory and you know, not just the Arrow part but the really massively parallel readers that we have pulling data up out of the data lake, for example. A few other technologies in there. So that's like at the baseline, it's significantly faster and that's really helpful for ... Particularly you think about ad hoc, query-type use cases.
Jason NadeauBut then there's portions of your data that are really high value and that you want to optimize for and really accelerate. And so we've got a couple other technologies like our Columnar Cloud Cache and our Data Reflections that are really going to add another 100 to 1000X performance boost on top. And it's all built in to Dremio. So you know, that's part of the self-service semantic layer that we call it, so that's ... It's abstracted. There's no exposed cube or materialized views or anything like that. This is all internal to Dremio and it is a physically optimized representation of that data. That's the way we're going to get that type of acceleration inside Dremio. So at that point, you're really significantly faster and that's where we talk about having live interactive performance.
Jason NadeauSo that's one big thing. The next difference is much greater efficiency. Because we're so much faster, our queries will complete way faster and that means you can reduce the amount of time that your cluster nodes, your executor nodes, worker nodes, are actually up and running. And that cuts your cost. And so we're already seeing about a 10X reduction in cloud infrastructure compared to other SQL engines.
Jason NadeauAnd last but not least is the absolute, just the mere existence of this semantic layer for data governance, for data lineage, the ability to create this curated environment that data engineers can use and share and BI users themselves can share data as well and fundamentally just have control of the datasets that they have so it's not a wild west, right? So the same sorts of things that you heard from Ciaran, and that layer, that self-service semantic layer is something that only Dremio provides.
Jason NadeauSo hopefully that gives a sense really of what those differences are. Performance is absolutely a big one. So that's the first question, and now we're going to go over to Tableau. So here's a question for you Conor, the question is when we create a dashboard and publish it to a server, is it possible to set up live connections between Dremio and Tableau? And if yes, does that mean when the user views the dashboard it will execute the SQL to Dremio?
Conor KnowlesYeah, great question. So live connection in Tableau is ... Tableau is pinging the database or in this case connected to Dremio, it's going exactly through that connection. So having that realtime up-to-date data, when you connect to your data initially, you choose in Tableau if you want to do a live connection or an extract. Once you publish that dashboard or connection to the Tableau server, it's going to run exactly how you want, live or extract. Obviously one of the benefits to connecting to Dremio is that live connectivity, so you're absolutely right. That means it'll execute that query running through Dremio. I'll let either InCrowd or Dremio, I guess, dive in a little more if they like, but that's exactly correct.
Ciaran FisherYeah, I'll just take that point, absolutely. So what you've just seen there is exactly that happening. We're running those queries directly against Dremio. That is, Dremio being queried by Tableau, as I was messing around with all of the various filters.
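As a rough illustration of what "each filter change runs a query" means on a live connection, here is a minimal Python sketch. It does not reproduce the SQL Tableau actually generates: the function name, the table name ticket_sales, and the filter columns are assumptions made up for this example. It only shows the idea that the current filter selections are turned into a fresh parameterized query each time.

```python
# Hypothetical filter columns a dashboard might expose.
ALLOWED_FILTERS = {"age_band", "gender", "city"}


def build_filter_query(table, filters):
    """Build a parameterized aggregate query for the current filter selections.

    Column names are checked against an allow-list and values are passed
    as parameters, so user input never ends up inside the SQL text.
    The table name is assumed to come from trusted dashboard config.
    """
    clauses, params = [], []
    for column in sorted(filters):
        if column not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {column}")
        clauses.append(f"{column} = ?")
        params.append(filters[column])
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return f"SELECT SUM(revenue) AS revenue FROM {table}{where}", params


if __name__ == "__main__":
    # Selecting "25 to 34" and "male" in the demo corresponds to a new query.
    print(build_filter_query("ticket_sales", {"age_band": "25-34", "gender": "male"}))
```

With an extract, the same filters would be applied to a local copy of the data instead; with a live connection, a query like this goes to the source on every interaction.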
Jason NadeauYeah, great. Okay, thanks and sort of a follow-on, another question I see. So Ciaran, for you, the question was do you still use Tableau extracts or just the live connection to Dremio? I think you kind of just answered that one.
Ciaran FisherYeah, yeah. We just use the live connection to Dremio.
Jason NadeauYep. And for everybody, that's one of the key values that I just kind of talked about, right? Dremio's taking care of all the acceleration inside and providing that performance. It really simplifies the rest of the architecture. Okay, so Ciaran, another question for you, which is can you expand on the benefits you realized in the area of data governance?
Ciaran FisherYeah, absolutely. So, I mean, data governance is ultimately a lot of documentation. So you need to have absolutely a twist documentation saying, "What is this? How did it come to us?" You need data provenance. You need to have specifications, those specifications kept up to date, and once that happens that's not the end of it. You now have got that data, it's sitting in S3 or sitting in a database or something like that. Then what happens? So say we're pulling ticketing data through, are we allowed to show the customer names on that dashboard? And the answer is no. So how do we make sure that's not the case? So we use masking in Dremio, we use the strong LDAP integration. We're just getting up the money now in order to have users against Tableau and particularly users on Bridge against Tableau to add another level of validation to it. So we have basically strong controls all the way through.
Ciaran FisherYou still need all that documentation, that's absolutely key. But a lot of it is managed by Dremio itself. We can now say, "Okay, we've joined all these three tables together and this is the result, where did it come from?" Before, we had to have documentation or you had to parse the ETL scripts. Now you can just go to Dremio and say, "Okay, great, where did this particular row come from?" And Dremio will show you.
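The masking rule Ciaran describes (customer names must not reach the dashboard) can be sketched in a few lines of Python. This is not how Dremio implements masking; the TicketSale type, the role names, and the mask format are all hypothetical and only illustrate role-dependent masking applied before data reaches a dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketSale:
    customer_name: str
    team: str
    revenue: float


def mask_name(name: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    if not name:
        return name
    return name[0] + "*" * (len(name) - 1)


def rows_for_role(rows, role):
    """Return the rows as the given role is allowed to see them.

    Only the hypothetical 'data_engineer' role sees real names; every
    other role, including dashboard viewers, gets masked names.
    """
    if role == "data_engineer":
        return list(rows)
    return [TicketSale(mask_name(r.customer_name), r.team, r.revenue) for r in rows]


if __name__ == "__main__":
    sales = [TicketSale("Jason", "Team A", 40.0)]
    print(rows_for_role(sales, "dashboard_viewer"))
```

Applying the rule in the query layer rather than in each dashboard means a new dashboard cannot accidentally expose the unmasked column.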
Jason NadeauOkay, excellent, thank you. Here's one that is kind of to both you and potentially to Conor as well, which is to say, "How are these dashboards integrated in your web app? Because it's clear they're coming from Tableau in some way."
Conor KnowlesYeah Ciaran, I can start if you want to take it from me, but yeah, that's our extensibility, our APIs with Tableau. Essentially, Tableau is just serving as kind of the picture within the window, integrating directly into the company portal there. I don't know, Ciaran, if you want to speak maybe more specifically to the APIs but that is Tableau's embedded.
Ciaran FisherYeah, exactly that. So Tableau has fantastic APIs and it has the ability to embed. But the key thing is it has the ability to embed as a particular user. So what we've done is built a bunch of middleware that basically says, as I also mentioned before we have our own SSO, so we bring that SSO user in and we check their permissions, we then check permissions on Tableau and which user that needs to be used in order to serve that dashboard or whether it can serve the dashboard, and then trigger a request to Tableau to say, "Hey, load me up Jason's dashboard because he has these permissions," and as long as those all match, back it comes. In effect what it is is an iframe, but before that there's a whole bunch of API calls that also generate that embed code.
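The middleware flow described here (check the SSO user's permissions, pick the Tableau user to serve as, request access, return an iframe) might look roughly like the following Python sketch. Nothing in it is taken from InCrowd's code or from Tableau's API: the permission tables, the request_ticket callback, the host name, and the URL layout are placeholders, and a real integration would follow Tableau's own embedding documentation.

```python
from html import escape
from urllib.parse import quote

# Hypothetical lookup tables standing in for the SSO and Tableau permission checks.
SSO_PERMISSIONS = {"jason": {"ticketing"}}
TABLEAU_USER_FOR_DASHBOARD = {"ticketing": "club_viewer"}


def resolve_tableau_user(sso_user, dashboard):
    """Return the Tableau user to embed as, or None if access is not allowed."""
    if dashboard not in SSO_PERMISSIONS.get(sso_user, set()):
        return None
    return TABLEAU_USER_FOR_DASHBOARD.get(dashboard)


def build_embed_iframe(base_url, ticket, dashboard):
    """Build the iframe markup; the URL layout is a placeholder, not Tableau's."""
    src = f"{base_url}/trusted/{quote(ticket, safe='')}/views/{quote(dashboard, safe='')}"
    return f'<iframe src="{escape(src, quote=True)}" width="100%" height="800"></iframe>'


def embed_dashboard(sso_user, dashboard, request_ticket,
                    base_url="https://tableau.example.com"):
    """Run the permission checks, then return embed markup or None.

    request_ticket is a caller-supplied function that asks the BI server
    for a short-lived ticket on behalf of the resolved Tableau user.
    """
    tableau_user = resolve_tableau_user(sso_user, dashboard)
    if tableau_user is None:
        return None
    return build_embed_iframe(base_url, request_ticket(tableau_user), dashboard)


if __name__ == "__main__":
    print(embed_dashboard("jason", "ticketing", lambda user: "abc123"))
```

Passing request_ticket in as a function keeps the permission logic testable without a live server.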
Jason NadeauOkay, great. Here's one for you Ciaran. What is the size of the datasets that you're dealing with? You know, people are wondering like, just for example, how many rows are we talking about? And how fast ... Thus, to try to get a sense of the performance that you're seeing.
Ciaran FisherYes, so it really depends on the sort of data. So I've mentioned, we've got click stream data. So click stream data, you're talking billions and billions of rows ... Sorry, billions and billions of rows over a period of time, because that's every page view, every interaction, every click. We don't necessarily bring all of that in Dremio but we bring large chunks of that in and then do analysis on top of it. But it can vary, so you can have say for ticketing for one client, we had gigabyte size. If you take the MySQL dump, you're talking 20 or 30 GB of ticketing data, with maybe 20 columns? 30 columns? So a fairly rich dataset, not big data, not hundreds and hundreds of terabytes but still a relatively large dataset that you need to index properly in order to do the analysis with it. But where it really gets interesting is mixing that ticketing data where we've got that one dataset with those billions of rows from click stream data. So a particular user clicked on these various things, what was their usage? How do we match those two up? So, yeah, it kind of varies. I'm not sure I really answered your question. That's the best I can do.
Jason NadeauYeah, it gives a sense of the scale for sure. So a lot of questions for you Ciaran, no surprise, right? This is really what people want to understand is how are you using these technologies together to solve your problem. So here's a couple more for you. In terms of your architecture, are you using Dremio deployed on-prem or in the cloud? That's part one.
Ciaran FisherYeah, so we are AWS all the way, so we're 100% cloud native. We don't have any real infrastructure onsite, just from sort of the age of the company and how we grew up, it made sense. So we run Dremio very, very happily on AWS.
Jason NadeauGreat, yep, you bet. And for everybody, absolutely, Dremio is, you know, can be deployed in lots of different places, it's a very multi-cloud friendly architecture. In fact you could even do a hybrid cloud deployment if you want, but AWS is a big focus for a lot of our customers and just like you're saying here from InCrowd. So then the follow-on question is upstream from Dremio in that architecture picture that you have, you showed PostgreSQL, Redshift, and MongoDB, which are all very different database architectures. What were the different workloads you originally had for each of those?
Ciaran FisherYeah, so basically it's bespoke ETL scripts. It is code in Python mostly, but basically just a bunch of Python extracts running, and then either producing CSVs that end up in S3 or dropping into another just relational database in order to make it slightly easier to query. But yeah, lots and lots of ETL scripts, and the fantastic thing about Dremio is we can hook those directly up. There are connectors for all those different databases and we can bring them in and query them.
Jason NadeauYeah, excellent. So I'm going to keep them coming for you. This is great, so another question is how has Dremio and this architecture changed your data persistence footprint? Or what types of data you have found you need to persist outside of Dremio, like tracking SCDs only, for example.
Ciaran FisherThat's an interesting question. I mean, I think we're not duplicating as much data I would say, so sort of traditionally a bit like you said before is the fact that you would ... Your data process is in a pipeline, and as you go down that pipeline, you lose that granularity but you also have storage costs. Those go down because you're losing granularity and losing rows and columns as you go. But you still need it. Whereas with Dremio, we're not really doing that. We're going back to the source, querying the source, and Dremio itself is herding that data straight back to us. So I guess the footprint's gone down, I would say. Yeah, no, I definitely would say it's gone down, as a result of Dremio.
Jason NadeauOkay. Great. I think this question is for Conor. So Conor, the question is can we connect from Tableau to Dremio?
Conor KnowlesYeah, absolutely. Tableau and Dremio, you can connect to it via, I believe it's an ODBC connection and you can maintain that live connectivity, so I'll just keep it at that actually. It's as simple as just connecting Tableau to Dremio. Very possible, that's part of why customers use us together.
Jason NadeauYeah, you bet. So there's also a couple questions in here that are related, to do with caching. So let me ... I'll kind of like combine them together, I would say they're for me but also for Ciaran. So sort of the question is when, Ciaran, when you say the dashboards are updating in realtime, are they pulling data from the Tableau cache, the Dremio cache, or from the original data sources? And I'll let you take it first and then I'll add my color, if that's all right.
Ciaran FisherYeah, absolutely. So it depends on the dashboard, but pretty much mostly it's coming straight from Dremio. And Dremio has an acceleration layer built into it, so if the underlying data source that Dremio's connected to is fast enough, you could bring it all the way through. But it depends on the data source and how quickly you want to bring that through. So it depends on the dashboard. The ones I showed, they were running directly from Dremio. Yep, they were. But there are different levels depending on the performance that you want out of it. But we've seen very, very good performance of running straight against Dremio, and then against the data sources.
Jason NadeauYeah, that's right. And so to add a little bit of color there, like I was sharing just a bit earlier on sort of the difference between us and some of the other SQL engines that exist, there's definitely different types of caching that happen inside Dremio that are transparent, right? So that, you know, Ciaran and Tableau can just connect to Dremio without having to worry about what they're connecting to. Virtual data sources stay the same, but behind the scenes, underneath the covers for example, we've got a Columnar Cloud Cache that is just transparently reading data in off of S3, you know, in this case, and storing it right on the NVMe storage part of, for example, the instance nodes that are running in Amazon. And so that gives us a big boost in performance, just kind of natively, you know, for anything that people are querying.
Jason NadeauThen I also mentioned though that for the important datasets that, for example, that Ciaran knows. Oh, these ones we really want to accelerate, we have a physically optimized representation which is another type of cache, think of it that way. We call it a data reflection and you know, this is all done with Parquet, stored on disk or in this case, you know, in S3. Generally speaking. And it's what's giving an additional 100 to 1000X type performance, so that, you know, the things that they really want to accelerate are getting sub-second type responses. And of course that can be used in an ad hoc query case to an extent, right? If it's in and around the same datasets that are being accelerated with reflections but certainly for the reports, which is most often where that type of caching is being used.
Jason NadeauBut it's all inside of Dremio. Again, transparent, not something that is exposed out, so when people like Ciaran are building their connections in, they don't have to worry about things changing under the covers or how they can add that acceleration, it's just something they can choose to add whenever they want.
Jason NadeauOkay. So with that, I want to say thank you to everybody for joining us today. A lot of great questions. Frankly, still more that we've not been able to answer, and so we will absolutely follow up with the folks that we weren't able to talk to or answer directly today. So I want to say thanks to Conor and say thanks to Ciaran for your time. And wish everybody a great rest of your day.
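The idea behind a "physically optimized representation" can be shown with a small Python sketch. It is a conceptual illustration only and not how Dremio's reflections are built or stored: the function names and sample rows are invented. It shows a summary precomputed once from the raw rows, which then answers the same question as a scan of the raw data while touching far fewer records.

```python
from collections import defaultdict


def build_aggregate(rows):
    """Precompute revenue per (age_band, city) once, ahead of query time."""
    totals = defaultdict(float)
    for age_band, city, revenue in rows:
        totals[(age_band, city)] += revenue
    return dict(totals)


def revenue_from_raw(rows, age_band):
    """Answer the query by scanning every raw row."""
    return sum(rev for band, _city, rev in rows if band == age_band)


def revenue_from_aggregate(aggregate, age_band):
    """Answer the same query from the much smaller precomputed summary."""
    return sum(total for (band, _city), total in aggregate.items() if band == age_band)


if __name__ == "__main__":
    raw = [("16-24", "Dublin", 20.0), ("16-24", "Cork", 10.0), ("25-34", "Dublin", 30.0)]
    summary = build_aggregate(raw)
    print(revenue_from_raw(raw, "16-24"), revenue_from_aggregate(summary, "16-24"))
```

Because both functions return the same answer, the caller does not need to know which path was used, which is the sense in which the acceleration is described as transparent.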