Analyze Your Entire Cloud Data Lake in Real Time

Session Abstract

To meet performance and data governance requirements, data teams are required to extract subsets of data from the cloud data lake and replicate it into a data warehouse. This process requires analysts to wait hours or days for business-critical data to be accessible.

What if you could provide your analysts self-service access to all the pertinent data in your data lake to enable real-time business analytics?

Join technical experts from Tableau and Dremio as they discuss how to enable fast access to more complete data and accelerate query performance. They’ll demonstrate how you can easily connect Tableau to your data lake with Dremio to immediately begin driving better business decisions.

You'll learn how to:

  • Visualize data directly on your data lake
  • Provision new datasets with consistent KPIs and business logic in minutes, not days or weeks
  • Empower analysts to create their own derivative datasets, without copies
  • Accelerate analytics queries for real-time data visualizations
  • Minimize data copies and movement to meet data governance requirements

Presented By

Blair Hutchinson, Product Manager, Technology Partners - Tableau


Brock Griffey, Solutions Architect - Dremio

 

Webinar Transcript

Louise Westoby:

Welcome everyone. Thank you for joining us for today's webinar, Analyze Your Entire Cloud Data Lake in Real Time. My name is Louise Westoby, and I am the Senior Director of Product Marketing here at Dremio. I'd like to introduce our speakers. You could go to the next slide. Blair Hutchinson, Product Manager for Technology Partners at Tableau, and Brock Griffey, Solutions Architect at Dremio. Blair and Brock are going to share their experiences working with leading companies to provide analysts with self-service access to all the pertinent data in their data lake to enable real-time business analytics. And please stay through the end, as Brock will also be demoing how you can easily connect Tableau to your data lake with Dremio to immediately begin driving better business decisions. Now without further ado, I'll turn it over to Blair.

Blair Hutchinson:

Great, Louise. Thanks so much. And thank you, Dremio, for inviting me to speak today. A little bit about myself before I introduce Brock and jump into the slide here. I work as a product manager at Tableau, as Louise said. I've been with the company for about five years and work specifically with our technology partners as a product manager, helping them be empowered to work with our platform and our mutual customers. Brock, do you want to quickly introduce yourself?

Brock Griffey:

Yes. I'm Brock Griffey, and I've been with Dremio for a little over a year now. I am a senior solutions architect here. I work with many customers on the Southeast accounts, working closely with customers using Tableau and various BI tools, as well as Dremio, obviously.

Blair Hutchinson:

Great. So, on this slide, I think that before we go into the how and what of Tableau, it's really important to start with the why. And to do that, I think it's important to understand a little bit of our history and what's been so pivotal in not only setting our vision and mission, but ultimately what drives our platform and what we deliver today. Tableau's really been focused on one thing, and it's the highlighted part of this sentence: people. We believe that people are your greatest asset, and that data, this data that you spend so much time and money capturing, curating, cleaning, and storing, can be used to make your people even more valuable. So if you can get the data into the hands of the right people who know the business and have the questions that need answering, that's really when the magic starts to happen.

Blair Hutchinson:

And that's when we think that their creativity and curiosity gets awakened and they use facts to support their intuition. And they see opportunities in your business that have never been seen before, and they become more engaged with what they're doing. This, we think, helps drive organizations forward. And this has really been the fuel behind the inception of Tableau 15 years ago, and it continues to drive us today. So we are 100% focused on helping people see and understand their data. So if we jump to the next slide, we can talk a little bit about what we deliver. We may have started out as a desktop analytics tool all that time ago, but we are so much more now. When you bring together Tableau Server and Tableau Online, web authoring, Tableau Prep, Tableau Catalog, Ask Data, our embedded offerings, Tableau on mobile devices, and the rest of the great stuff that's available as part of the platform, we deliver what our customers need for that end-to-end analytics experience.

Blair Hutchinson:

And we understand that data really is mission critical to any organization. And that is why we've built the Tableau platform to really ensure that you have everything you need to empower everyone at the company with the data without putting your data or your organization at risk. So our platform provides that depth and breadth of capabilities that allow you and your employees to deploy data across your organization. So this includes capabilities that ensure the right data is made accessible with flexible data access and powerful data capabilities. Our solid content and data governance features keep the data in the hands of the right people and also ensure that users can find the right data and make decisions based on accurate and trusted data.

Blair Hutchinson:

So powerful analytical tools then allow you to really ask those deeper questions and to collaborate with data so that it's really at the center of every conversation that you have. And that can be via desktop applications, within the browser, on mobile devices, or embedded in other applications where users might want to use data. And all of these capabilities, it's important to call out, are surrounded by the right security protocols and reliable and scalable tools to meet the needs of your business.

Blair Hutchinson:

Right. If we jump to the next slide, then we can talk a little bit about how we've designed Tableau, and it's really meant to fit, not dictate, your analytical strategy. So if I move from the right side of this and go to the left, we see that we really meet people where they are, allowing anyone to access any information anywhere on the platform. So first off, it can be deployed on premise. It can be deployed behind your firewall. It could be in the public Cloud, which I'll talk about a little bit more in a second here, or it could be via our Tableau Online SaaS offering. Under the query column, we make it so that you can query your data via a live connection, or it could be taken in memory, so you can take advantage of Hyper, our in-memory data engine.

Blair Hutchinson:

And finally, I think this is probably the most important column, because if you can't connect to your data, then how do you see and understand it? We connect to your data really wherever it is. So whether that's an Excel file or a relational database, whether it's aggregated tables locally on your machine or huge amounts of data in your data lake stored in the Cloud, we make it easy to connect and even make it easy to bring data together into the same view. This connectivity, this data source column is also where we lean on our partners like Dremio that are doing really incredible things for our customers, making it even easier to connect to the data that's traditionally been so hard to connect to. So with that, we still face challenges in this space and it makes my job as a product manager, really exciting.

Blair Hutchinson:

So the first piece of that is access to the right data. And for any of us that work with data, you come to the table with a question and sometimes leave with more than what you started with. And if you don't have access to the data that allows you to drill into those very granular levels, to ask and answer those next questions, it becomes really difficult to do your job and to find those answers. And so we think that it's so important if organizations are looking to be more data-driven that they allow access to the data that allows you to answer those deeper questions. In the middle here, we have this idea that we need to be able to move quicker. And this might as well read performance because when I talk to customers, everything needs to be faster. When you're exploring your data, nothing will get you more out of the flow than having to wait for a query to execute.

Blair Hutchinson:

And if I'm on the other side of this, if I'm loading up a dashboard and I spend two minutes watching a spinning wheel as my dashboard renders, well, how likely am I to revisit that dashboard for answers? So when we think about querying data, it should fly, regardless of the size, and that's what we're excited to talk to you about today: really how you can do that with Dremio and Tableau. And finally, the last challenge that I just want to talk about is one that really relates to this first one around data access. Data silos make it really difficult to get a hold of the data that you need to make the right decisions, and they impede your ability to be self-sufficient and to answer those questions. And herein lies a balance between IT and the needs of the business.

Blair Hutchinson:

So how can we think about making it easier to govern self-service data exploration in a way that satisfies both parties? Now, it would be naive to think that Tableau could solve all of these problems by itself. We are one part of the entire stack that is involved in making these things possible, which is, again, why I think it's so important to lean on some of our great partners like Dremio. So the last thing that I want to talk about before I turn it over to Brock is just some of the trends that we're seeing at Tableau. So, if we weren't already aware, there is a massive digital transformation that's happening all around us. And the pandemic has really put that in the limelight, and it's now happening faster than ever because of it. So as part of that, we're actually seeing more and more of our customers choosing to migrate to the Cloud, and that's for a variety of different reasons.

Blair Hutchinson:

But the one that I want to call out today is this notion of data gravity. And if you haven't heard that term before, it's the idea that data and applications are attracted to each other, and as the size of data grows, it's harder to move it, so it stays put. So the applications and processing power end up going to where that data resides. And I think Brock will also go into that in a little bit more detail. Part of that is that we still need to maintain that flexibility and choice. And it's easy then for Tableau customers to think about transitioning off of their on-premise solutions. And so, as I kind of mentioned before, customers are choosing to do away with managing their own software. As I mentioned, there's our SaaS offering that we also have, Tableau Online.

Blair Hutchinson:

So whether you're looking to deploy on the Cloud and manage it yourself, or use our SaaS offering, we kind of offer both of those worlds. And as you're thinking about migrating from one to the other, you can connect and we can be there through that whole process. And finally, the last bullet here, self-service. More and more companies are wanting to create a culture of data literacy. And it's something that we've seen throughout our history at the company. And so we're really focusing on making the platform and tools easier to use for everyone that wants to have a conversation and discover data. So with that, Brock, I'm going to pass it over to you.

Brock Griffey:

Thank you, Blair. So continuing on these trends that we're seeing, I want to talk a little bit more about the migration to the Cloud and the data lake itself. So we can see here that the data lake is growing 100% year over year. We expect that by the year 2025, 50% of your data will live in the data lake, which is great. The data lake is easily expandable, infinitely scalable, and also has a very low TCO just to put your data in there. But with that, we do see problems. It's difficult to consume that data, and it's not always fast to read out of the data lake.

Brock Griffey:

So what people typically turn to for a solution is another problem. What they try and do is create these complex, brittle ETL pipelines, which end up leading to a copy of the data inside these proprietary, expensive data warehouses. And even then, now that they've created these data marts or data warehouse copies of the data, they still don't get the performance they need. So they end up creating more copies of the data in the form of cubes, extracts, and aggregation tables. This still ultimately does not solve all their problems, because it just creates a whole other problem: decreased scope and flexibility of their data. So while they can still access this data and use their data science tools and their BI tools, they still don't have the full performance that they're looking for and the flexibility that they need.

Brock Griffey:

So this all ultimately leads to an inefficient architecture that is time-consuming to maintain. And it requires a lot of time just to make changes to the data. Anytime someone requests new data, you have to go through the entire process again, and then recreate any cubes or extracts that you've done before. And this ultimately leads to your teams being overwhelmed, trying to meet the demands of you, the end user. And in turn, we end up seeing decreased productivity and a much longer time to insight. So, this is where Dremio comes in. We replace that entire process. We remove the need for any of your cubes, extracts, moving the data around, or any copies of the data.

Brock Griffey:

We can actually sit directly on your data lake and provide what we call the Data Lake Engine for your BI dashboards. This allows you to run Tableau queries, live queries, directly against your data lake and get sub-second query response times, supporting thousands of concurrent users and queries. And through our web interface, we provide very simple self-service access to your data without manual ETL processes and without major involvement from your engineering team just to get that data available.

Brock Griffey:

How do we do this? Well, starting at the bottom of the stack here, you can see we have a data lake, and what we do is we have a massively parallel reader that's able to utilize many resources and read out of those data lakes very quickly and efficiently. In the next layer, we're able to take that data and load it into what we call Columnar Cloud Cache, utilizing the existing hardware within your instances to give you better performance than if you did not have this. And it accounts for any latency and lag spikes you may see inside the data lake and reduces those. It gives you a more consistent query response time. All of this is being powered by our Apache Arrow processing engine, which makes it much faster for any query that comes through to be processed in real time, in memory.

Brock Griffey:

If you need even faster response times than the out-of-the-box performance that we already give you, we also have the option to add an acceleration technique called Data Reflections, which gives you the ability to have an already created dataset perform even faster for you. And it gives you a highly granular, reusable object that is transparent to the end user but gives them the performance they're looking for. In addition to all of this, we provide your traditional means of connectivity. So we have your ODBC or JDBC and REST API, but we also provide another means of connectivity, which you can think of as ODBC or JDBC 2.0, that's called Arrow Flight. It's hundreds of times faster than ODBC and JDBC. All of this gives you the ability to use any tool that you want. Tableau integrates directly with Dremio and can read the data with live queries directly against your data lake, providing lightning-fast, interactive queries. This is four times faster on your ad hoc queries.

Brock Griffey:

We've seen this versus tools such as Presto and other engine tools. In addition, on your BI queries, we're able to give you 10 to 100 times faster performance. Because Dremio is efficient and we can elastically scale and enable and disable engines automatically, we're actually able to save you 75% or more on your TCO. So through the Dremio partnership with Tableau, we're actually on the extensions gallery: you can find our Dremio connector directly on the extensions gallery and download it today. In addition to that, we also have a Tableau button inside the Dremio Web UI. This allows you to launch directly into Tableau Desktop with single sign-on connectivity. So I encourage you guys to stay on. I'm going to show you a quick little demo, and then I have some more to talk about after this.

Brock Griffey:

So this is the Dremio Web UI. Once you've logged in, you see your own personal space. In this personal space, you can create your own virtual datasets and upload your own data if you want to do some exploration of the data. Now, these virtual datasets are just views. We're not creating copies of the data. We're just creating views of the data on the underlying datasets that you already have in your data lake. We also have this nice little shareable section here that's called spaces, where we can share out our work and create reusability for other users. So maybe we have some KPIs that we're going to keep consistent across the organization; we can share them out in this space.
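
The point that a virtual dataset is only a view, not a copy, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the data lake, which is an assumption made purely for illustration and is not Dremio's API; the table name trips and the view name busy_trips are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the data lake (illustrative assumption,
# not Dremio's API). Table and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_date TEXT, passengers INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("2014-01-01", 1), ("2014-01-02", 3)],
)

# A view stores only the query definition, not a second copy of the rows.
conn.execute(
    "CREATE VIEW busy_trips AS "
    "SELECT pickup_date, passengers FROM trips WHERE passengers >= 2"
)
before = conn.execute("SELECT COUNT(*) FROM busy_trips").fetchone()[0]

# A new row in the underlying table is visible through the view right away,
# because the view is evaluated against the source on every query.
conn.execute("INSERT INTO trips VALUES ('2014-01-03', 4)")
after = conn.execute("SELECT COUNT(*) FROM busy_trips").fetchone()[0]

print(before, after)
```

Because nothing is copied, there is no refresh step to schedule: the view always reflects whatever is in the underlying table.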

Brock Griffey:

So I'm going to look for one of those shared spaces or shared datasets, and it's called the New York City Trips dataset. You may be familiar with this. So I've just opened that up, and instantly I'm given a preview of 60,000 records. I didn't have to do anything; I just opened it, and I can now see this data without running anything directly on this. I don't need to wait. So going through here, I can quickly see some of the different columns, see the data in here, and just get a view and a feeling of what this data looks like. If I want to do some manipulation on this data, I could easily do that up here, inside this SQL editor. But also, if I don't know SQL, or if I just don't want to take the time to write the SQL, I can easily go through the web UI, click these dropdowns, and manipulate the data right here.

Brock Griffey:

Just to show you where this data is coming from, you can see here in our graph view that this data is coming from Azure Data Lake directly. Going back over here, I just want to show you that we have this quick-launch Tableau button. So I'm going to launch this in Tableau and show you how Tableau and Dremio work together. So when we click this, it's going to launch Tableau Desktop, and it's going to prompt me for my single sign-on password. So I'll log in here. Now that I'm in here, you may or may not be familiar with this interface, but Tableau gives you a really easy-to-use, intuitive interface. In here, I can see all the dimensions and measures being pulled in automatically. Any strings, text, or dates will show up as a dimension, and a measure will be anything that we can calculate.

Brock Griffey:

So the first thing I want to do now that I'm in here is I want to know how many records I am working with. So I'm going to go ahead and drag this over and drop it. Quickly, a live query has been performed directly against Dremio, and that is doing a count of all the records. And we can see here there's about 1 billion records that we're going to be working with. Now, if I want to break this up and maybe get a better idea of what this data really looks like, I might want to see how many riders per day are using taxis in New York City. So I'm going to go ahead and grab this pickup date time. I'm going to go ahead and drag it over here. You can see it automatically changed this. I'm just going to go ahead and change this over to a different view so we can get a better look at it. And I kind of want to have it as a bar graph like this, so I just quickly moved it around. Very easy to do, very easy to work with.

Brock Griffey:

We can see that this is breaking it out by year. So we have a column of how many riders per year, you can see there. And at this point, if I wanted to go deeper into the data, and if I was using something like a cube or extract, I may be stuck here. I may have to wait for someone to create a new cube that gives me better or deeper granularity into the data, but not with Dremio. With Dremio we can work at the speed of thought. I can come in here and then go to year, month. And I can click on this. Instantly, I'm provided with these results. I'm able to work at the speed of thought, and I can see at a year, month level all the breakdown of the data. In fact, I want to go deeper. I'm going to go to the day level. Instantly, I can now see at the day level.
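
The year, month, and day drill-down described here comes down to re-running the same count grouped at a finer grain each time. A minimal sketch of that pattern follows, using Python's built-in sqlite3 module as a stand-in for the live data source. The table, the column name pickup_datetime, and the sample rows are assumptions for illustration; this is not the SQL that Tableau or Dremio actually generates.

```python
import sqlite3

# In-memory SQLite stands in for the live source (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?)",
    [
        ("2013-12-31 23:10:00",),
        ("2014-01-01 08:00:00",),
        ("2014-01-01 09:30:00",),
        ("2014-02-15 12:00:00",),
    ],
)

def trips_by(fmt):
    """Count trips grouped by a strftime pattern such as '%Y' or '%Y-%m'."""
    sql = (
        "SELECT strftime(?, pickup_datetime) AS bucket, COUNT(*) "
        "FROM trips GROUP BY bucket ORDER BY bucket"
    )
    return conn.execute(sql, (fmt,)).fetchall()

# Each drill-down step is a fresh live query at a finer grain.
by_year = trips_by("%Y")
by_month = trips_by("%Y-%m")
by_day = trips_by("%Y-%m-%d")

print(by_year)
print(by_month)
print(by_day)
```

With a cube or extract, a grain that was not built in advance cannot be answered; with a live query, the grain is just a different GROUP BY expression.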

Brock Griffey:

Again, all of these are live queries, performed directly against Dremio using Tableau. If I wanted to publish this dashboard and allow other users, say on Tableau Server or Tableau Online, to use it, I could go and publish that as well. And they could actually still use live queries directly against Dremio. So I want to show you how we got that performance and what that looks like in Dremio. Coming back over here to the jobs tab, I can see every query that I've run. I can come down here and see the very first one that ran, and you can see it's just doing a sum to get the count of all the records, and it's going up through the layers. So we can see that at each layer we're actually extracting a year, and maybe it's doing a year and month at some point. And we're actually able to see each query and how it performed.

Brock Griffey:

And you can see here the length of the queries: each one of them took less than one second to perform. You may be asking, how are you getting that performance? How is that possible? Well, in Dremio we have an acceleration technique, and you're able to enable this acceleration technique very easily. On a virtual dataset, we can enable what we call a reflection. In this case, we're using an aggregate reflection. Viewing the aggregate reflection, we can quickly turn that on by just clicking the slider and adding any dimensions and measures we want. When we're done, we click save, and Dremio will automatically maintain and manage that reflection for you. No user will ever query this directly; it will always be transparent. Any time you query this dataset or any datasets built off of this, Dremio will automatically determine whether or not the reflection covers the query and give you the acceleration you're looking for.
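
The idea of a pre-aggregation that transparently answers any query it covers can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Dremio's implementation or matching algorithm: the sample rows, the choice of year and month as the pre-aggregated dimensions, and the names aggregate, REFLECTION, and query are all hypothetical.

```python
from collections import defaultdict

# Raw rows: (year, month, day, fare). Hypothetical sample data.
RAW = [
    (2014, 1, 1, 10.0),
    (2014, 1, 1, 5.0),
    (2014, 1, 2, 7.5),
    (2014, 2, 1, 12.0),
    (2015, 1, 1, 8.0),
]
COLUMNS = ("year", "month", "day")

def aggregate(rows, dims):
    """Group raw rows by the given dimensions: {key: (count, fare_sum)}."""
    idx = [COLUMNS.index(d) for d in dims]
    out = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[i] for i in idx)
        out[key][0] += 1
        out[key][1] += row[3]
    return {k: tuple(v) for k, v in out.items()}

# The stand-in for an aggregate reflection: built once on chosen dimensions.
REFLECTION_DIMS = ("year", "month")
REFLECTION = aggregate(RAW, REFLECTION_DIMS)

def query(dims):
    """Answer a count/sum query, using the pre-aggregation when it covers dims."""
    if set(dims) <= set(REFLECTION_DIMS):
        # Covered: roll up the small pre-aggregated table instead of scanning RAW.
        idx = [REFLECTION_DIMS.index(d) for d in dims]
        out = defaultdict(lambda: [0, 0.0])
        for key, (count, fare_sum) in REFLECTION.items():
            rolled = tuple(key[i] for i in idx)
            out[rolled][0] += count
            out[rolled][1] += fare_sum
        return {k: tuple(v) for k, v in out.items()}, "reflection"
    # Not covered (for example, day-level detail): fall back to the raw rows.
    return aggregate(RAW, dims), "raw"

print(query(("year",)))
print(query(("year", "month", "day"))[1])
```

The caller never names the pre-aggregation; it asks the same question either way and the routing happens underneath, which is the sense in which the acceleration is transparent to the end user.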

Brock Griffey:

So, one other thing I want to show you is how do you connect... You may have a question of how do you connect to Dremio from within Tableau itself? So after you've added Dremio through the extensions gallery, you have an option here. You'll see Dremio Connector by Dremio. We can easily open this up, and again, we can search for or we can put in our connection information. So I'm talking to my server here. I'm going to go and log in with my single sign-on. Hopefully I typed my password right. Once you're in there, it would provide you with the ability to view the data from the data source. And you'd be able to browse this data source.

Brock Griffey:

And by browsing the data source, you could see all the different options here, and you could load the dataset you wanted to have. So we can see here Dremio, and we can set the schema. And so here, I'm going to look for the same schema. So I'm going to look for the business transportation schema. And I can search here and view all the datasets and easily drag and drop one over. And then I can begin the same way, with all the data that I wanted. If I wanted to join datasets, I could do the same thing. I could add more datasets and join them in here as well. So I can do analytics on multiple datasets joined together. So that concludes the demo portion, and what we've just done is we've reduced the time to insight. The amount of time it took for us to do analytics was significantly reduced compared to your traditional means. Traditionally, we would have had to ingest that data. And on top of ingesting the data, we would have to transform and optimize that data to make it performant. But even then we'd still have to build cubes and extracts to go even further and get that deeper performance that we're looking for. And then finally, we'd be able to run those queries.

Brock Griffey:

In Dremio, right away we were able to read the data directly from the data lake without making cubes or extracts or copies of the data. And we were able to run directly on top of that. For additional performance, we were able to add a reflection just by checking a box and adding a couple of dimensions and measures. This all amounts to hundreds of times faster time to insight. Now, just to speak a little bit about a customer that I actually work with: NCR began their transition to a modern data analytics platform about two years ago. And when they began this transition, they quickly realized that it took a very long time for them to migrate their data pipelines from their existing platform to the new platform: about two to three months per dataset.

Brock Griffey:

It required many consultants to map legacy schemas into new Tableau dashboards, and the amount of time it took left the business unsatisfied. Every time they requested a new dataset, it took even more time. And it also resulted in poor performance. They weren't able to see that performance directly on their new data analytics platform. In addition to that, they were also looking to possibly connect their legacy warehouse with their new warehouse, and they weren't able to achieve that.

Brock Griffey:

So with Dremio, they were able to drop it in, and using Dremio's semantic layer they could connect to both systems at the same time. And through that connection, they were able to accelerate their migration process. By creating a semantic layer that could point to both their legacy and their new modern analytics platform, they could then query both at the same time, and whenever they needed to migrate from one to the other, they just changed one line of code, which was transparent to the end user. Meaning Tableau did not need to change anything, and no upstream tools needed to change anything. They could then all use the same semantic layer out of Dremio and get the performance boost together.
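
The one-line migration described here works because consumers query a named view rather than a physical table. The sketch below shows that pattern with Python's built-in sqlite3 module standing in for both platforms. The table names legacy_sales and modern_sales, the view name sales, and the helper functions are hypothetical, and none of this is Dremio's API.

```python
import sqlite3

# One in-memory database stands in for both platforms (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_sales (store TEXT, amount REAL)")
conn.execute("CREATE TABLE modern_sales (store TEXT, amount REAL)")
rows = [("atlanta", 100.0), ("miami", 50.0)]
conn.executemany("INSERT INTO legacy_sales VALUES (?, ?)", rows)
conn.executemany("INSERT INTO modern_sales VALUES (?, ?)", rows)

ALLOWED_TABLES = ("legacy_sales", "modern_sales")

def point_sales_view_at(table):
    """Redefine the shared view so it reads from the given physical table."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    conn.execute("DROP VIEW IF EXISTS sales")
    conn.execute(f"CREATE VIEW sales AS SELECT store, amount FROM {table}")

def dashboard_total():
    """The consumer's query: it only ever names the view, never a table."""
    return conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

point_sales_view_at("legacy_sales")
total_before = dashboard_total()

point_sales_view_at("modern_sales")  # the one-line change during migration
total_after = dashboard_total()

print(total_before, total_after)
```

The dashboard query is identical before and after the switch; only the view definition changed, so nothing on the consuming side has to be redeployed.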

Brock Griffey:

This shortened their development time from months to days, and they no longer needed these consultants to do all the development and deployment on the data lake. This reduced what they considered data leakage, which to them was any time that was wasted waiting for analytics. So, one thing I want to leave you guys with is the Dremio Amazon edition. This is a free edition of Dremio that you can try out today for yourself. You can find out more information about it at dremio.com. All of the features or functions I've covered today are available for free, and more. And I'm going to go ahead and stop here for questions and answers.

Louise Westoby:

Thanks, Brock, and Blair of course, as well. We will go ahead and take some time for questions. Be sure to type them into the question box in your control panel. While we're waiting for you to input questions, if you wouldn't mind going to the next slide, I wanted to make sure that you're aware of the upcoming Subsurface Cloud data lake conference. We'll have over 30 technical sessions this year, and the conference is really geared towards folks like you who are data lake architects, engineers, and developers, and it's very much focused on Cloud data lake topics. Actually, the Tableau CPO, Francois Ajenstat, will be joining Billy Bosworth for a fireside chat to discuss how you can unlock the power of all your data and make analytics accessible to everyone. Billy, of course, is the CEO at Dremio as well as a former board member at Tableau. He's also a current board member at TransUnion. So I think with the two of them up on the stage, it should be a very interesting talk.

Louise Westoby:

It looks like we've received a number of questions in the Q&A. If we don't have time to address them at the end, we will follow up after the webinar with you personally. So first question, I think for you, Brock: can we use Tableau Server or Tableau Online with Dremio?

Brock Griffey:

Thank you. That's a good question. So yes, we can use Tableau Server and Tableau Online with Dremio. You can easily publish your workbook directly to those, and through Tableau Bridge, or through a connection if you were on prem to Tableau or to Dremio, you can have live queries directly on the data lake.

Louise Westoby:

Okay. Thank you. Blair, next question is for you, when would I use a live connection versus extract?

Blair Hutchinson:

Yeah, I can answer that. And I think that Brock already kind of spoke a little bit to that as to why companies are looking to use live connections. So just a little bit of background. With a live connection, obviously you're querying live against your database, or your data lake in this instance. With an extract, you're actually taking a snapshot of that data source at a given moment in time and using Tableau's data engine, Hyper, to query against that. So I used to actually work as a solution architect when I joined the company five years ago, and a lot of what we recommended was a data extract strategy for our customers, so that they could realize the performance and actually query and explore their data without the lag time of having to use live queries.

Blair Hutchinson:

And that has changed a lot. The performance that you're able to gain via live connections with products like Dremio has really changed the way that we have seen our customers use the live and extract approach. We still see kind of a hybrid solution where extracts can be useful, one, if there are slow connections, but also if you don't want to continue to send queries to the data source, for kind of load balancing reasons. But more and more we're seeing that companies are seeing the value of querying live and getting those up-to-date responses from querying the database directly. So it's really a choice that you can make, but the ability to use live queries that are as performant as they would be if you're using Dremio in this instance makes the case for using live queries instead of extracts in more cases than not.
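
The trade-off between a live connection and an extract can be made concrete with a small sketch. Python's built-in sqlite3 module stands in for the data source and a plain list stands in for the extract; both are assumptions for illustration and say nothing about how Tableau's Hyper engine actually stores data. The table and function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the source database or data lake.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.execute("INSERT INTO orders VALUES (1, 20.0)")

def live_total():
    """Live connection: every call sends a query to the source."""
    return source.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Extract: copy the rows out once, then answer questions from the copy.
extract = source.execute("SELECT id, amount FROM orders").fetchall()

def extract_total():
    """Extract: answered from the snapshot, with no load on the source."""
    return sum(amount for _, amount in extract)

# New data arrives at the source after the snapshot was taken.
source.execute("INSERT INTO orders VALUES (2, 30.0)")

print(live_total(), extract_total())
```

The live query sees the new row while the extract stays at its snapshot value until it is refreshed, which is the freshness-versus-source-load trade-off described in the answer.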

Louise Westoby:

Great, great answer. I think the next question is somewhat related, but I just wanted to give Brock an opportunity to talk about this one as well. How is Dremio eliminating the need for extracts and aggregations?

Brock Griffey:

Right. So out of the box, like I mentioned a couple of slides ago, Dremio gives you all of these advantages of our acceleration engine to give you performance directly on the data lake without the need for anything else. So out of the box, we give you performance. On top of that, though, we have the ability to use our reflection technology, and the reflection technology will enable those BI queries to be much faster. And it enables you to get those deeper analytics while you're rolling up your data and doing your aggregations. So utilizing that removes a lot of the need for your cubes and extracts, because now we can have that performance without the wait.

Louise Westoby:

Okay. And I think next question is for you as well Brock. Does Tableau support connection to Dremio over Arrow Flight?

Brock Griffey:

Great. So today there is no Arrow Flight connector, but we do hope that in the future there will be. We do have an SDK connector that will allow you to connect from Tableau directly into Dremio, though. So that's an option as well, which you can just download.

Louise Westoby:

Okay. Next one is somewhat related if I can... sorry, I just lost my place here. Do we need an ODBC driver to connect to Dremio?

Brock Griffey:

So that is one of the options; you can use the ODBC driver. The SDK uses the JDBC driver, and we do have detailed instructions on how to install that as well in our docs.

Louise Westoby:

Okay. Blair, the next one's for you. I don't see Dremio in the list of connectors in Tableau. How do I connect directly from Tableau Desktop?

Blair Hutchinson:

It kind of goes into a little bit of what Brock was just talking about. So the flow is that we have a connector gallery that Brock showed as part of one of his slides. And that experience is actually coming into the desktop product sometime in 2021. So you'll be able to explore directly from Tableau Desktop and find the Dremio connector very easily, if you haven't already installed it, from within the product. But the idea there, just to kind of reiterate the point, is download the connector, download the driver, and then you're off to the races with connecting to Dremio.

Louise Westoby:

Thanks, Blair. And the next question, Brock, is for you. Can we use Dremio with Neo4j, MongoDB, HBase... (NoSQL databases)?

Brock Griffey:

Right. So we do have a variety of built-in supported connectors to various data sources. We have all the data lakes; most of them are supported. Now, we do have what we call relational connectors or external connectors, and with some of those we have connections to things like SQL Server, Postgres, and some other ones out there as well. For anything that's not there, though, we do have an open source connector that lets you create your own connection using JDBC to connect to those other sources that we may not support today. That's Dremio Hub, if you're looking for that, and we do encourage you to go out there and build your own connector so you can connect to other various data sources.

Louise Westoby:

All right. Great. Thanks Brock. What else do we have here? Okay. Quite a technical question here about reflections. You mentioned that Dremio scales over multiple query nodes. How does it work with reflections? Are they shared among query nodes or does each node have its own copy?

Brock Griffey:

Good question. So the way a reflection works is we actually configure what we call reflection storage or distributed storage, and that can be stored back onto your data lake. So say we have an S3 data source. When we create a reflection, it'll put the reflection information out into the data lake, which is then accessible by any query engine. So it actually will scale across all of your nodes. That way we get the performance of the data lake and the performance of every Dremio node that we have available to us.

Louise Westoby:

Okay. Next question: do I need to run Dremio in the same cloud as my data lake?

Brock Griffey:

Another good question. No, you do not. You may get a little bit of latency just because of going between VPCs, the internet, stuff like that, and traffic, but you will be able to connect those sources. They don't need to be in the same location.

Louise Westoby:

Okay. Sort of somewhat related. Can I run Dremio on-premises?

Brock Griffey:

Yes, you can. Today we have several customers that do run on-prem. Dremio is kind of agnostic. You can deploy it in a variety of places. You can deploy it on-prem on virtual machines or bare metal hardware. You can also deploy it into Hadoop directly using YARN. You can deploy on Kubernetes using our Helm charts. We have Azure and AWS offerings as well. So there's a variety of places to deploy and different ways to deploy.

Louise Westoby:

All right. I'm not quite sure, this might be a joint question here for both of you, but I'll ask it. How do you use Dremio queries from Tableau Prep? I don't know, Brock or Blair.

Blair Hutchinson:

I think I can answer the first one. At the moment, we do not have a Dremio connector for Tableau Prep. That is something, again, that's coming in 2021. So at the moment, the answer is it's not possible, but we expect it to be available to customers shortly. I don't know, Brock, if you had anything else to add there.

Brock Griffey:

No, I have nothing else to add there. Thank you.

Blair Hutchinson:

Yeah.

Louise Westoby:

Thanks for that. Next question is for Brock. If you want to query a dataset on several different fields, is it better to create many reflections? How do you minimize the data footprint of these very flexible reflections?

Brock Griffey:

Right. Without seeing the dataset, it's hard to explain exactly what you need to do, but if you have various fields that you're trying to pull in, one option is to pull them all into one virtual dataset, right? And inside that one virtual dataset, if you have joined things together, you can easily create one reflection on that virtual dataset that covers the queries you're going to be using. It's more a matter of looking at the queries coming in to understand what the query pattern is and, once you know the query pattern, building that reflection.
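As a rough illustration of pulling the fields into one joined dataset, the sketch below uses a plain SQL view in an in-memory SQLite database as a stand-in for a virtual dataset. It is not Dremio syntax and does not model reflections; the tables `orders` and `customers` and the view `orders_enriched` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 100), (2, 11, 40), (3, 10, 60);
    INSERT INTO customers VALUES (10, 'east'), (11, 'west');

    -- One view that joins the tables and exposes every field the queries need.
    CREATE VIEW orders_enriched AS
    SELECT o.order_id, o.amount, c.customer_id, c.region
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id;
""")

# Two different query patterns served by the same joined view.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders_enriched "
    "GROUP BY region ORDER BY region"
).fetchall()

by_customer = conn.execute(
    "SELECT customer_id, COUNT(*) FROM orders_enriched "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
```

Both queries read from the single view, so a single acceleration structure defined on that view could, in principle, serve both patterns rather than maintaining one per field.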

Louise Westoby:

Okay. The next question is also for you; lots of Dremio questions here. What is Dremio's recommendation as far as the amount of data preparation that needs to be done in the data lake to prepare for performant querying? Can you query directly against data in the raw data layer?

Brock Griffey:

Good question. It's really up to you. You can certainly query the data in the raw layer right away. If you want to get the data and work with it right away, that's an easy way of doing that. If you want to build more of a semantic layer, which is one of our best practices when you're using Dremio with many groups or with an entire organization, we do recommend building out a semantic layer that has more of a common view of how the data should be used throughout the organization. So it's really dependent on you and your use case, but you can certainly query the data directly without any virtual datasets.

Louise Westoby:

Yeah. Next question. Probably want to answer this both for Dremio and for Tableau. I'm guessing it's a Dremio question, but I think it's good to know on Tableau as well. But the question is, do you have a free trial for Azure?

Brock Griffey:

So I'll go ahead and answer real quick, then I'll let Blair answer. So on Dremio there's actually the Dremio Community edition that you can get. And there's also an Azure edition that you can get as well through the Azure Marketplace, and they're both free trials. You can get them and use them for as long as you want.

Blair Hutchinson:

So on the Tableau side, we have a two-week trial for any of our products, and you can download those directly from our website. And we have docs that go into great detail about how to deploy Tableau Server on Azure as well. I'd just also like to make one more comment on the Tableau Prep question that was asked earlier. I'm not able to see those questions to reach out directly to the person that asked that, but I would be curious about your use case. So if you would reach out to me, I would love to talk with you about Dremio in Tableau Prep and how you're thinking of using it in that example.

Louise Westoby:

Blair, do you want to provide your email just so that that person can reach out to you?

Blair Hutchinson:

I'd say LinkedIn is a great place. Blair Hutchinson would probably be the easiest way to find me. And if there's follow-up too, I can share my email as well.

Louise Westoby:

Okay. All right. So we're just about out of time here, just one last question. And as I mentioned, we've obviously got more questions than time, so we'll make sure we get to the rest of them in one-on-one follow-up. The last question here is for Brock. Do you have any suggestions on best practices when adding new fields to an existing data source/view with reflections?

Brock Griffey:

Right. When you do this, you'll see the reflection will say that you need to update it. When you go into the reflection, you'll see that there's an ability to just check mark a new box, and it will refresh your reflection once you save that. There's not really too much to do after that. If you're adding new fields or taking away fields, you'll see that it says, "The dataset may have changed and may be invalid," but again, if you just go into the reflection piece you'll see that it has a disclaimer saying, "Hey, you need to update this or something may have changed, please check it." Dremio will do pattern matching. So it's not like your data's going to be incorrect when you query, because Dremio will actually go and realize that the reflection does not cover the query. So at most, you may see the query take a little bit longer to run, but still get good performance. So it's one of those things that you just have to check and make sure that when you change something, it still has a reflection covering it.
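The coverage idea described here can be sketched as a toy model. This is only an illustration of "use the reflection when it covers the query, otherwise fall back to the underlying dataset"; it is not Dremio's actual matching algorithm, and `Reflection`, `choose_source`, and the field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reflection:
    """Toy stand-in for a reflection: a name plus the fields it contains."""
    name: str
    fields: frozenset


def choose_source(query_fields, reflections, raw_dataset="raw_dataset"):
    """Return the first reflection containing every field the query needs.

    If no reflection covers the query, fall back to the raw dataset, so the
    answer stays correct even though the query may run more slowly.
    """
    needed = frozenset(query_fields)
    for reflection in reflections:
        if needed <= reflection.fields:
            return reflection.name
    return raw_dataset


reflections = [Reflection("sales_reflection", frozenset({"region", "amount"}))]

# Query uses only fields the reflection already has.
covered = choose_source(["region", "amount"], reflections)

# Query uses a newly added field the reflection does not include yet.
uncovered = choose_source(["region", "amount", "discount"], reflections)
```

In this model, a query touching the new `discount` field is routed to the raw dataset until the reflection is updated to include that field.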

Louise Westoby:

All right. So that's all the time we have for today. A big thank you to our speakers, Brock and Blair, and thank you to everyone who attended. We appreciate you being here and look forward to seeing you next time.
