
How Dremio Beats AWS Athena

Transcript

Louise Westoby Welcome, everyone. Thank you for joining us for our webinar series, the Cloud Data Lake Query Engine Showdown. Today we'll be discussing Dremio vs AWS Athena. My name is Louise Westoby and I'm a director of product marketing here at Dremio. Before I introduce you to our speaker, I'd like to go over a little housekeeping. First of all, the webinar will be recorded and sent to you. If you have any questions during the presentation, please type them in the question box at any time. We will address the questions live during the Q and A session at the end of today's presentation. With that, I'd like to introduce our speaker, Serge Leontiev, who is responsible for technical product marketing here at Dremio. Serge, over to you.
Serge Leontiev Thank you, Louise. Welcome to our webinar series. Today we're starting our series where we're going to talk about different flavors of Presto and how they stack up against the Dremio data lake query engine. Thank you for joining us. Today's session is dedicated to how Dremio beats AWS Athena, so we're going to talk about the AWS Athena specific benchmarking results. This is going to be a deeper dive into Athena's performance and cost efficiency, and how it compares to Dremio. Today we will discuss a side by side comparison of Dremio versus Athena. We will talk about the benchmarking methodology that we used, and finally we will highlight the architectural reasons and technical factors that drive the difference in outcomes.
Serge Leontiev Before we proceed, please allow me to introduce myself. My name is Serge Leontiev and I'm leading technical product marketing at Dremio, the Data Lake Engine company. Currently we are powering the Data Lake Engine for world leading companies across all industries. Our company provides a data lake query engine solution that offers interactive query speeds and unprecedented performance directly on cloud data lake storage. And as we discussed, today's agenda is going to be very straightforward. We're going to talk about why you would do benchmarking. We're going to talk about execution costs and performance comparison between query engines. We're going to talk about execution time by specific query type. We will address architectural analysis, and then we will go through a Q and A at the end of this call.
Serge Leontiev So why benchmarking? There are key requirements that data teams are facing today. First of all, it is imperative to provide interactive query performance directly on cloud data lake storage for business users and analysts. Basically, it avoids the time, expense and complexity of moving and copying data into a traditional warehouse solution. And secondly, what data teams are trying to achieve is cost efficiency for a given level of performance. This is one of the emerging key decision criteria for data teams as they evaluate any cloud based solution. The elasticity of the cloud means that absolute performance alone is no longer the right way to benchmark a solution, simply because any scale out architecture can add more compute resources to deliver [inaudible 00:03:45] performance level. It is more important to understand the efficiency of a given solution: what amount of computing resources is required to deliver a given level of performance.
Serge Leontiev And with that in mind, the top three performance and cost comparisons that matter most are BI and reporting query performance and cost efficiency, interactive ad hoc query performance, and average performance and cost efficiency across query types. And Dremio delivers great numbers compared to AWS Athena in this particular case. That's exactly what we're going to present to you today. As we discussed earlier, Dremio and AWS Athena fall into the same category of data query engines. Our goal was to compare all offerings side by side, so we did this benchmarking at the beginning of February 2020, and we compared Dremio versus different flavors of Presto: PrestoDB, PrestoSQL, Starburst Enterprise Presto and AWS Athena.
Serge Leontiev Quick history fact: Presto was initially created as an internal Facebook project in 2012 to replace a slow 300-petabyte Hive data warehouse, and later it was open sourced in 2013. In 2016, AWS rolled out the Athena service with basically the same capabilities as Presto, and it's based on Presto. However, certain features and functions are missing. We leveraged AWS marketplace offerings and service offerings as much as we could. Our ultimate goal was that others like you could easily reproduce our test results without spending time on setting up, configuring, and provisioning full-blown infrastructure.
Serge Leontiev So for this particular exercise, we provisioned Dremio AWS edition through the AWS marketplace offering. You can go to AWS marketplace, search for Dremio, and you will find those offerings available that you can deploy directly into your EC2 environment. It was configured with default settings, so we haven't performed any special tuning. It was a vanilla installation. We selected EC2 instance type M5d.8xlarge for all our benchmarking tests. However, with Athena being a serverless offering, it's impossible to compare apples to apples. No one really knows what resources are allocated there, and we had no control over it. And the provisioning of Athena is a simple kind of click and adding permissions to use Athena as a service in EC2 instance, so it was a very easy and seamless process as well.
Serge Leontiev As I mentioned before, AWS Athena is based on Presto, actually the PrestoDB flavor, the Facebook code base of Presto, which allows executing interactive queries on the data lake directly from AWS S3 storage using SQL. And Athena uses the AWS Glue catalog to store and maintain table metadata information. So let's talk a bit about the benchmarking methodology and tools that we used during this exercise. We chose TPC-DS as a trusted industry standard benchmark for general purpose decision support systems. It's geared toward online analytical performance benchmarking, and offers a wide variety of BI, reporting, analytical and ad hoc queries that represent typical analytical workloads.
Serge Leontiev We used TPC-DS provided tools to generate data sets and queries for this benchmarking test. We used different scale factors, such as scale factor 1,000, which is approximately one terabyte of data, and scale factor 10,000, which is approximately 10 terabytes of data, to test the linear scalability of the engines at different scale factors of the data. The generated data sets were converted to the Apache Parquet file format, as the most pervasive open source columnar format in big data analytics. Currently it's broadly supported by the industry, including technologies such as Dremio, Presto, Spark et cetera. We obviously wanted our results to be repeatable, and with that we have identified 58 unmodified TPC-DS queries that we were able to execute across Dremio and Presto engines. This subset of queries equally represents BI, analytical and ad hoc queries. While we could extend query coverage to 100% with query rewrites, we decided not to do so. Our goal was to make it easier for you guys to replicate everything with the standard queries.
Serge Leontiev We tested Dremio linear scalability by incrementing node counts by four, so we started at four nodes and we moved to 8, 12, 16 and 20 node clusters. And we used Apache JMeter as our test suite, since it offers an open and flexible framework that can be easily leveraged with any JDBC driver, and it provides nice, transparent and easily digestible test results. Like I mentioned earlier, we have identified 58 TPC-DS queries that we were able to execute across Dremio and Presto without any syntax or query modifications, and to do that we did an initial round at scale factor one, which is approximately one gigabyte of data. It's a tiny amount of data. However, at larger scale, Athena simply failed to execute the full set of queries.
Serge Leontiev On scale factor 1,000, only 50 queries out of 58 were executed, and the rest of the queries failed with errors like query exhausted resources at this scale factor, and query timed out. Which tells me that even though Athena is a serverless and auto scaled kind of service, where you don't really have to worry about what's under the hood, unfortunately it fails to handle large data sets. And at even larger scale, scale factor 10,000, Athena was able to execute just 32 queries out of 58, throwing exactly the same errors I mentioned before. And this makes Athena very unreliable, I would say, on large data sets. It's hard to predict which query will be executed successfully or not. And with that in mind, let's compare side by side the execution costs and performance of the Dremio and AWS Athena query engines based on average query execution time with default Dremio acceleration.
Serge Leontiev So we have multiple acceleration techniques, and I'm going to talk about them later. For this particular exercise we haven't enabled any additional [inaudible 00:12:17] performance for Dremio. We have calculated the average number of queries that each engine was able to execute in one minute at scale factor 1,000, so one terabyte. Obviously for Athena we don't really know how many nodes are there, right? So we did the calculation on whatever we were able to get as a result, and for Dremio we did the calculation on different node counts. Basically, we divided the number of successfully processed queries, in this particular case 50 queries, by the total execution time to average the results.
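The averaging described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the benchmark tooling: the function name `queries_per_minute` and the timings below are hypothetical and chosen to make the calculation easy to follow.

```python
def queries_per_minute(successful_queries: int, total_seconds: float) -> float:
    """Average throughput: successful queries divided by total run time in minutes."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return successful_queries / (total_seconds / 60.0)


# Hypothetical timings, used only to show the arithmetic.
engine_a_qpm = queries_per_minute(50, 50 * 60)  # 50 queries in 50 minutes
engine_b_qpm = queries_per_minute(50, 10 * 60)  # 50 queries in 10 minutes

print(f"engine A: {engine_a_qpm:.2f} queries/minute")
print(f"engine B: {engine_b_qpm:.2f} queries/minute")
```

Only successful queries are counted, so an engine that fails more queries is compared on a smaller set.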
Serge Leontiev And as you can see from this particular graph, Dremio was able to execute more queries per minute than Athena, even on a small 8 node cluster. It was scaling nicely and linearly up to 16 nodes, and then Dremio reached maximum parallelization at this scale and stopped being stressed by the size of the data set. You can see that in the shape of this curve. It's simply because the data set is not that big for Dremio. So again, looking at this particular graph, you can easily tell that Athena cannot even reach performance similar to Dremio's at all. And we did exactly the same calculation at 10 terabytes, at scale factor 10,000. This time we measured the number of queries completed in 10 minutes rather than 1 minute, given that it's a 10 times larger scale.
Serge Leontiev And this graph shows an even bigger gap in performance at the bigger scale factor. At larger scale the Dremio engine continues to scale linearly up to 20 nodes, so it's nice linear performance given the bigger size of the cluster. And the larger data set shows that if you give more data to Dremio, it drives better performance out of Dremio's powerful engines. And once again, Athena simply failed to achieve performance similar to what Dremio was able to show.
Serge Leontiev So now let's compare query execution time and the total cost of execution at different scale factors. And again, for Athena we don't really know how many nodes are there, so I just put this line for Athena and mapped the results for 8, 12 and 16 nodes, so you can easily compare Dremio versus Athena. And again, it's a shared service, and what we noticed is that execution time may depend on the time of the day or day of the week. There are certain best practices that we followed, and whenever we ran these tests on AWS Athena we were trying to execute them during off peak hours, mainly evenings and weekends, to get at least maximum resource allocation for our tests.
Serge Leontiev So again, in our case at scale factor 1,000, at one terabyte, it took four hours for Athena to execute all 58 queries. And if you take a look at the graph, you will see that the total execution time for the successful queries was 50 minutes. So this means that Athena spent three hours processing failed queries. From my point of view this is kind of amazing, right? However, they're not charging you for that, so they only charge you for successful queries, and in this case the total cost for those 50 queries was $80.95. And that's 3.5 times more expensive and 5 times slower than the Dremio 8 node engine. And even at 16 nodes, Dremio is still 2.5 times cheaper and 7 times faster than AWS Athena.
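As a rough check of what those ratios imply, the sketch below backs out approximate Dremio figures from the Athena numbers quoted in the talk ($80.95 and 50 minutes for the 50 successful queries). It assumes the "times cheaper" and "times faster" ratios apply directly to those two figures; the derived values are implied estimates, not measured results from the benchmark report.

```python
ATHENA_COST_USD = 80.95  # stated in the talk for the 50 successful queries
ATHENA_MINUTES = 50.0    # stated total execution time for successful queries


def implied_value(baseline: float, ratio: float) -> float:
    """Value implied by 'baseline is `ratio` times larger than this'."""
    return baseline / ratio


# (cost ratio, speed ratio) as quoted in the talk, keyed by Dremio node count.
quoted_ratios = {8: (3.5, 5.0), 16: (2.5, 7.0)}

implied = {
    nodes: (
        round(implied_value(ATHENA_COST_USD, cost_ratio), 2),
        round(implied_value(ATHENA_MINUTES, speed_ratio), 2),
    )
    for nodes, (cost_ratio, speed_ratio) in quoted_ratios.items()
}

for nodes, (cost, minutes) in implied.items():
    print(f"{nodes} nodes: about ${cost} and about {minutes} minutes (implied)")
```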
Serge Leontiev So you can actually run Dremio at the same scale factor for less, and you get much, much better performance. At scale factor 10,000, Athena didn't do as well. As I mentioned before, only 32 queries out of 58 were completed successfully, and this test took almost 9 hours to complete, compared to 50 minutes for 12 node Dremio for the same queries. The total execution time for the 32 successful queries on Athena was 4 hours, compared to 70 to 40 minutes on Dremio for the same number of queries. And Athena failed to deliver adequately comparable performance. And again, so if you [inaudible 00:17:56] Dremio 8 node engine is up to 6.5 times faster than Athena, at 1 and a half the cost.
Serge Leontiev So with that, let's dive in and explore Dremio and Athena performance for specific query types. There are a few distinct query types that are often used by analysts, and that matter most for data analyst teams. So we measured query performance and cost comparison for exactly those query types: BI reporting queries, analytical queries and ad hoc queries. We see a lot of our customers using BI dashboard queries for their dashboards, and quite often a customer executes more BI queries than ad hoc queries. But TPC-DS, with an equal number of queries of each type, gives us quite a clear picture of how the engines perform. But before we proceed, let me cover Dremio acceleration technology.
Serge Leontiev I quickly mentioned before that we used Dremio default acceleration, so let me elaborate a little bit on what that means. This is important to understand, because in the next sections we're going to compare results based on default and advanced acceleration. Basically, Dremio offers end to end acceleration starting from the data storage itself. We use cloud data lake optimized, massively parallel, high performance readers to quickly and easily extract and ingest data. We use real-time distributed NVMe based caching called C3, the columnar cloud cache. We are also offering advanced acceleration through transparent materialized views with highly granular reuse patterns, called data reflections. Then we are using a distributed, elastic, vectorized execution model based on Apache Arrow for columnar in-memory analytics, and that leverages Gandiva, an LLVM based execution kernel.
Serge Leontiev And we are also offering the Arrow Flight RPC interface, a high speed distributed protocol that offers a thousand times faster throughput between Dremio and Flight applications. It's built to replace the decades old JDBC and ODBC protocols. So data reflections are what we call advanced acceleration, while by default the massively parallel readers, C3 and Apache Arrow are there. We haven't tested Arrow Flight in this particular exercise, so we're relying on JDBC, obviously. However, we did build some data reflections for a subset of queries to show how much better your performance could be with the advanced acceleration.
Serge Leontiev So in this section we're going to talk about advanced acceleration with Dremio data reflections. This is a patented feature of Dremio, and it's not enabled by [inaudible 00:21:40] like I mentioned before. It requires some configuration, and for this exercise we have identified a set of TPC-DS queries that can be optimized by common reflections. This means that we're not building a reflection per query, we're building one reflection that covers a variety of queries. Then we compared the results with the data that we had already collected, and with that we see a performance improvement for BI and reporting queries for Dremio versus Athena of up to 2,500 times, so it's blazing fast performance for BI queries. And up to 184 times faster for ad hoc queries.
Serge Leontiev So let's take a look at those particular query types. This graph represents the query execution time for BI and reporting queries with data reflections at scale factor 1,000. As you can see from this particular graph, the execution time goes from minutes to seconds. The time on this graph is shown in milliseconds, so 656,000 is approximately 10 minutes. So instead of 10 minutes for query 51, we were able to view our results in 10 seconds. Query 3 is one second, query 7 is 1.3 seconds, and so on. So it's giving a performance boost of up to 65 times compared to Athena. And at an even bigger scale factor, the execution time goes from 40 minutes to seconds.
Serge Leontiev So as you can see, query 62 is 2,622,000 milliseconds, which is approximately 40 minutes, and with reflections we were able to optimize this query and run it within 1 second. So it's up to 2,500 times faster data retrieval compared to AWS Athena. Now let's take a look at ad hoc query performance, again with data reflections. And as you can see, again we can improve execution time from minutes to sub seconds. The last query in this particular graph is query 99, and it goes from 164 seconds to a sub second. Query 42 again goes from 20 seconds to sub second. That gives up to 160 times faster query execution time compared to AWS Athena. And at larger scale, again with data reflections, you can see that the execution time can be greatly improved from hours to seconds.
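The speedup figures quoted for the BI queries can be reproduced from the millisecond values read off the graphs. This is a small sketch of that arithmetic; the accelerated times of 10 seconds and 1 second are the rounded values mentioned in the talk, so the ratios are approximate.

```python
def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


def ms_to_minutes(milliseconds: float) -> float:
    """Convert a duration in milliseconds to minutes."""
    return milliseconds / 60_000


# Query 51 at scale factor 1,000: 656,000 ms on Athena, about 10 s with reflections.
q51_minutes = round(ms_to_minutes(656_000), 1)
q51_speedup = round(speedup(656_000, 10_000), 1)

# Query 62 at scale factor 10,000: 2,622,000 ms on Athena, about 1 s with reflections.
q62_minutes = round(ms_to_minutes(2_622_000), 1)
q62_speedup = round(speedup(2_622_000, 1_000), 1)

print(f"query 51: {q51_minutes} min baseline, about {q51_speedup}x faster")
print(f"query 62: {q62_minutes} min baseline, about {q62_speedup}x faster")
```

The results line up with the rounded "up to 65 times" and "up to 2,500 times" figures.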
Serge Leontiev And the number of queries is obviously different for each of those graphs, simply because, like I mentioned before, Athena failed to execute a lot of queries at the larger scale factor. So we have a smaller subset to compare. As you can see, Dremio truly allows you to greatly improve query performance, and offers savings in infrastructure costs by leveraging advanced acceleration technologies. Instead of waiting for minutes or hours, you can get your results in seconds, so you don't have to keep your infrastructure up and running you can purchase credits. But even with default acceleration, Dremio does offer excellent performance out of the box. So let's review the execution time results by the same query types next.
Serge Leontiev This graph represents the query execution time of the BI query for Athena and Dremio, again on the different node counts. And like I mentioned before, we don't really know how many nodes Athena uses, so I put everything on one line for AWS Athena for a simple comparison. As you can see, even with four nodes Dremio delivers up to 6 times faster performance, and returns results in under 7 seconds, while it takes more than 40 seconds for Athena to return the same query results. Dremio again achieved optimal performance at 16 nodes, and I would just like to re-emphasize that. With 16 nodes Dremio was almost 16 times faster than Athena. And yes, while perfectly linear performance is not seen there, it's due to the fact that this query has a low run time. The fixed cost of running the query takes a larger proportion of the overall query run time, and maximum parallelization was achieved based on the data structures. So adding additional nodes doesn't make sense. At 16 nodes we've already got our maximum performance.
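The point about fixed cost limiting scaling on short queries can be illustrated with a simple model: run time equals a fixed overhead plus parallelizable work divided by the node count. This is a minimal sketch under assumed numbers, not a model of Dremio internals; `fixed_seconds` and `parallel_node_seconds` are hypothetical values.

```python
def estimated_runtime(nodes: int, fixed_seconds: float, parallel_node_seconds: float) -> float:
    """Fixed per-query overhead plus perfectly divisible work spread over the nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return fixed_seconds + parallel_node_seconds / nodes


FIXED_SECONDS = 1.5           # hypothetical planning and coordination overhead
PARALLEL_NODE_SECONDS = 20.0  # hypothetical total parallelizable work

runtimes = {
    nodes: estimated_runtime(nodes, FIXED_SECONDS, PARALLEL_NODE_SECONDS)
    for nodes in (4, 8, 16, 20)
}

for nodes, seconds in runtimes.items():
    print(f"{nodes:>2} nodes: {seconds:.2f} s")
```

In this toy model, going from 16 to 20 nodes saves only a quarter of a second, because the fixed part already dominates the run time.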
Serge Leontiev And the same query failed on Athena at scale factor 10,000. So we selected the next best query based on execution time, query 73, and compared that to Dremio. As you can see, on a 4 node cluster Dremio returned results for this particular query within 62 seconds, while it took almost 3 minutes for Athena. On 20 nodes Dremio was 11 times faster, returning results in 14 seconds for the BI query. And beyond 20 nodes Dremio is still scaling linearly, maintaining the same gap in performance.
Serge Leontiev Now let's take a look at an ad hoc query. At scale factor 1,000 we selected query 42 from what was available. On the 4 node cluster, Dremio was 3.4 times faster than Athena. It again achieved optimal performance at 16 nodes, and with 16 nodes Dremio returned results within 2 seconds, compared to 20 seconds on AWS Athena. And again, we couldn't select the same query at scale factor 10,000, so we picked query 96 based on the best run. On a 4 node cluster, Dremio returned results in under one minute versus almost three minutes for Athena. For an ad hoc query, can you imagine sitting and waiting three minutes for your query results to come back? And then on 20 nodes Dremio showed 12.6 times faster performance, returning results in just 13 seconds. And beyond 20 nodes Dremio continued to scale linearly, maintaining the same gap in execution time.
Serge Leontiev So let's finally talk about the architectural differences, which are why Dremio can dominate, let's say, over Athena at any scale. Like I mentioned before, Athena is based on Presto. Athena is running PrestoDB under the hood and using the Presto code base, while it's still a serverless offering. The major differentiators are shown in this table. The Dremio execution model is built on Apache Arrow, so we use a columnar in-memory execution model. Whenever we're reading data in a columnar format, for example from Apache Parquet, we are processing it in columnar format. On the contrary, Athena and Presto are using a row based execution model. So whenever they're reading data from a columnar data format, they're converting it to a row based format in order to process it. And obviously the data conversion takes time, and that's why we see a gap in performance.
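The layout difference described here can be sketched in plain Python. This is a conceptual illustration only: it is not how Apache Arrow, Dremio or Presto are implemented, and the table contents and function names are hypothetical. It shows that an aggregate over column-oriented data can read a single column directly, while a row-oriented engine, as characterized in the talk, first pays for a conversion step.

```python
# Column-oriented layout: one list per column, as in a columnar file format.
columns = {
    "item": ["a", "b", "c", "d"],
    "quantity": [2, 5, 1, 4],
    "price": [3.0, 1.5, 9.0, 2.0],
}


def columns_to_rows(table: dict) -> list:
    """The extra conversion step: rebuild one record per row from the columns."""
    names = list(table)
    return [dict(zip(names, values)) for values in zip(*table.values())]


def total_quantity_columnar(table: dict) -> int:
    """Columnar processing: touch only the column the query needs."""
    return sum(table["quantity"])


def total_quantity_row_based(table: dict) -> int:
    """Row-based processing: convert every column to rows, then scan the rows."""
    return sum(row["quantity"] for row in columns_to_rows(table))


print(total_quantity_columnar(columns), total_quantity_row_based(columns))
```

Both functions return the same answer; the row-based path simply does more work before it can compute it.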
Serge Leontiev The execution architecture is also different. Dremio offers the notion of multi-engine. What it means is that you can create engines for different workload types, and Dremio will start the engine created for a particular type of workload, execute the workload and shut the engine down. Athena and Presto do not support that notion, so it's always one cluster, one engine. They will not be able to isolate workloads across different engines. As for the runtime, Dremio runs on native code while leveraging, like I mentioned before, Gandiva and LLVM, so we're taking full advantage of the native capabilities of the processor, while Athena is using the Java Runtime. And the Java Runtime takes additional time for execution.
Serge Leontiev Dremio offers NVMe cache technology in C3, like I mentioned before. That allows us to pre-cache some of the data on the executor nodes, and then if a query returns the same data sets we can just read them from the cache. Athena and PrestoDB do not offer that function at all. As for query acceleration technologies, Dremio offers several acceleration options, including the advanced acceleration of data reflections, which is a materialized view. And Athena does not have that feature at all. As I showed in this presentation, Dremio drastically improved query execution performance, up to thousands of times, and returned data in sub seconds versus minutes or hours.
Serge Leontiev And finally, the biggest differentiator would be the cloud data lake optimized readers, where predictive pipelining and asynchronous readers allow us to read data at scale, in parallel, in a very fast manner. Athena is relying on the Hive connectors or something like that to get access to the data. And obviously they will not be able to achieve the same level of performance that Dremio is offering with our data lake query engine. So at this time we have reached the end of this presentation, and we're ready for the Q and A section.
Louise Westoby Okay, thanks Serge. So as I said, we'll go ahead and take some time for questions now. Please be sure to type them into the question box in your control panel. While we're queuing up the questions, I want to make sure that you're aware that we will be making our benchmarking report available to all of today's webinar attendees. So we'll be sending out a follow up email with the recording of the webinar as well as your personal link to that benchmark report. We also have two more webinars in this series. The next one is tomorrow, Dremio versus PrestoDB. And the following one is the next day on September 10th, Dremio versus Presto SQL. So with that, I think we've received a number of questions. You want to move to the next slide, Serge?
Serge Leontiev There you go
Louise Westoby And we'll go ahead and tackle some of these. So the first one is, why did you benchmark against Athena when we don't know the exact number of executor nodes that they are launching?
Serge Leontiev Yes, AWS Athena is a very popular offering, and we see a lot of customers who are leveraging it for their business analytical needs. So for us it's very important to highlight the differences in performance. Obviously Athena is a serverless offering and we don't really know what's under the hood, but based on the query execution time, query performance and the cost analysis, we can at least compare the cost efficiency of the engines.
Louise Westoby Okay, great question. Thanks Serge. Next question, if data lake files are already in Parquet format, what is the need for creating reflections, isn't it just duplicating the data?
Serge Leontiev A reflection does duplicate data, but not everything, so think about a reflection as a materialized view, right? You use it whenever you would like to speed up the process. For example, if you have billions of records in your data set, and that's a real world example with the TPC-DS data set, you'd have to go and iterate through the whole data set to select the data that you're looking for. Data reflections allow you to create a subset representation of that particular data set, and your execution time will be lower because you're not going through the whole data set to find the data that you're looking for. And as I mentioned before, there are best practices that you should follow when you're creating reflections. It does not mean that you just need to upload only data that you are always selecting for one particular place. There are several different types of reflections that you can create, so [inaudible 00:37:26] reflection or aggregation reflection. Aggregation reflections are a perfect example of when you can store just aggregated values.
Serge Leontiev And Dremio takes care of keeping your reflections in sync. We have processes that refresh reflections behind the scenes, and from the execution point of view you don't really need to worry about where the data is coming from, so from an end user perspective they will just execute the same select statement. And if the Dremio engine can find a subset stored in the reflection, it will use the reflection. That's what drives additional performance. To your point, it's not exactly a duplication of data, right? It's creating a materialized view of your data sets to speed up BI and analytical queries.
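The idea behind an aggregation reflection, as described in this answer, can be sketched as a precomputed summary that is used transparently when it can answer the query. This is a conceptual Python illustration, not Dremio's implementation or API: the `sales` data and the functions `build_aggregate` and `total_for` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data set: (store, amount) records.
sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 7.5)]


def build_aggregate(rows: list) -> dict:
    """Precompute totals per store, analogous to storing only aggregated values."""
    totals = defaultdict(float)
    for store, amount in rows:
        totals[store] += amount
    return dict(totals)


def total_for(store: str, rows: list, aggregate: dict = None) -> float:
    """Answer from the precomputed aggregate when possible, else scan all rows."""
    if aggregate is not None and store in aggregate:
        return aggregate[store]
    return sum(amount for name, amount in rows if name == store)


aggregate = build_aggregate(sales)
print(total_for("east", sales))             # full scan of the raw rows
print(total_for("east", sales, aggregate))  # answered from the summary
```

The caller asks the same question either way, which mirrors the point that end users run the same select statement whether or not a reflection is used.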
Louise Westoby All right next question. What is the programming model against Dremio? Can I use a JDBC Java client to trigger SQL queries?
Serge Leontiev Absolutely. We are providing JDBC and ODBC drivers. You can find them on our website, and they can be used in any Java application or any client application. Like I mentioned before, I used Apache JMeter, which is an open source tool for executing different types of tests. And in this particular test I just dropped the JDBC driver into the lib folder, provided the JDBC connection string URL, and that's it. It was able to pick it up, connect to Dremio and start running queries.
Louise Westoby Okay, next question. In what cases would you say that Athena is better than Dremio?
Serge Leontiev Yeah, it's a great question. So for example, if you don't really get into real BI or analytical cases. If you're, for example, I don't know, dev ops or an admin who'd like to go and run some queries against your local files or something like that, then Athena would probably satisfy your requirements. However, for actual enterprises, for the real use cases when you need performance, when you need consistent performance, Athena is not built for that. So I mean, for one or two queries a month or a week, Athena is perfect. If you'd like to run a lot of business reports, BI, analytical and ad hoc queries and get consistent performance, Athena is not good for that.
Louise Westoby Okay. Next question. Do you recommend running a single node Dremio deployment for production?
Serge Leontiev No. So how Dremio works is that it parallelizes the workload between multiple executor nodes, right? We're splitting data sets and allowing multiple executors to process them, and then return results to the coordinator node and to the client. From my tests, I see a great performance improvement with a 4 node Dremio cluster, right? A 2 node Dremio cluster will work. A 1 node Dremio cluster doesn't make any sense. You're not splitting anything, so you're just reading the data, parsing it and sending it back to your coordinator. So you're not taking full advantage of Dremio. I would recommend starting with 4 nodes and going up as you need. And that's why I did these particular benchmarks, to show you how you can save and how you can achieve maximum performance, and to show you how much you'll spend on running Dremio. And again, if you go back to the slides you will see that there is a huge difference in the cost.
Louise Westoby Okay. What are the results at scale factor one? We are a smaller shop
Serge Leontiev I haven't analyzed those results. I did some very basic runs to see how it behaves. And it's one gigabyte of data, so even if you're a smaller shop I doubt that you're going to have just one gigabyte of data stored in your cluster, in your data lake. So the Dremio cluster [inaudible 00:42:11] just one gigabyte of data doesn't make sense, so I haven't analyzed those, and I can always go back and take a look at that. Or you can provision it by yourself if you go to the marketplace, [inaudible 00:42:47] marketplace get Dremio, and you can easily generate the test data sets for yourself. I recently published a blog. If you go to Dremio.com there is a blog section, and I published our benchmarking methodology there. I've outlined how you can generate those data sets for yourself, so you can generate a scale factor one data set and run those tests. The performance would be blazing fast. It's seconds, sub seconds, because this is such a small data set.
Louise Westoby Okay. Next question. You might need to take this one offline, but I'll read it to you just in case. It's a bit of a tricky one, so we might need to look into the numbers. The question is, I see Dremio uses the R4 xlarge instances that will cost close to $1 an hour. If I add 16 nodes for optimal performance, that'd cost us 20 times the cost of Athena. Is Athena's charge only per query execution, whereas Dremio will have hosted EC2 that needs to be paid for 24 hours? So I'm not sure if you can dig into the numbers there, or if you'd rather take that one offline.
Serge Leontiev Yeah. Yeah, let's start from the beginning. So you don't really have to run on R4 instances, right? From my experience the M5d instances are great and sufficient. If you need a bigger memory allocation, instead of 8xlarge you can get a bigger M5d instance, which gives you extra memory on your instance. And once again, we are talking about the Dremio AWS deployment, and self managed Dremio is a completely different story, however the patterns are the same. With AWS I did tests on C5 and R instances as well. Obviously yes, R is memory optimized instances. And actually on Thursday, on 10th I'm going to talk about Starburst. I ran Starburst on R and Dremio on M5d, and Dremio was faster on M5d compared to Starburst. So anyway, when you're looking at the execution cost and the performance, like I said, M5d is sufficient.
Serge Leontiev If you'd like to go with R, fine, but keep in mind that we are offering elastic engines. You can create your elastic engines, and based on the workload the engine will provision EC2 instances for you and keep them up and running until your queries are executed. You can set a time out period, I don't know, 10 minutes, 15 minutes, 5 minutes, and if there's no activity it will just shut them down. So you don't have to run these R instances 24/7. And if you're asking whether you should run R instances for your coordinator node instead of executor nodes, I don't believe you will achieve much with that. The majority of the work will be on the executors anyway. So hopefully I've been able to answer your question. If not, feel free to reach out to us at hello@dremio.com and we can take it offline.
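The difference between paying for always-on nodes and paying only while an elastic engine is running can be sketched with simple arithmetic. This is an illustration under assumed numbers, not Dremio or AWS pricing: the $1 per node-hour comes from the attendee's question, while the active hours, number of bursts and idle timeout are hypothetical.

```python
def daily_cost(nodes: int, price_per_node_hour: float, hours_running: float) -> float:
    """Compute cost for a day: nodes times hourly price times hours running."""
    return nodes * price_per_node_hour * hours_running


NODES = 16
PRICE_PER_NODE_HOUR = 1.00  # figure quoted in the question, treated as an assumption

# Always-on cluster: paid for all 24 hours.
always_on = daily_cost(NODES, PRICE_PER_NODE_HOUR, 24)

# Elastic engine: hypothetical 3 hours of query activity per day in 2 bursts,
# each followed by a 10 minute idle timeout before shutdown.
ACTIVE_HOURS = 3.0
BURSTS = 2
IDLE_TIMEOUT_MINUTES = 10
elastic_hours = ACTIVE_HOURS + BURSTS * IDLE_TIMEOUT_MINUTES / 60
elastic = daily_cost(NODES, PRICE_PER_NODE_HOUR, elastic_hours)

print(f"always on: ${always_on:.2f} per day")
print(f"elastic:   ${elastic:.2f} per day")
```

How large the gap is depends entirely on how many hours per day the workload actually keeps the engine busy.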
Louise Westoby Okay. Next question, can Dremio directly access AWS Glue catalog similar to AWS Athena?
Serge Leontiev Absolutely, yes. In Dremio we added this particular feature, which was widely requested and anticipated by some of our customers, where we offer the option to leverage the AWS Glue catalog and access the Glue catalog directly from Dremio, yes.
Louise Westoby Next question, how does Dremio work with Avro data format?
Serge Leontiev If you go to our documentation, there is a list of formats. I'm not 100% sure if we are supporting it or not. I would ask you to refer to our documentation, which tells you what data formats we are supporting out of the box. So I will not be able to answer this question.
Louise Westoby And that URL is docs.dremio.com, so super easy to find
Serge Leontiev Super easy to find
Louise Westoby All right, I think that's all of our questions. I'm just going to give it 10 more seconds to see if any more questions come in. All right, I don't see anybody typing, so I think that's all for today. Serge, if you just want to go to the last slide, I just want to thank everybody for joining us today. We hope to see you at our next webinar, whether it's tomorrow's webinar in this series or a future webinar. Thank you and have a great day.