Gnarly Data Waves Episode
Overview of Dremio’s Data Lakehouse
In our first episode of Gnarly Data Waves, Read Maloney provides an overview of getting started with Dremio's data lakehouse and showcases the advantages of Dremio use cases.
CUSTOMER STORY
Dremio provides DATEV product managers with self-service access to software usage data to help them understand and improve their software and customer experience.
DATEV eG, a German software company with over 8,000 employees and annual revenue of over 1 billion euros, makes software for tax consultants, lawyers and accounting offices. Tax consultants throughout Germany use DATEV software to handle tax, bookkeeping and payroll accounting for their small and medium-sized business clients. More than two million German companies use DATEV software for financial accounting, and more than 11 million paychecks are processed with DATEV software every month.
A few years ago, DATEV launched a project to track usage of their software products to improve software quality. The goal was to understand which features clients were and weren’t using, and to examine performance and failures. Product managers and technical developers wanted to see whether features were being used as designed and performing as intended.
The data they collect from their customers has become increasingly useful to multiple parts of the business. At the end of 2019, the volume of this data grew tenfold to 300 million records per day. At the same time, the number of employees interested in using this data to improve product development and management quadrupled.
But the data management team was unable to respond to the increasing volume and demand because of a lack of manpower and expertise to generate the requested reports. The team provided business units with standard reports, but if an analyst wanted more detail on specific figures, they couldn’t deliver. The underlying Hortonworks data platform and QlikView analytics solution were too slow and inflexible to support the business units in a timely manner. It would typically take the data team two weeks to complete requests from the business units.
To address these challenges, the data management team decided to try a self-service approach so business units could apply their specific product expertise while doing their own analysis independently. DATEV first tried Datameer, Qlik Sense and Elasticsearch but had issues with mediocre performance, user acceptance and data volume limitations, and soon realized that they had to invest more in their data access layer.
DATEV heard about Dremio at a 2019 conference and launched a proof of concept in July 2019. They chose Dremio because it was easy to install, lightweight and had a slick, browser-based UI. “The whole approach with Dremio software was very convincing,” says Matthias Mueller, DATEV senior architect. “You don’t have to install anything on the client side, you can use browsers. We looked at software from competitors but you had to buy it and make appointments to use it. You can download the Dremio Community version instantly and get started. You can get results immediately.”
DATEV easily integrated Dremio into their existing on-premises data lake environment. They use Dremio features such as the semantic layer, the easy-to-use SQL query language, and the catalog for storing metadata. To optimize performance for access from Tableau, they began adding Dremio data reflections for the data used in the interactive dashboards. They were able to keep their existing Spark-based ETL process, which stores the data as ORC files mounted as external tables in Hive. Those Hive tables, along with some additional CSV files stored in the data lake, are connected in Dremio.
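Reflections speed up dashboards largely by precomputing results so that queries do not rescan raw rows. The sketch below illustrates only that general idea, assuming a hypothetical usage-event schema (`day`, `product`, `feature`). It is not Dremio's implementation or API: in Dremio, reflections are defined and refreshed by Dremio itself, and its query planner decides when to substitute one for the raw data.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical raw usage event; field names are illustrative, not DATEV's schema.
@dataclass(frozen=True)
class UsageEvent:
    day: str      # e.g. "2020-03-01"
    product: str  # product identifier
    feature: str  # feature identifier


class AggregateCache:
    """Conceptual stand-in for an aggregation reflection: counts are
    precomputed once per (day, product, feature), so dashboard queries
    read this small summary instead of scanning every raw event."""

    def __init__(self, events):
        self._counts = Counter((e.day, e.product, e.feature) for e in events)

    def feature_usage(self, product, day=None):
        """Return {feature: count} for one product, optionally for a single day."""
        result = Counter()
        for (d, p, f), n in self._counts.items():
            if p == product and (day is None or d == day):
                result[f] += n
        return dict(result)


events = [
    UsageEvent("2020-03-01", "payroll", "start"),
    UsageEvent("2020-03-01", "payroll", "start"),
    UsageEvent("2020-03-01", "payroll", "export"),
    UsageEvent("2020-03-02", "payroll", "start"),
    UsageEvent("2020-03-02", "tax", "start"),
]
cache = AggregateCache(events)
print(cache.feature_usage("payroll"))  # {'start': 3, 'export': 1}
```

The trade-off is the usual one for materialized aggregates: the summary must be refreshed as new events arrive, in exchange for much cheaper reads at dashboard time.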
In the past, business users had to endure long waits to receive datasets from the data management team, which hampered their ability to make timely product development decisions. Now, with Dremio, the business units use the semantic layer for self-service analytics, running queries that fit their own requirements, needs and schedules. Business users can filter the records for a product they are responsible for and aggregate the data to build KPIs for that product. For example, one product may only require counting how many times it is started, while another may require a query that checks whether a certain sequence of actions was triggered, to determine whether the product is being used as designed. Business users can also perform SQL transformations in Dremio.
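The two kinds of KPI described above, a simple start count and a check for a sequence of actions, can be sketched as follows. In DATEV's setup these would be SQL queries in Dremio; this Python version only mirrors the logic. The event layout `(session_id, product, action)` and the action names are hypothetical, and "sequence triggered" is assumed to mean the actions occur in order within a session, not necessarily back to back.

```python
from collections import defaultdict

# Hypothetical event records: (session_id, product, action), in time order per session.
events = [
    ("s1", "payroll", "start"),
    ("s1", "payroll", "open_employee"),
    ("s1", "payroll", "run_payroll"),
    ("s2", "payroll", "start"),
    ("s2", "payroll", "run_payroll"),
    ("s3", "tax", "start"),
]


def count_starts(events, product):
    """KPI 1: how often a product was started."""
    return sum(1 for _, p, a in events if p == product and a == "start")


def contains_in_order(actions, pattern):
    """True if every step of `pattern` appears in `actions` in the same order
    (an ordered subsequence; other actions may occur in between)."""
    it = iter(actions)
    # `step in it` consumes the iterator up to the match, so order is enforced.
    return all(step in it for step in pattern)


def sessions_following(events, product, pattern):
    """KPI 2: sessions of `product` in which `pattern` was triggered in order."""
    by_session = defaultdict(list)
    for session, p, action in events:
        if p == product:
            by_session[session].append(action)
    return sorted(s for s, acts in by_session.items() if contains_in_order(acts, pattern))


print(count_starts(events, "payroll"))  # 2
print(sessions_following(events, "payroll", ["open_employee", "run_payroll"]))  # ['s1']
```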
After completing their queries, users connect the results to Tableau dashboards or Microsoft Excel to visualize the data. As a result, business units now have fast access to detailed data for planning decisions, such as which program features to extend, which to remove and which bugs to fix promptly. They can also break down the reasons for poor program performance by operating version or local data volume, and optimize the software for a better customer experience.
One of DATEV’s challenges was the amount of data they needed to ingest and process: about 300 million rows per day, covering a period of two years. As the data grew, they could no longer analyze all of it in QlikView, so they had to limit the analysis window to only 30 days. But one month of historical data was not sufficient for the analyses they needed to do. With Dremio, analysts can now run reports on historical data going back several years, giving them the flexibility to do detailed and sophisticated analyses to improve their products.
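A rough sense of scale helps explain why the 30-day cutoff hurt. The estimate below assumes, as an upper bound, that the peak rate of 300 million rows per day held for the full two years and that each year has 365 days; the story itself says the volume grew over time, so the real two-year total is likely lower.

```latex
\begin{aligned}
N_{\text{2 years}} &\approx 3\times10^{8}\ \tfrac{\text{rows}}{\text{day}} \times 730\ \text{days} = 2.19\times10^{11}\ \text{rows} \\
N_{\text{30 days}} &= 3\times10^{8}\ \tfrac{\text{rows}}{\text{day}} \times 30\ \text{days} = 9\times10^{9}\ \text{rows} \\
\frac{N_{\text{30 days}}}{N_{\text{2 years}}} &= \frac{30}{730} \approx 4.1\%
\end{aligned}
```

Under these assumptions, the QlikView workaround left roughly 96% of the two-year history out of reach.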
Before Dremio, the data management team produced standard reports for the business units. If a business user had a custom request, it would typically take two weeks to implement. With Dremio, business users can run their own custom queries and analyses in one to two days. “Now business units can do their own analysis in a much faster way and we, as a central team, are no longer the bottleneck,” says Mueller. “Based on those features we will establish new KPIs and dashboards, increasing our capabilities as a data-driven business.”
Mueller explains that Dremio’s ability to translate technical language coming from the raw data to business language helps business users understand what is happening and navigate the software more easily. “This was not possible before, data was only readable to a programmer. This translation makes it very easy for business users so they can understand what is going on,” he says.
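The translation Mueller describes amounts to exposing raw technical identifiers under business-friendly names. In Dremio this is typically done with views in the semantic layer, written in SQL; the Python sketch below shows only the mapping idea. Every event code and label in it is hypothetical, not taken from DATEV's data.

```python
# Hypothetical raw telemetry codes and the business-friendly names a semantic
# layer might expose; none of these identifiers come from DATEV's data.
BUSINESS_NAMES = {
    "EVT_0x1A": "Program started",
    "EVT_0x2B": "Payroll run completed",
    "ERR_TIMEOUT_DB": "Database timeout",
}


def to_business_view(rows):
    """Return new rows with raw event codes renamed to business terms.
    Unknown codes are passed through unchanged so they stay visible,
    and the input rows are not modified."""
    return [
        {**row, "event": BUSINESS_NAMES.get(row["event"], row["event"])}
        for row in rows
    ]


raw = [
    {"client": "c1", "event": "EVT_0x1A"},
    {"client": "c1", "event": "ERR_TIMEOUT_DB"},
    {"client": "c2", "event": "EVT_0x99"},
]
for row in to_business_view(raw):
    print(row)
```

Passing unknown codes through, rather than dropping them, is a deliberate choice in this sketch: a missing mapping shows up in the output instead of silently hiding data.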
This fast access to custom data also helps product managers and technical developers to create better software for their customers. “Two years ago, the product owners had to make their best guess about how bad or well the products were doing in the field. Now we have real numbers from our users about how many are using specific products or features. We can make better decisions about what software to discontinue or invest in,” Mueller says.
Mueller explains that DATEV’s analysis of program usage during the COVID-19 pandemic and the shelter-in-place period offers potential insights into the broader economy. “Our current analysis on program usage across Germany enables us to make some predictions about economic impacts of the pandemic,” he says. Because DATEV software manages payroll and income taxes, the company can see changes in how those functions are used, such as fewer employees being paid or payroll-specific functions being used to process reduced payments to employees, which may reflect the health of the broader economy. Dremio’s ability to provide fast responses to custom queries helps DATEV gain novel insights into the state of the German economy.