Building a data lake that works is a continuous process rather than a one-off task. Even if a company has avoided the traps and pitfalls that turn a data lake into a data swamp and has built a working engine, there are at least two ways to load all of the organization's data into the data lake and build the needed data marts on top of it.

The first option is a centralized approach with a single team. However, this approach can create a bottleneck, and for a medium-sized company with hundreds of IT systems and business users, it requires substantial resources.

The second option is to empower product and service teams with the tools they need to load raw data into the data lake and build the required data marts on top of it in an independent, self-service way. In this case, beyond the data lake as an architectural concept, what additional tools and processes are needed to let end users work with data efficiently in parallel while still maintaining the company's standards of data governance?

In this talk, Mikhail Setkin, a data lake platform owner at Raiffeisenbank (Russia), gives an overview of the company's data lake architecture, stressing the key elements that turn it into an applied development platform on which end users from product and service teams can develop their own data processing pipelines efficiently and independently, with a reasonable time to market.
Mikhail Setkin is a data lake platform owner at Raiffeisenbank, a Russian subsidiary of Raiffeisen Bank International AG (Austria). He has more than 13 years of experience in the data domain. In various roles at Raiffeisenbank, he has taken part in successfully implementing and developing the company's data warehouse (DWH) and operational data store (ODS). Since 2016, he has led the development of a data lake platform, which includes tools for collecting and storing historical data as well as analytical tooling (BI and ML).