It’s been quite a journey from relational databases to warehouses and beyond. As data reigns supreme and analytics needs become more immediate and complex, warehouses simply aren’t enough. They excel at handling structured data at high speed, but not all information is organized that way.

As an answer to big data and its challenges, data lakes offer cheap, flexible storage where all data types are welcome. They conserve computing resources by accepting raw data, keeping it as is, and deferring transformation until the data is queried. Because the processing engine does less work up front, it runs faster, which makes data lakes well suited to machine learning queries and the holy grail of data analytics, business forecasting. But data lakes don’t replace warehouses entirely, so a gap remains. They aren’t meant for high-performance, structured queries, and maintaining data quality over the long term is a constant concern.

Organizations have tried deploying warehouses and data lakes in a complementary architecture, with each performing the task it’s meant to do. The system routes BI queries to warehouses and machine learning queries to data lakes. However, this setup introduces architectural complexity and doesn’t eliminate latency entirely. Besides, it’s expensive to combine two separate technologies. A tighter coupling of the two is the natural next step. This paves the way for data lakehouses, which are designed to simplify this architectural sprawl by providing one platform for both BI and machine learning queries.
Read more at: https://cxotoday.com/story/semantic-lakehouses-whats-next-in-the-data-space/