Eliminating the Ugly Plumbing of Data Lake Engineering

Thursday, July 22, 2021

Dive into four areas of data lake engineering and hear about the technical details of how Upsolver eliminated its ugly plumbing.

For decades Oracle dominated the database landscape. It was an expensive monolith, but it did make things easy and familiar for the database user, since it provided a standard SQL interface and handled burdensome technical functions under the covers.

Data lakes upended the monolithic database, separating ingestion, storage and processing into independently scalable components. While this has provided tremendous flexibility and affordable infrastructure at scale, it has also required scarce and expensive big data engineering talent to glue products together into solutions, through hand-coding and hundreds of configuration settings for distributed systems like Hadoop and Spark.

The challenge at hand is this: How do you make the data lake as easy to use as the traditional Oracle database, so that citizen data practitioners, and not just big data engineers, can take advantage of the wealth of data it holds?

In this talk, Ori Rafael, a long-time Oracle practitioner from his years in Israeli intelligence and now CEO of Upsolver, will dive into each of the following areas of data lake engineering complexity and discuss the novel approaches Upsolver took to eliminate the ugly plumbing and democratize the data lake:

• Automated file systems management – addressing the small files problem, serialization, file formats and compression
• Joins, updates and deletes – enacting standard database operations on an immutable object store
• Orchestration – determining the best path to a desired table without burdening the user with data pipelines overhead
• Consistency – providing strongly consistent datasets on top of an eventually consistent object store