Data Lake Engines | Apache Spark | Subsurface LIVE Sessions

Inside Adobe Experience Platform, we found that we needed to track actions happening at the control plane level and act on them at lower levels such as the data lake and ingestion processes. Using Apache Spark Stateful Streaming, we have built services that react to those actions by starting processes such as data compaction, consolidation, and cleanup, minimizing processing time while keeping everything within defined SLAs. This talk presents a pattern that we have been running in production for the last two to three years in multiple Adobe Experience Platform services, with no high-severity on-call interventions and minimal to no operational cost on high-throughput ingestion flows.
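
To make the idea concrete, here is a minimal sketch of the per-key state logic that such a service might run. It is plain Python that simulates micro-batches, not the Adobe implementation and not Spark code. The names `ControlPlaneEvent`, `DatasetState`, `Action`, `update_state`, `run_micro_batch`, and the threshold of 100 files are all assumptions made for illustration. In a real Spark job, a function shaped like `update_state` would typically be handed to a stateful operator such as `flatMapGroupsWithState` (Scala) or `applyInPandasWithState` (PySpark), and Spark would manage the state store and checkpointing.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical threshold: request compaction once this many files are pending.
COMPACTION_THRESHOLD = 100


@dataclass(frozen=True)
class ControlPlaneEvent:
    """Hypothetical control plane event: a batch wrote files to a dataset."""
    dataset_id: str
    files_written: int


@dataclass
class DatasetState:
    """Per-dataset state carried across micro-batches."""
    pending_files: int = 0


@dataclass(frozen=True)
class Action:
    """Hypothetical downstream action, e.g. a compaction request."""
    dataset_id: str
    kind: str
    file_count: int


def update_state(
    dataset_id: str,
    events: Iterable[ControlPlaneEvent],
    state: Optional[DatasetState],
) -> Tuple[DatasetState, List[Action]]:
    """Fold new events for one key into its state and emit any due actions."""
    state = state or DatasetState()
    actions: List[Action] = []
    for event in events:
        state.pending_files += event.files_written
        if state.pending_files >= COMPACTION_THRESHOLD:
            actions.append(Action(dataset_id, "compact", state.pending_files))
            state.pending_files = 0  # reset once the action is requested
    return state, actions


def run_micro_batch(
    events: Iterable[ControlPlaneEvent],
    store: Dict[str, DatasetState],
) -> List[Action]:
    """Simulate one micro-batch: group events by key, then update each key."""
    grouped: Dict[str, List[ControlPlaneEvent]] = {}
    for event in events:
        grouped.setdefault(event.dataset_id, []).append(event)

    actions: List[Action] = []
    for dataset_id, key_events in grouped.items():
        new_state, emitted = update_state(dataset_id, key_events, store.get(dataset_id))
        store[dataset_id] = new_state
        actions.extend(emitted)
    return actions
```

The point of the sketch is that the decision to act depends on state accumulated across batches, not on any single event: a dataset that receives 60 files in one micro-batch and 50 in the next only triggers a compaction request in the second batch.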