Inside Adobe Experience Platform we needed to track actions happening at the control-plane level and act upon them at lower levels such as the data lake and ingestion processes. Using Apache Spark Stateful Streaming, we built services that respond by starting processes like compacting, consolidating, and cleaning data, minimizing processing time while keeping everything within defined SLAs. This talk presents a pattern that we have used in production across multiple Adobe Experience Platform services for the last two to three years, with no high-severity on-call interventions and minimal operational cost on high-throughput ingestion flows.
Andrei Ionescu is a Senior Software Engineer with Adobe and part of Adobe Experience Platform’s Data Lake team, specializing in big data and distributed systems with Scala, Java, Spark, and Kafka.
At Adobe, he mainly contributes to ingestion and data lake projects, while in open source he contributes to Hyperspace and Apache Iceberg.