March 30, 2022

Tracking & Triggering Pattern with Spark Stateful Streaming

Inside Adobe Experience Platform, we found that we needed to track actions happening at the control plane level and act on them at lower levels, such as the data lake and ingestion processes. Using Apache Spark Stateful Streaming, we have built services that respond by starting processes such as data compaction, consolidation, and cleanup, minimizing processing time while staying within defined SLAs. This talk presents a pattern that we have used in production for the last two to three years in multiple services inside Adobe Experience Platform, with no high-severity on-call interventions and minimal to no operational cost on high-throughput ingestion flows.
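
To make the general idea concrete, here is a minimal sketch of tracking and triggering in plain Python. It is not the implementation presented in the talk and it does not use the Spark API. It only imitates what a stateful streaming job does: keep a small piece of state per key, update it as events arrive in micro-batches, and emit a trigger when the state meets a condition. The names (`Tracker`, `DatasetState`, `process_batch`), the event shape, and both thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical thresholds; the abstract does not specify any.
MAX_PENDING_FILES = 3   # trigger once this many events are tracked for a dataset
MAX_WAIT_SECONDS = 60   # or once the first tracked event is at least this old


@dataclass
class DatasetState:
    """Per-key state, similar in spirit to what a stateful job keeps per group."""
    pending_files: int = 0
    first_seen: float = 0.0


@dataclass
class Tracker:
    states: Dict[str, DatasetState] = field(default_factory=dict)

    def process_batch(self, events: List[Tuple[str, float]], now: float) -> List[str]:
        """Fold (dataset_id, event_time) events into state; return ids to act on."""
        # Tracking: update the state of each dataset mentioned in this batch.
        for dataset_id, event_time in events:
            state = self.states.get(dataset_id)
            if state is None:
                state = DatasetState(first_seen=event_time)
                self.states[dataset_id] = state
            state.pending_files += 1

        # Triggering: emit every key whose state meets a condition, then
        # drop its state so the next event starts a fresh tracking window.
        triggered = []
        for dataset_id, state in list(self.states.items()):
            too_many = state.pending_files >= MAX_PENDING_FILES
            too_old = now - state.first_seen >= MAX_WAIT_SECONDS
            if too_many or too_old:
                triggered.append(dataset_id)
                del self.states[dataset_id]
        return sorted(triggered)
```

Calling `process_batch` once per micro-batch returns the dataset ids for which a downstream process, for example a compaction job, would be started. A dataset is triggered either because enough events have accumulated or because its oldest tracked event has waited too long, which is one simple way to bound latency on quiet datasets.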

Topics Covered

Data Lake Engines

Dremio Subsurface for Apache Spark

Speakers

Andrei Ionescu

Andrei Ionescu is a Senior Software Engineer at Adobe and part of Adobe Experience Platform’s Data Lake team, specializing in big data and distributed systems with Scala, Java, Spark, and Kafka. At Adobe, he mainly contributes to ingestion and data lake projects; in open source, he contributes to Hyperspace and Apache Iceberg.
