Predicting TV Tune-In Using PySpark, MLlib & Delta Lakehouse
At MiQ Digital India Pvt. Ltd. we collect and process high-volume TV viewing data and apply machine learning models to help TV networks get the maximum value out of their ad slots.
We use PySpark for data wrangling and feature engineering and Apache Spark MLlib for modeling, within a Kafka-based, event-driven microservices architecture. The platform sits on a well-defined data engineering ecosystem: a lakehouse architecture built on top of Delta Engine.
This talk will cover how we scaled MiQ’s TV product to market across more than 50 advertisers, the pipeline optimizations we made for data at TB scale, and the cost optimizations for model generation and prediction.
Speakers
Rohit Srivastava
Rohit Srivastava is a Senior Engineering Lead with expertise working in the programmatic media buying space for the Ad-Tech domain. He is skilled in application development and building highly scalable big data analytics platforms. He is a data pipeline builder responsible for optimizing data stores and building them from the ground up. Rohit is experienced in performing root cause analysis and optimizing data pipelines at scale with a cost-effective approach.
Bitanshu Das
Bitanshu is the lead data engineer at MiQ, where he manages the Data Engineering team responsible for building scalable big data solutions for programmatic media buying in the ad tech space. He loves exploring data and building optimized data pipelines from scratch.