Use Case Track  

Creating millions of user sessions using Complex Event Processing

Every day, Yelp connects millions of consumers with great local businesses through its website and mobile apps. We strive to give our users an excellent, ever-evolving experience by constantly running a large number of experiments based on user activity.

A user session encapsulates all of a single user’s activity until that user has been dormant for 30 minutes. Creating user sessions requires us to process hundreds of millions of log events every day and apply filters to them. At this volume, session creation presents several application-level challenges, including handling late events and filtering bot traffic. Flink features such as event-time processing and exactly-once processing made it possible to build a streaming application at this scale.
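To make the 30-minute rule concrete, here is a minimal plain-Python sketch of gap-based sessionization. It is not Flink code and not Yelp's implementation; the function name `sessionize`, the `(user_id, timestamp)` event shape, and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # inactivity gap stated in the abstract


def sessionize(events, gap=SESSION_GAP):
    """Group (user_id, timestamp) events into per-user sessions.

    A new session starts whenever a user's next event occurs more than
    `gap` after that user's previous event.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, stamps in by_user.items():
        stamps.sort()  # order by event time, not by arrival order
        user_sessions = []
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > gap:
                user_sessions.append(current)  # user was dormant: close session
                current = [ts]
            else:
                current.append(ts)
        user_sessions.append(current)
        sessions[user_id] = user_sessions
    return sessions


# Hypothetical sample data: one of alice's events arrives out of order.
t0 = datetime(2018, 1, 1, 12, 0)
events = [
    ("alice", t0),
    ("alice", t0 + timedelta(minutes=50)),
    ("alice", t0 + timedelta(minutes=10)),  # late arrival
    ("bob", t0 + timedelta(minutes=5)),
]
result = sessionize(events)
```

Sorting by event time before grouping is the batch-style stand-in for what event-time processing provides in a streaming job: the late event still lands in the session it belongs to.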

 

Our main motivation for moving from batch processing to streaming was that any analysis based on user sessions was always a day late. As an added bonus, the move also meant integrating with our state-of-the-art data pipeline ecosystem.

In this talk we will discuss why Yelp moved from creating user sessions with batch jobs to generating them in near real time with Apache Flink. We will also highlight the issues we encountered along the way. These include continuous bot traffic that never closed the session window, adding custom triggers for long-running sessions, handling duplicate events while still allowing late events to be processed, and auditing the created sessions.
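The bot-traffic problem can be illustrated with a small plain-Python sketch: a client that sends an event every few minutes never goes dormant, so a purely gap-based session never closes. One possible remedy, assumed here and not confirmed by the abstract, is to force-close a session once it reaches a maximum length. The 4-hour cap, the function name `close_sessions`, and the `(event_id, timestamp)` event shape are all hypothetical.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)       # inactivity gap from the abstract
MAX_SESSION = timedelta(hours=4)  # hypothetical cap, not from the talk


def close_sessions(events, gap=GAP, max_len=MAX_SESSION):
    """Split one user's (event_id, timestamp) events into sessions.

    A session closes after more than `gap` of inactivity, or is
    force-closed once an event falls `max_len` or more after the
    session's first event. Repeated event ids are dropped.
    """
    seen = set()
    sessions, current = [], []
    for event_id, ts in sorted(events, key=lambda e: e[1]):
        if event_id in seen:
            continue  # duplicate delivery of an already processed event
        seen.add(event_id)
        if current and (ts - current[-1][1] > gap
                        or ts - current[0][1] >= max_len):
            sessions.append(current)
            current = []
        current.append((event_id, ts))
    if current:
        sessions.append(current)
    return sessions


# Hypothetical bot: one event every 10 minutes for 10 hours, plus a duplicate.
t0 = datetime(2018, 1, 1)
bot = [(i, t0 + timedelta(minutes=10 * i)) for i in range(61)]
bot.append(bot[0])  # the same event delivered twice
bot_sessions = close_sessions(bot)
session_sizes = [len(s) for s in bot_sessions]
```

Without the cap, all 61 events would form a single session that stays open for as long as the bot keeps sending traffic.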

Authors

Prem Santosh Udaya Shankar
Yelp

Prem Santosh is a Software Engineer at Yelp who enjoys working on problems related to scaling applications and the theoretical side of distributed consensus. He spends his spare time cooking, skiing, and playing League of Legends. He loves to talk about Raft, Paxos, what seasonings go best with chicken, and how to never get out of the bronze league.