Use Case Track
Threading Needles in a Haystack: Sessionizing the Uber firehose in realtime
On Uber's Marketplace team, we're tasked with efficiently matching our riders and driver partners in real time. To that end, we employ various systems within the ride sharing marketplace, such as dynamic pricing (popularly known as surge), demand modeling and forecasting systems, and health monitoring systems, to ensure optimal marketplace efficiency. One of the key data systems underpinning Uber's Marketplace is the Rider Sessions pipeline, which tracks the lifetime of a single Uber trip in real time, capturing rider interaction with various Uber systems, from the pricing engine to dispatch systems, all the way until the trip ends. In our talk we discuss the evolution of the Rider Session state machine at Uber and the challenges involved in managing a real-time stateful streaming pipeline across half a dozen event streams and millions of riders who use Uber every day. We plan to delve into various aspects of running the job in production, such as managing state checkpointing, monitoring, our experiences moving the job from Spark Streaming to Flink, and ensuring low latency to downstream systems.
Amey is a Senior Software Engineer on Uber’s Marketplace Data Intelligence team, where he works on the stateful streaming and geo-spatial data systems that power applications ranging from health monitoring and forecasting to dynamic pricing within Uber’s ride sharing marketplace. He’s been dealing with thorny issues around streaming pipelines and state ever since he started his career working on Yahoo’s ad tech systems in 2011, when Apache Pig and Storm were the state of the art. He holds a B.S./M.S. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign.