Use Case Track  

Adventures in Scaling from Zero to 5 Billion Data Points per Day

At Flink Forward San Francisco 2018, our team at Comcast presented our operationalized streaming ML framework, which had just gone into production. This year, in just a few short months, we scaled a Customer Experience use case from an initial trickle of volume to processing over 5 billion data points per day. The use case helps diagnose potential issues with High Speed Data service and provides recommendations for resolving those issues as quickly and cost-effectively as possible.

As with any solution that grows quickly, our platform faced challenges, bottlenecks, and technology limits, forcing us to quickly adapt and evolve our approach to handle 50,000+ data points per second.

We will introduce the problems, approaches, solutions, and lessons we learned along the way, including: The Trigger and Diagnosis Problem, The REST Problem, The “Feature Store” Problem, The “Customer State” Problem, The Savepoint Problem, The HA Problem, The Volume Problem, and of course The Really High Volume Feature Store Problem #2.

Authors

Dave Torok
Comcast

Dave is an Enterprise Software Architect with over twenty-five years of technical leadership in the telecommunications, financial services, and healthcare domains. Dave’s diverse experience ranges from engineering event-based and rule-processing systems at “PaaS” (Platform as a Service) scale to building an autonomous-agent workplace simulation engine. At Comcast, Dave is a Distinguished Architect leading the end-to-end ingest, distributed compute, and machine learning pipeline architectures for Big Data applications supporting Customer Experience and Systems Engineering in both public and private cloud environments.