Stream Loops on Flink: Reinventing the wheel for the streaming era
You have probably heard that stream processing subsumes batch workloads, a valid claim but one that is not yet fully implemented. Our lab's research aims to fulfil this dream and to delve further into the deep world of iterative processes, a fundamental building block for graph and machine learning algorithms, yet one that is missing from your stream pipelines today. In this talk, we will investigate why bulk-synchronous and stale-synchronous iterative models are nothing more than a special case of out-of-order stream processing, the paradigm behind your ultra-fast watermark-based window aggregations on Flink and Beam. Next, we will examine how watermarks can be extended to incorporate more metrics for tracking iterative progress, as well as the structured graph modifications (spoiler alert: loops) that can make our lives easier. Finally, we will demonstrate how, on top of these primitives, we can execute scalable multi-pass window aggregations with purgeable and persistent managed state and robust flow control, along with several domain-specific applications such as vertex-centric graph aggregations and stochastic gradient descent on stream windows.
Paris Carbone
KTH Royal Institute of Technology in Stockholm
Paris Carbone is a Flink committer and a senior computer scientist working at the intersection of distributed systems, data management and programming systems. Paris is currently the tech lead of the ‘Continuous Deep Analytics’ project at KTH and RISE SICS in Sweden, investigating how intermediate programming languages and hardware acceleration will make data streaming the dominant end-to-end architecture for critical and complex decision making. At night, you can catch Paris performing with his jazz quintet in the oldest neighbourhoods of Stockholm.