Efficient Window Aggregation with Stream Slicing
Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and therefore cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not support out-of-order processing and session windows. Out-of-order processing is a key requirement for dealing with delayed tuples in case of source failures such as temporary sensor outages. Session windows are widely used to separate different periods of user activity from each other. Current versions of Apache Flink use Window Buckets to process stream aggregations with session windows and out-of-order tuples. This approach does not share partial aggregates among overlapping windows. In our talk, we present Scotty, a high-throughput operator for window discretization and aggregation in Apache Flink. Scotty splits streams into non-overlapping slices and computes partial aggregates per slice. These partial aggregates are shared among all overlapping windows, including session windows. Scotty introduces the first slicing technique that (1) enables stream slicing for session windows in addition to tumbling and sliding windows and (2) processes out-of-order tuples efficiently. Scotty was first published at ICDE 2018 (http://www.user.tu-berlin.de/powibol/assets/publications/traub-scotty-icde-2018.pdf).
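To make the slicing idea concrete, here is a minimal Python sketch of sharing partial aggregates among overlapping sliding windows. It is not Scotty's API or its algorithm: the function names `slice_aggregate` and `sliding_window_sums` are hypothetical, and the sketch assumes non-negative integer timestamps, a sum aggregate, fixed-size slices, and windows that start at time 0. Session windows are not covered.

```python
from collections import defaultdict
from math import gcd


def slice_aggregate(tuples, slice_len):
    """Fold (timestamp, value) tuples into one partial sum per slice.

    Slices are keyed by their start time, so a late (out-of-order) tuple
    simply updates the slice that its timestamp belongs to.
    Hypothetical helper for illustration only.
    """
    partials = defaultdict(int)
    for ts, value in tuples:
        slice_start = (ts // slice_len) * slice_len
        partials[slice_start] += value
    return partials


def sliding_window_sums(tuples, length, slide):
    """Sum per sliding window [start, start + length), built from slices.

    Assumption: the slice length is gcd(length, slide), so every window
    edge coincides with a slice edge and windows never split a slice.
    """
    slice_len = gcd(length, slide)
    partials = slice_aggregate(tuples, slice_len)
    if not partials:
        return {}
    last_slice = max(partials)
    results = {}
    start = 0
    while start <= last_slice:
        # Each window combines precomputed slice sums instead of
        # re-reading the raw tuples it shares with its neighbours.
        results[start] = sum(
            partials.get(s, 0) for s in range(start, start + length, slice_len)
        )
        start += slide
    return results


# (timestamp, value) pairs; the tuple with timestamp 3 arrives out of order.
events = [(1, 10), (6, 20), (12, 5), (3, 7), (14, 1)]
window_sums = sliding_window_sums(events, length=10, slide=5)
```

With these inputs the slices starting at 0, 5, and 10 hold the partial sums 17, 20, and 6, and the windows starting at 0, 5, and 10 sum to 37, 26, and 6. Each tuple is aggregated once, even though every tuple with a timestamp of 5 or later belongs to two windows.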
Jonas Traub, TU Berlin
Jonas is a Research Associate at TU Berlin and a PhD candidate supervised by Volker Markl. His research interests include data stream processing, sensor data analysis, and data acquisition from sensor nodes. All his publications are available at http://www.user.tu-berlin.de/powibol/. He wrote his master's thesis during a year abroad at the Royal Institute of Technology (KTH) and the Swedish Institute of Computer Science (SICS) / RISE in Stockholm, under the supervision of Seif Haridi and Volker Markl and advised by Paris Carbone and Asterios Katsifodimos. He graduated with an M.Sc. in computer science from TU Berlin in April 2015. Prior to that, he received his B.Sc. degree at Baden-Württemberg Cooperative State University (DHBW Stuttgart) and worked for several years at IBM in Germany and the USA. He is a participant of Software Campus and an alumnus of Studienstiftung des deutschen Volkes and Deutschlandstipendium.
Philipp Grulich, German Research Center for Artificial Intelligence
Philipp is a computer science Master’s student at the Technische Universität Berlin, specializing in big data analytics systems. Alongside his studies, he has worked for several companies and gained experience in frontend and backend software development. At the German Research Center for Artificial Intelligence, he works as a research assistant on a streaming-systems research project involving Apache Flink.