Technology deep-dive Track
How to build a modern stream processor: The science behind Apache Flink
Stream Processing has evolved quickly in a short time: a few years ago, stream processing was mostly simple real-time aggregations with limited throughput and consistency. Today, many stream processing applications have complex logic, strict correctness guarantees, high performance, low latency, and maintain large state without databases. Since then, Stream processing has become much more sophisticated because the stream processors – the systems that run the application code, coordinate the distributed execution, route the data streams, and ensure correctness in the face of failures and crashes – have become much more technologically advanced. In this talk, we walk through some of the techniques and innovations behind Apache Flink, one of the most powerful open source stream processors. In particular, we plan to discuss: The evolution of fault tolerance in stream processing, Flink’s approach of distributed asynchronous snapshots, and how that approach looks today after multiple years of collaborative work with users running large scale stream processing deployments. How Flink supports applications with terabytes of state and offers efficient snapshots, fast recovery, rescaling, and high throughput. How to build end-to-end consistency (exactly-once semantics) and transactional integration with other systems. How batch and streaming can both run on the same execution model with best-in-class performance.
Stefan is an Apache Flink