Use Case Track

Detecting Patterns in Event Streams with Flink SQL

For a long time, complex event processing (CEP) and stream analytics have been treated as distinct classes of stream processing applications. While CEP workloads identify patterns in event streams in near real time, stream analytics queries ingest, enrich, and aggregate high-volume streams. The two types of use cases have very different requirements, which has resulted in diverging system designs: CEP systems excel at low-latency processing, whereas engines for stream analytics achieve high throughput, usually through distributed scale-out architectures.
Apache Flink was one of the first open source stream processors able to address the full spectrum of stream processing applications, ranging from applications with low-latency requirements to applications that process millions of events per second. On top of this powerful processing engine, the Flink community built APIs for complex event processing and streaming analytics, namely the CEP library and support for streaming SQL. Recently, the Flink community has been integrating the two APIs by extending Flink SQL to support the MATCH_RECOGNIZE clause for row pattern matching, which was introduced with the SQL:2016 standard.
In my talk, I will discuss the new MATCH_RECOGNIZE feature and present use cases that benefit from pattern matching support in streaming SQL, such as process monitoring or anomaly detection. I will demonstrate the feature with a few example queries and take a brief look under the hood to discuss a few details of the implementation.


Dawid Wysakowicz

Dawid Wysakowicz is a Flink committer, currently working as a Software Engineer at data Artisans. Recently, his main area of interest has been detecting patterns in streams of data with Flink's Complex Event Processing library. He previously worked at GetInData, where he implemented real-time streaming solutions based on Apache Flink. His journey with highly distributed and scalable solutions started in 2015 while writing his master's thesis on Distributed Genomic Datawarehouse.