Stream processing: An Introduction to Event Time in Apache Flink


by Markos Sfikas

Apache Flink supports multiple notions of time for stateful stream processing. This post focuses on event time support in Apache Flink. In the following sections we define what event time is in Apache Flink, examine the different notions of time in a stream processing framework, and describe how Flink works with watermarks to measure progress in event time.

What is event time in stream processing

Event time in Apache Flink is, as the name suggests, the time at which each individual event is generated at the producing source. In a typical scenario, events collected from different producers (e.g. interactions with mobile applications, financial trades, application and machine logs, sensor readings) carry a time element in their metadata that signifies when the event was produced at the source. Stateful stream processing and real-time computations with Apache Flink use event time when the application logic requires processing or computations based on the time at which an event was generated.

How event time differs from processing & ingestion time

To effectively describe how Flink manages the different notions of time in stream processing, let’s imagine a scenario (illustrated below) where a mobile game player commutes to work on the underground. The player starts the mobile game on the platform, where the generated events are sent to the Flink operator. However, once the player is on board the carriage, the Wi-Fi connection is lost and events are stored on the mobile device without being transmitted to the stream processing system. The device sends the remaining data to the streaming infrastructure once the user is back online, walking to the office.

[Figure: a mobile game player commuting on the London Underground]

In the above scenario, the mobile game provider could be running a real-time messaging application based on Apache Flink that analyzes user activity in the game and delivers real-time offers through push notifications or messages. This application can process the incoming data based on different notions of time: event time, which we described previously, and processing time and ingestion time, which we describe in the following sections.

Processing time refers to the time of the machine that executes a specific operation in a stream processing application. When a stream processing application uses processing time, it uses the machine’s clock to run any time-based operations. A 5-hour processing-time window includes all events that arrive at the operator within that 5-hour wall-clock interval. Processing time is simple and requires no coordination between streams and machines, and it provides the best performance and lowest latency. However, in distributed systems and scenarios like the example described above, processing time is not always the best fit for computations, because events arrive at the operator asynchronously and possibly out of order.
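
As an illustration, a 5-hour processing-time window could look like the following minimal sketch (Java DataStream API, Flink 1.x). The GameEvent type and its playerId and score fields are hypothetical names, not part of this post’s example:

// Sketch only: assumes a DataStream<GameEvent> named "events".
// The window is evaluated on the wall clock of the machine running the operator.
events
    .keyBy(event -> event.playerId)
    .window(TumblingProcessingTimeWindows.of(Time.hours(5)))
    .sum("score");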

[Figure: event time]

Ingestion time is the time at which events reach the stream processing application. The timestamp is assigned as soon as the processing system "consumes" the message, so it reflects any delay (and its fluctuation) that occurs before the event enters the system. This timestamp is then associated with the specific event for any time-based operations. Ingestion time gives more predictable results than processing time, although it is still not fully accurate, as it can neither capture the time of origin nor handle out-of-order data.

Ingestion time provides more predictable results than processing time because it assigns a stable timestamp once at the source operator, so all time-based operations refer back to that single timestamp. With processing time, in contrast, each operator may assign the record to a different window based on its local system clock.

Apache Flink handles ingestion time in a similar way to event time by assigning automatic timestamps at the source and generating automatic watermarks (see below).
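
If ingestion time fits the use case, it is selected with the same time-characteristic setting shown in the next section. A minimal sketch, assuming env is the job’s StreamExecutionEnvironment:

// Timestamps are then assigned automatically when events enter the source operator,
// and watermarks are generated automatically.
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);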

The default time characteristic in Apache Flink is processing time. However, it might be necessary to switch Flink to event time if an application works asynchronously or across distributed systems. Of the three options, event time provides the most accurate results because, as explained above, the timestamp is assigned at the moment the event, or “message”, is generated at the source and then follows the event across all time-related operations. Event time is supported in Apache Flink through watermarks, which we introduce below.

The time characteristic can be switched from processing time to event time in Apache Flink with the following setting:


env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
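
In context, this setting typically appears right after the execution environment is obtained. The following is a minimal sketch of a Java DataStream job (Flink 1.x); the class name and the commented placeholder steps are illustrative:

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventTimeJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Switch from the default (processing time) to event time.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // ... define sources, assign timestamps and watermarks, add transformations ...

        env.execute("event-time job");
    }
}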

Watermarks and Event Time in Flink

Watermarks are Apache Flink’s mechanism for measuring progress in event time. Watermarks flow as part of the data stream and carry a timestamp t. A Watermark(t) declares that event time has reached time t in that stream, meaning that there should be no more elements from the stream with a timestamp t’ <= t (i.e. events with timestamps older than or equal to the watermark). Watermarks are crucial for out-of-order streams and asynchronous operations where events are not ordered by their timestamps. In general, a watermark is a declaration that, at a specific point in the stream, all events up to a certain timestamp should have arrived. Once a watermark reaches an operator, the operator can advance its internal event time clock to the value of the watermark. For example, with an hourly window operation, watermarks allow Flink to detect when event time has passed the end of the hour, so that the operator can close the window currently in operation.
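
As an illustration, timestamps and watermarks can be attached to a stream with a bounded-out-of-orderness extractor. The following is a minimal sketch (Java DataStream API, Flink 1.x); the GameEvent type and its eventTimestamp, playerId and score fields are hypothetical:

// Sketch only: assumes a DataStream<GameEvent> named "events".
DataStream<GameEvent> withTimestampsAndWatermarks = events
    .assignTimestampsAndWatermarks(
        // Watermarks trail the largest timestamp seen so far by 30 seconds,
        // tolerating events that arrive up to 30 seconds out of order.
        new BoundedOutOfOrdernessTimestampExtractor<GameEvent>(Time.seconds(30)) {
            @Override
            public long extractTimestamp(GameEvent event) {
                // The timestamp recorded when the event was generated on the device.
                return event.eventTimestamp;
            }
        });

// An hourly event-time window on this stream is closed once a watermark
// passes the end of the hour.
withTimestampsAndWatermarks
    .keyBy(event -> event.playerId)
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .sum("score");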

One of our earlier posts describes the feature in more detail and provides some helpful observations when working with Watermarks for stateful stream processing with Apache Flink.

The right notion of time for your stream processing application depends on your application's requirements and specific system characteristics. Choosing the correct notion of time is crucial, as it will ultimately impact your results and downstream operations. Apache Flink has been designed to provide this flexibility and support different notions of time depending on your application’s requirements. Read the Flink documentation for more information on Debugging Windows and Event Time, or sign up for an Apache Flink Public Training by data Artisans for an introduction to event time and watermarks in Flink.
