Flink Forward events are all about embracing the community and driving knowledge-sharing around Apache Flink® and the broader world of streaming data! These conferences offer a platform for the streaming data community to tell their stories, spark ideas, and foster collaboration. Each year, we appoint a Program Chair (this year it’s Ben Gamble, Field CTO at Ververica) to bring together a diverse and dynamic Program Committee. This group is composed of recognized experts from across the industry, each offering valuable perspectives on Apache Flink and real-time data streaming. They play a vital role in shaping the event by reviewing community-submitted talks.
I had the pleasure of sitting down with Yuan Mei, Director of Engineering at Alibaba and Apache Flink PMC member, to hear her thoughts on the future of streaming technologies and what excites her most about this year’s Flink Forward.
Certainly, my journey into the world of stream processing started during my PhD studies. I was very lucky to get involved in some of the early research projects on stream processing, including WaveScope, CarTel, and ZStream. After graduating, I joined the streaming team at Facebook, now Meta, where I got to tackle real-world problems firsthand.
We quickly discovered two significant challenges in data streaming at that time: managing state and guaranteeing data correctness.
Back then, early systems couldn't handle much beyond simple, immediate calculations and often relied on external databases to look up historical information. This not only made data streaming itself very difficult to use but also exacerbated the data correctness issue.
Then I discovered Flink—it was a game-changer! Flink solved these core issues with its self-maintained internal state management and a lightweight checkpointing mechanism that guarantees data correctness. It abstracts away much of the underlying complexity, making it easier for developers to focus on their applications. On top of that, its rich APIs make writing complex processing logic straightforward.
One of my most significant contributions is the introduction of ‘disaggregated state management’. I led the shift from a tightly coupled compute-and-storage architecture to a more scalable and flexible disaggregated model, one of the core innovations in Flink 2.0 compared to earlier versions.
The key function of disaggregated state management is decoupling compute from storage. This means the size of the state is no longer constrained by processing resources. More importantly, operations like checkpointing and rescaling are now consistently fast, taking only seconds even with massive state sizes. These were major pain points in Flink 1.x when dealing with large state, and they are completely resolved as of Flink 2.0.
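The intuition behind why disaggregation makes checkpoints cheap can be shown with a toy sketch. This is not Flink's actual implementation, and the class and method names here are purely illustrative: with state embedded in the compute task, a checkpoint must copy the full state, while with state kept in shared storage, a checkpoint only records a small reference whose size is independent of the state size.

```python
# Toy illustration (NOT Flink's actual code) of embedded vs. disaggregated
# state. All names here are hypothetical, for explanation only.

class EmbeddedStateBackend:
    """State lives with the compute task; a checkpoint copies everything."""
    def __init__(self):
        self.state = {}

    def put(self, key, value):
        self.state[key] = value

    def checkpoint(self):
        # Cost grows linearly with state size: the whole map is copied.
        return dict(self.state)


class DisaggregatedStateBackend:
    """State lives in shared remote storage; a checkpoint is a small pointer."""
    def __init__(self, remote_store):
        self.remote = remote_store  # e.g. shared object storage (simulated)
        self.version = 0

    def put(self, key, value):
        # Writes go to the shared store, tagged with the current version.
        self.remote[(self.version, key)] = value

    def checkpoint(self):
        # Constant cost: record the current version; the data stays remote.
        snapshot = {"version": self.version}
        self.version += 1
        return snapshot


if __name__ == "__main__":
    remote = {}
    backend = DisaggregatedStateBackend(remote)
    for i in range(1000):
        backend.put(i, i * i)
    # Checkpoint size stays tiny no matter how large the state grows.
    print(backend.checkpoint())  # prints {'version': 0}
```

The same contrast explains fast rescaling: when state already sits in shared storage, a new task can attach to it by reference instead of physically moving gigabytes of local state.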
That's really a great question! It's still a bit early to say how Flink's state management should evolve for the AI era, because the integration of Flink and AI is currently in a phase of active exploration and development, including its combination with Retrieval-Augmented Generation (RAG), vector databases, event-driven models, and agent calls.
However, I believe one thing is certain: as Flink plays a larger role in accelerating AI, and as the integration between Flink's data processing and AI inference becomes tighter and processing logic becomes richer, it will undoubtedly create new demands on the handling of historical data and intermediate state – in terms of data types, scale, processing paradigms, and more. This exactly mirrors the early development of stream processing itself.
While Flink 2.0's disaggregation addressed the checkpointing and rescaling challenges of large states, there are still significant unresolved issues. From my perspective, the two primary concerns are:
Therefore, I'm particularly interested in creative ideas from a user's perspective addressing three key topics:
As Flink continues to evolve, I believe these areas will be critical to how users unlock the full potential of stateful stream processing.
Sharing practical experiences and challenges, while exploring creative applications of Flink in novel domains and use cases, is key to inspiring innovation. I’m a firm believer that by exchanging ideas, we can uncover new ways to accelerate the entire data and AI pipeline, from real-time insights to intelligent automation.
While our integration with AI, use case development, the upstream and downstream ecosystem, and technical depth are well-covered, we shouldn't overlook the relatively high barrier newcomers face in understanding stream processing itself. Given the significant influx of new users each year, it's crucial that we also provide resources around fundamental concepts, operational practices, and core principles. Those are things that easily get overlooked.
Feeling unsure about presenting is common. It happens to me as well. Time pressure and the desire for a flawless delivery are the main sources of that anxiety. I've come to realize that the real value of presenting lies in sharing interesting ideas, receiving constructive feedback, and building connections with people who bring different viewpoints. So, to anyone feeling uncertain – including myself – take the opportunity, don't overthink it, and just do it!
Do you have a great streaming data story to share? We want to hear it! The Call for Speakers is now open! Submit your talk before 11:59 (CEST), May 15th, 2025, to be considered as a speaker at Flink Forward Barcelona 2025!