Meet the Flink Forward Program Committee: A Q&A with Yuan Mei
Flink Forward events are all about embracing the community and driving knowledge-sharing around Apache Flink® and the broader world of streaming data! These conferences offer a platform for the streaming data community to tell their stories, spark ideas, and foster collaboration. Each year, we appoint a Program Chair (this year it’s Ben Gamble, Field CTO at Ververica) to bring together a diverse and dynamic Program Committee. This group is composed of recognized experts from across the industry, each offering valuable perspectives on Apache Flink and real-time data streaming. They play a vital role in shaping the event by reviewing community-submitted talks.
I had the pleasure of sitting down with Yuan Mei, Director of Engineering at Alibaba and Apache Flink PMC member, to hear her thoughts on the future of streaming technologies and what excites her most about this year’s Flink Forward.
- Can you share a bit about your journey in data streaming and how you became involved with Apache Flink?
Certainly. My journey into the world of stream processing started during my PhD studies. I was very lucky to get involved in some of the early research projects on stream processing, including WaveScope, CarTel, and ZStream. After graduating, I joined the streaming team at Facebook (now Meta), where I got to tackle real-world problems firsthand.
We quickly discovered two significant challenges in data streaming at that time:
- How to connect current data with past information for more complex analysis
- How to ensure data correctness
Back then, early systems couldn't handle much beyond simple, immediate calculations and often relied on external databases to associate historical information. This not only made data streaming itself very difficult to use but also further exacerbated the data correctness issue.
Then I discovered Flink, and it was a game-changer! Flink solved these core issues with its self-maintained internal state management and a lightweight checkpointing mechanism dedicated to guaranteeing data correctness. It abstracts away much of the underlying complexity, making it easier for developers to focus on their applications. On top of that, its rich APIs make writing complex processing logic straightforward.
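To make that concrete, here is a minimal sketch (my illustration, not from the interview) of what Flink's self-managed state and checkpointing look like in the Java DataStream API, in the Flink 1.x style: a keyed running count whose per-key state is owned, checkpointed, and restored by Flink itself. The class and job names are hypothetical.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StatefulCountSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Lightweight periodic checkpoints back the exactly-once state guarantee.
        env.enableCheckpointing(10_000);

        env.fromElements("user-1", "user-1", "user-2")
           .keyBy(userId -> userId)
           .flatMap(new RunningCount())
           .print();

        env.execute("stateful-count-sketch");
    }

    // Keeps a per-key running count in Flink-managed state; Flink checkpoints
    // and restores this state transparently, so no external database is needed
    // to associate current events with past information.
    public static class RunningCount
            extends RichFlatMapFunction<String, Tuple2<String, Long>> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(String userId, Collector<Tuple2<String, Long>> out) throws Exception {
            long updated = (count.value() == null ? 0L : count.value()) + 1L;
            count.update(updated);
            out.collect(Tuple2.of(userId, updated));
        }
    }
}
```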
- What are some of the biggest innovations or challenges you’ve tackled in Flink storage and engine development?
One of my most significant contributions is the introduction of ‘disaggregated state management’. I’ve led the shift from a tightly coupled compute-and-storage architecture to a more scalable and flexible disaggregated model - one of the core innovations in Flink 2.0 compared to earlier versions.
The key function of disaggregated state management is decoupling compute from storage. This means the size of the state is no longer constrained by processing resources. More importantly, operations like checkpointing and rescaling are now consistently fast, taking only seconds even with massive state sizes. These were major pain points in Flink 1.x when dealing with large state, and as of Flink 2.0 they are completely resolved.
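As an illustration of what this looks like from a user's point of view, here is a hedged sketch in Java. The configuration key and value for selecting the disaggregated ForSt state backend follow the Flink 2.0 release material, but the checkpoint-directory key and the storage path are assumptions for illustration; check the documentation for your exact Flink version before relying on them.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DisaggregatedStateSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Select the disaggregated ForSt state backend introduced in Flink 2.0
        // (key and value as described in the Flink 2.0 release material).
        conf.setString("state.backend.type", "forst");

        // Durable remote storage for checkpoints; the key name and the S3 path
        // are assumptions for illustration (older versions use state.checkpoints.dir).
        conf.setString("execution.checkpointing.dir", "s3://my-bucket/flink/checkpoints");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // With state kept on remote storage, periodic checkpoints and rescaling
        // stay fast even as the state grows.
        env.enableCheckpointing(30_000);

        // ... define sources, keyed stateful operators, and sinks as usual ...
        env.fromSequence(1, 100).print();

        env.execute("disaggregated-state-sketch");
    }
}
```

In practice these options are usually set in the Flink configuration file rather than in application code; the programmatic form above just keeps the sketch self-contained.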
- The theme this year is "The Future of AI is Real-Time." From your perspective, how does Flink’s stateful stream processing power AI-driven applications?
That's a great question! It's still a bit early to say exactly how Flink's state management should fit into the AI era, because the integration of Flink and AI is currently in a phase of active exploration and development, including its combination with Retrieval-Augmented Generation (RAG), vector databases, event-driven models, and agent calls.
However, I believe one thing is certain: as Flink plays a larger role in accelerating AI, and as the integration between Flink's data processing and AI inference becomes tighter and processing logic becomes richer, it will undoubtedly create new demands on the handling of historical data and intermediate state - in terms of data types, scale, processing paradigms, and more. This mirrors exactly the early development of stream processing itself.
- What trends or emerging challenges in streaming data and stateful processing would you love to see addressed in speaker submissions?
While Flink 2.0's disaggregation addressed the checkpointing and rescaling challenges of large states, there are still significant unresolved issues. From my perspective, the two primary concerns are:
- The current state store operates as a black box: although it holds substantial data, how to effectively leverage that data remains unclear.
- Managing large states continues to be expensive, hindering broader adoption of Flink for accelerating data pipelines.
Therefore, I'm particularly interested in creative ideas from a user's perspective addressing three key topics:
- How to make use of the state data.
- How to further reduce the cost of state while continuing to guarantee data correctness.
- How state can empower AI-driven applications.
As Flink continues to evolve, I believe these areas will be critical to how users unlock the full potential of stateful stream processing.
- From your perspective, what makes a talk truly valuable for the Flink Forward audience?
Sharing practical experiences and challenges, while exploring creative applications of Flink in novel domains and use cases, is key to inspiring innovation. I’m a firm believer that by exchanging ideas, we can uncover new ways to accelerate the entire data and AI pipeline - from real-time insights to intelligent automation.
- Are there any areas of Flink, state management, or real-time architectures that you think are often underrepresented in conference talks?
While our integration with AI, use case development, the upstream and downstream ecosystem, and technical depth are well covered, we shouldn't overlook the relatively high barrier newcomers face in understanding stream processing itself. Given our significant influx of new users each year, it's crucial that we also provide resources on fundamental concepts, operational practices, and core principles - things that are easily overlooked.
- What advice would you give to those unsure about presenting?
Feeling unsure about presenting is common; it happens to me as well. Time pressure and the desire for a flawless delivery are the main sources of that hesitation. I've come to realize that the real value of presenting lies in sharing interesting ideas, receiving constructive feedback, and building connections with people who bring different viewpoints. So, to anyone feeling uncertain - including myself - take the opportunity, don't overthink it, and just do it.
Do you have a great streaming data story to share? We want to hear it! The Call for Speakers is now open! Submit your talk before 11:59 (CEST), May 15th, 2025, to be considered as a speaker at Flink Forward Barcelona 2025!
