Introducing Apache Fluss™ on Ververica’s Unified Streaming Data Platform

Written by Ben Gamble | 15 October 2025

The future of data lies in breaking down the barriers between batch and streaming. Ververica’s mission is to pioneer a new way of working with data by making insights available immediately, making architectures simpler, and giving organizations the ability to act on their data faster. Today, we are proud to announce the next major step toward that vision: Apache Fluss™ (incubating) is now part of Ververica’s Unified Streaming Data Platform.

This represents a turning point for how enterprises can think about real-time data, one beyond Kafka-centric architectures that were not designed and built for analytical use cases. With Apache Fluss, Ververica becomes a complete unified platform to seamlessly deliver ingestion, compute, and storage in one cohesive experience, without the fragmentation that comes with other solutions. No longer are data streams treated as raw event logs waiting to be transported, copied, and transformed across a patchwork of external systems. Instead, they are treated as living, queryable tables, continuously updated, instantly accessible, and ready to power the most demanding business applications, while reducing the need for stitching together multiple systems and products.


Rethinking Streaming Architectures

For years, organizations have relied on messaging-centric architectures as the backbone of their real-time systems. These technologies excel at transporting large volumes of events, but they were never designed to serve as queryable stores, which is what data pipelines require. As a result, companies are forced to layer additional systems (caches, databases, and warehouses) alongside their event log. Every new system brings additional pipelines to build and maintain, duplicated data to manage, and latency gaps to tolerate.

This approach creates several challenges, including:

Complexity: maintaining multiple layers for streaming, batching, caching, and analytics results in fragile pipelines with a high operational overhead.
Inefficiency: storing the same data across systems drives up infrastructure costs while also creating inconsistencies between “hot” and “cold” views of the same dataset.
Latency: separating real-time updates from historical storage inevitably means that decision-makers work with partial, fragmented, or outdated information.
Schema management: most data streaming tools, like Kafka, operate on raw bytes, meaning that schemas have to be built and managed separately. Keeping them consistent across multiple systems becomes an additional burden, making architectures fragile and costly to operate.

Image 1: Streaming-First Architectures Status Quo

Apache Fluss reimagines this paradigm. Instead of treating a stream as an immutable sequence of events, it treats streams as tables: a columnar, queryable storage layer designed for streaming analytics. Every incoming change is immediately reflected in both the log and an up-to-date table (cache) representation. Embracing this duality eliminates the artificial distinction between streams and tables. It enables enterprises to query data in real time, join it efficiently with other flows, and persist it seamlessly into longer-term lakehouse storage, all without shuffling data through external systems or writing complex glue code.
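
To make the log/table duality concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Fluss's API or implementation: the `StreamTable` class, its method names, and the sample order records are all hypothetical and exist only for illustration. The point it shows is that a single write path can feed both an append-only changelog and an always-current keyed view, and that the view can be rebuilt from the log alone.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, str, Optional[Row]]  # (operation, key, row or None)


@dataclass
class StreamTable:
    """Hypothetical toy model: one write feeds a changelog and a keyed table."""

    log: List[Change] = field(default_factory=list)
    table: Dict[str, Row] = field(default_factory=dict)

    def upsert(self, key: str, row: Row) -> None:
        # Append the change to the log and update the current view together.
        self.log.append(("upsert", key, row))
        self.table[key] = row

    def delete(self, key: str) -> None:
        self.log.append(("delete", key, None))
        self.table.pop(key, None)

    def lookup(self, key: str) -> Optional[Row]:
        # Point query against the latest state, with no log replay needed.
        return self.table.get(key)

    def replay(self) -> Dict[str, Row]:
        # Rebuild the table purely from the changelog.
        rebuilt: Dict[str, Row] = {}
        for op, key, row in self.log:
            if op == "upsert" and row is not None:
                rebuilt[key] = row
            else:
                rebuilt.pop(key, None)
        return rebuilt


orders = StreamTable()
orders.upsert("o-1", {"status": "created", "amount": 40})
orders.upsert("o-2", {"status": "created", "amount": 15})
orders.upsert("o-1", {"status": "shipped", "amount": 40})
orders.delete("o-2")

print(orders.lookup("o-1"))             # latest state of order o-1
print(len(orders.log))                  # every change is still in the log
print(orders.replay() == orders.table)  # the log and the table agree
```

A real system also has to handle durability, partitioning, and concurrent readers, which this sketch deliberately leaves out.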

Apache Fluss on Ververica’s Unified Streaming Data Platform

With Apache Fluss now integrated within Ververica, organizations can manage the full lifecycle of real-time data with a single unified solution. Data changes are captured in real time, processed and enriched in motion, and then persisted in Fluss, where they are immediately queryable. This creates a unified architecture that removes the boundaries between ingestion, computation, and storage.

Image 2: Ververica Platform with Apache Fluss

In this architecture, Apache Fluss is not a peripheral component but the core storage engine for streaming workloads. It makes it possible to:

Unify log and cache semantics in one system, eliminating the complexity and overhead of managing them separately.
Enable “zero-state” streaming analytics, reducing or even removing the need for massive, unstable stateful jobs by shifting heavy lifting into Apache Fluss.
Reduce network data transfer with efficient access patterns, ensuring only the required columns, partitions, or records are transmitted, with no wasted network or storage costs.
Serve hot data and tier seamlessly into the lakehouse, with shared metadata, consistent partitioning and bucketing, and the ability to combine real-time and historical queries in a single, unified view.
Power multi-modal and AI workloads, with native support for modern formats such as Lance, making streaming pipelines ready for ML features, embeddings, and vector search.
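
The point about transmitting only the required columns can be illustrated with a small Python sketch. This is a generic illustration of column projection over a columnar layout, not Fluss code: the `columns` dataset, the `project` and `encoded_size` helpers, and the use of JSON as a stand-in wire format are all assumptions made for the example.

```python
import json
from typing import Any, Dict, List

# Hypothetical columnar layout: one list of values per column.
columns: Dict[str, List[Any]] = {
    "user_id": [1, 2, 3, 4],
    "country": ["DE", "US", "DE", "FR"],
    "payload": ["x" * 200] * 4,  # a wide column most queries do not need
}


def project(store: Dict[str, List[Any]], wanted: List[str]) -> Dict[str, List[Any]]:
    # Keep only the requested columns; the others are never serialized.
    return {name: store[name] for name in wanted}


def encoded_size(data: Dict[str, List[Any]]) -> int:
    # JSON stands in for a wire format so the two sizes can be compared.
    return len(json.dumps(data).encode("utf-8"))


full = encoded_size(columns)
pruned = encoded_size(project(columns, ["user_id", "country"]))

print(f"all columns: {full} bytes, projected: {pruned} bytes")
```

In a row-oriented log, a consumer that needs two fields typically still receives every field of every record. With a columnar layout, the unused wide column never has to leave the storage layer.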

This unified design allows enterprises to build applications where the freshest data is always available for queries, joins, and analytics, without detours through external databases or caches. It makes Ververica the only truly unified platform where organizations can connect, process, store, analyze, and govern data in real time, all at sub-second latency.

How Apache Fluss Completes Ververica’s Streamhouse Vision

As we’ve continued to develop Ververica’s Unified Streaming Data Platform and the Streamhouse concept, one area that’s posed challenges is the real-time layer. While existing architectures can ingest and compute data with low latency, they often lack a storage layer that can simultaneously support analytics, queries, and statefulness without compromise.

Streamhouse is Ververica’s answer to bridging lakehouse and streaming, a concept meant to enable “real-time lakehouse” behavior. Until now, however, the real-time tier has had gaps: it could not fully deliver sub-second queryability, low-latency joins at scale, or a seamless transition between streaming and historical workloads.

Fluss closes these gaps. In other words, it makes the Streamhouse vision complete: a single platform where ingestion, compute, and storage converge, and where real-time and historical data are no longer separate worlds but one unified solution.

Learn more about bringing the Streamhouse to real-time in the 2024 blog: From the Kappa Architecture to Streamhouse: Making the Lakehouse Real-Time

Ververica: An AI-Ready Solution

Apache Fluss also sets the foundation for AI-native data architectures. By treating streams as queryable tables and supporting modern formats like Lance and LanceDB, Fluss makes it possible to ingest multimodal data (text, images, audio, video, and embeddings) in real time. This enables use cases such as similarity search, retrieval-augmented generation (RAG), and multimodal analytics directly on live data, while seamlessly tiering older data into Lance-powered lakehouse storage for training and retraining.

Fluss can also serve as a real-time feature store, eliminating the need for separate online and offline layers. Features are always fresh, always queryable, and always consistent across inference and training. This unified approach simplifies ML pipelines, reduces duplication, and ensures models learn and adapt continuously from the latest data.

With Apache Fluss, Ververica customers get one unified solution where AI, multimodal workloads, and real-time analytics converge, all with sub-second latency.

From Complex Pipelines To Real-Time Business Value

The impact of Apache Fluss within Ververica’s solution can be realized in every dimension of modern data architectures. For engineering teams, it means simpler systems that collapse what used to be a complex chain of tools into one cohesive platform. Pipelines that once required careful orchestration of multiple technologies can now be built and operated directly within Ververica, accelerating time-to-market and reducing risk.

For business leaders, it means faster and more reliable insights. Because data in Apache Fluss is both log and table, every change is queryable the moment it occurs. Real-time dashboards update continuously, and applications can respond to events with sub-second freshness. Decisions are no longer delayed by hours or days of batch processing, nor distorted by inconsistencies between systems.

For organizations at scale, Apache Fluss drives efficiency and cost savings. By reducing network transfer, offloading heavy state from compute jobs, and consolidating hot and cold data, workloads that once consumed thousands of cores can now run with a fraction of the resources. This translates into lower infrastructure costs and stronger ROI.

And for enterprises preparing for the AI-driven future, Apache Fluss provides an AI-ready foundation. With support for multi-modal data and AI, it erases the gap between operational and analytical data, enabling intelligent systems that learn, adapt, and act in real time.

A New Day for Streaming In The Age Of AI

With Apache Fluss, Ververica is once again setting the standard for the industry. Just as we redefined stream processing with Flink, we are now redefining streaming storage. The combination of the two in one enterprise-ready platform creates an end-to-end real-time data platform that is unified, efficient, and ready for the age of AI.

This sets the blueprint for the next generation of real-time architectures. It is a way to finally collapse the silos between streams and the lakehouse, operational and analytical, and to replace them with a single, living, flowing system. This is different from other solutions, which stitch together disparate systems and solutions and are inherently fragmented and difficult to manage as a result.

Fluss was named for the German word for “river”. Ververica’s Unified Streaming Data Platform ensures your organization’s river of continuous data flows efficiently, smoothing out the path ahead for a new era of data-driven business and instant insights and results.

Try Fluss on Ververica by registering for the private preview today.

