Dell Technologies and Ververica: Analyzing Continuous Data Streams Across Industries

April 23, 2020 | by Flavio Junqueira

This post originally appeared on the Dell EMC blog. It was reproduced on the Ververica blog with permission from its author.


Every entity in the digital world, be it an end user or a sensor device, continuously produces activity updates. People leave traces of their activity when they shop online or use any other online application. Machines report on their actions and statuses. Capturing such traces and reports in the form of data streams enables software applications that digitally reconstruct the history of these entities, allowing queries against the data and yielding actionable insights. The same is true for applications that represent these streams in different formats.

To satisfy the demanding requirements of such applications, we developed the Dell EMC Streaming Data Platform (SDP). The solution ingests, stores and processes data continuously, mapping the software abstractions more naturally to those types of applications. The centerpiece of SDP is the open source storage system called Pravega.

While ingesting and storing are critical functions of any data pipeline, a streaming data solution needs a stream processor that can benefit from the features that Pravega offers. Pravega exposes streams as a storage primitive, enabling applications to ingest and consume data in stream form. Pravega streams accommodate an unbounded amount of data while being both elastic and consistent.

Apache Flink, a unique framework in the space of data analytics, offers powerful functionality for processing both unbounded and bounded data sets. When used in conjunction with Pravega, Apache Flink can tail, or historically process, data using the same source, while providing end-to-end, exactly-once semantics and dynamically adapting to resource demands. The combination of Pravega and Apache Flink in the Streaming Data Platform raises the bar for stream processing platforms. It brings together features that have long been needed to fulfill the requirements of existing and future applications.
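The idea of tailing and historically processing data through the same source can be illustrated with a toy model. The sketch below is not the Pravega or Flink API: `ToyStream`, `read_from`, `tail_offset` and `count_events` are hypothetical names made up for illustration, and the stream is an in-memory list with none of the elasticity, durability or exactly-once guarantees described above. It only shows that one read path and one piece of processing logic can serve both a replay from the beginning and a read that starts at the tail.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ToyStream:
    """Hypothetical in-memory stand-in for an append-only stream (not the Pravega API)."""
    events: List[str] = field(default_factory=list)

    def append(self, event: str) -> int:
        """Append an event and return the offset it was written at."""
        self.events.append(event)
        return len(self.events) - 1

    def tail_offset(self) -> int:
        """Offset at which the next appended event will land."""
        return len(self.events)

    def read_from(self, offset: int) -> Iterator[str]:
        """Yield every event currently stored at or after `offset`."""
        # The length is re-checked on each step, so events appended
        # while a reader is iterating are still picked up.
        while offset < len(self.events):
            yield self.events[offset]
            offset += 1


def count_events(stream: ToyStream, start: int) -> Dict[str, int]:
    """Same processing logic, whether `start` is 0 (history) or the tail."""
    counts: Dict[str, int] = {}
    for event in stream.read_from(start):
        counts[event] = counts.get(event, 0) + 1
    return counts


stream = ToyStream()
for e in ["click", "view", "click"]:
    stream.append(e)

# Historical processing: replay the stream from the first event.
historical = count_events(stream, 0)

# Tail processing: remember the current end, then read only newer events.
tail_start = stream.tail_offset()
stream.append("purchase")
tail = count_events(stream, tail_start)
```

In this sketch the only difference between the two modes is the starting offset passed to the reader; the source and the counting logic stay the same.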

 


Figure 1: Data pipelines with Pravega and Flink in SDP

 

The future

Dell Technologies envisions a world in which data streams are ubiquitous and stream processing is the new norm for modern application development. Stream applications are already prevalent, but there’s a clear increase in demand and scale for systems that process stream data. We are excited about this future and the opportunity to build the systems that can help our customers’ applications process stream data efficiently and effectively.

The combination of Pravega and Flink in the Streaming Data Platform to compose data pipelines already provides unique possibilities. The partnership between Dell Technologies and Ververica will further enable us to build features for data pipelines that will help organizations simplify storage needs, create a foundation of unified data and innovate using an endless array of applications. We look forward to working together to deliver streaming data technology that continues to provide meaningful outcomes for our customers.

To learn more about Pravega and Flink, explore the two sessions from the Flink Forward Virtual Conference 2020 by visiting the Flink Forward YouTube Channel. There, you can see the keynote, “Stream analytics made real with Pravega and Apache Flink” with Srikanth Satya, VP of Engineering at Dell Technologies, and “Everything is connected: How watermarking, scaling, and exactly once impact one another in Pravega,” presented by me, Flavio Junqueira, Senior Distinguished Engineer at Dell Technologies.


About the author: 

Flavio Junqueira leads the Pravega team at Dell EMC. He holds a PhD in computer science from the University of California, San Diego and is interested in various aspects of distributed systems, including distributed algorithms, concurrency, and scalability. Previously, Flavio held a software engineer position with Confluent and research positions with Yahoo! Research and Microsoft Research. Flavio has contributed to a few important open-source projects. Most of his current contributions are to the Pravega open-source project, and previously he started and contributed to Apache projects such as Apache ZooKeeper and Apache BookKeeper. Flavio co-authored the O’Reilly book “ZooKeeper: Distributed process coordination”.

 
