Flink Forward San Francisco 2017 Recap: On the State of Stream Processing with Apache Flink

Last month, the data Artisans team traveled to San Francisco, California to gather with the Apache Flink® and stream processing communities and to host the first-ever Flink Forward conference on the West Coast. We'd like to extend a sincere thanks to all sponsors, speakers, and attendees for helping to make the event a great one.

Missed this year's San Francisco conference? No worries. Recordings of all talks and the corresponding slides are available online for your viewing pleasure. 

Our team learned a lot from the Apache Flink users we met and spoke with, and in this post, we'll review just a handful of the trends we observed at the conference:

  • Flink's Growing Footprint
  • Analytics, But Also Applications
  • Machine Learning with Flink
  • Flink and the Broader Ecosystem

Flink's Growing Footprint

Both Uber and Netflix were represented by speakers at Flink Forward Berlin 2016, and the companies returned to the stage in San Francisco to deliver opening keynotes and discuss how they're building internal stream processing platforms powered by Flink. 

Chinmay Soman, the tech lead on the streaming platform at Uber, discussed the evolution of stream processing within the company up to the current Flink-powered AthenaX platform.

AthenaX - Stream Processing at Uber with Apache Flink

And Monal Daxini, an engineering manager on the streaming team at Netflix, shared a demo of the company's Flink-powered Keystone stream processing platform, where internal users can configure a stream processing job via a web UI. 

Keystone - Stream Processing at Netflix with Apache Flink

Alibaba, an active contributor in the Flink open source community and also a speaker in 2016, delivered two talks that detailed their work on Flink's Table and SQL API as well as runtime improvements for large-scale streaming.

Alibaba and Apache Flink - Table and SQL API

It was exciting to see the trend of platformization of Flink-based stream processing systems, enabling Flink to spread within a company and to be used across a range of different teams.

Analytics, But Also Applications

We heard descriptions of Flink use cases from a number of different companies, and there were many examples of Flink being used not only for analytics pipelines, but also to power 24/7 applications capable of responding to data in real time.

Scott Kidder of Mux, an analytics company for streaming video providers, and Erik de Nooij of ING, the Global 500 banking company, both discussed real-time anomaly detection systems built with Flink (see Mux's talk here and ING's talk here). 

David Brelloch, Sean Hester, and David Hardwick of BetterCloud, a multi-SaaS management platform, shared a system that allows their business users to dynamically configure Flink processing rules on an Apache Kafka source stream and allows the Flink state to be built dynamically using replay of targeted messages from a long-term storage system.

Dynamic rules with Apache Flink at BetterCloud
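
To make the idea of dynamically configured rules more concrete, here is a minimal sketch in plain Python. It is not BetterCloud's implementation and does not use Flink's API; the names `RuleUpdate`, `Event`, and `DynamicRuleProcessor` are hypothetical and exist only for illustration. The sketch assumes a simple threshold rule: an operator keeps the current rules and per-user counts in local state, rule changes arrive as control messages, and historical events can be replayed to rebuild state without emitting alerts.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class RuleUpdate:
    """Hypothetical control message: sets a rule, or removes it if threshold is None."""
    rule_id: str
    event_type: str = ""
    threshold: Optional[int] = None


@dataclass
class Event:
    """Hypothetical data message on the main event stream."""
    user: str
    event_type: str


class DynamicRuleProcessor:
    """Toy stand-in for a stateful operator fed by an event stream and a control stream."""

    def __init__(self) -> None:
        self.rules: Dict[str, RuleUpdate] = {}          # rule_id -> active rule
        self.counts: Dict[Tuple[str, str], int] = {}    # (user, event_type) -> count

    def on_control(self, update: RuleUpdate) -> None:
        # Control messages change the rule set while the job keeps running.
        if update.threshold is None:
            self.rules.pop(update.rule_id, None)
        else:
            self.rules[update.rule_id] = update

    def on_event(self, event: Event, emit_alerts: bool = True) -> List[str]:
        # Update state first, then evaluate the event against the current rules.
        key = (event.user, event.event_type)
        self.counts[key] = self.counts.get(key, 0) + 1
        if not emit_alerts:
            return []
        return [
            rule_id
            for rule_id, rule in self.rules.items()
            if rule.event_type == event.event_type
            and self.counts[key] >= rule.threshold
        ]

    def replay(self, historical_events: Iterable[Event]) -> None:
        # Rebuild state from stored events without firing alerts.
        for event in historical_events:
            self.on_event(event, emit_alerts=False)


processor = DynamicRuleProcessor()
processor.replay([Event("alice", "login_failed"), Event("alice", "login_failed")])
processor.on_control(RuleUpdate("too-many-failures", "login_failed", 3))
alerts = processor.on_event(Event("alice", "login_failed"))
```

In this sketch, the third failed login for "alice" trips the newly added rule because the replayed history already accounts for the first two. A real deployment would keep this state in Flink's managed state and route the two inputs through the framework rather than through direct method calls.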

And Cliff Resnick and Seth Wiesman of MediaMath, a programmatic marketing company, shared how they rebuilt their real-time reporting infrastructure using Flink. 

Machine Learning with Flink

There's also plenty of innovation happening in machine learning with Flink, from online training to model serving. 

Dean Wampler of Lightbend discussed the pragmatics of training deep learning models in a streaming context, combined with low-latency application of those models.

Lightbend and Machine Learning with Apache Flink

Eron Wright of EMC introduced a new open-source project called Flink TensorFlow, enabling Flink programs to operate on data using TensorFlow machine learning models.

Apache Flink TensorFlow - Deep Learning with Apache Flink

Swaminathan Sundararaman of Parallel Machines discussed using Flink for real-time classification and clustering of time series data. The company has experimented with both streaming in Flink and micro-batches in Spark for their use case, and in the talk, they shared the benefits of streaming that they've observed.

Online Machine Learning - Streaming vs Batch - Parallel Machines

Flink and the Broader Ecosystem

As Flink adoption increases, the ecosystem surrounding Flink is developing further, too, and a number of speakers shared tooling that's complementary to Flink.

Elizabeth Joseph and Ravi Yadav of Mesosphere demoed Flink's integration with DC/OS (and Apache Mesos), which was included in Flink's 1.2.0 release in February 2017.

Apache Flink on Mesos and DC/OS - Mesosphere

Kenneth Knowles of Google (and a member of the Apache Beam PMC) walked through Beam's model for state and timers, including functionality that, among open-source runners, is currently supported only by the Beam-on-Flink runner.

Apache Flink on Beam - Timers - Flink Runner

And last but not least, Dell/EMC VP Engineering Srikanth Satya and Sr. Consulting Engineer Tom Kaitchuck announced Pravega, a new storage system designed for modern stream processing. 

Storage for Stream Processing - Pravega and Apache Flink from Dell/EMC

Up Next

Flink Forward returns to Berlin's Kulturbrauerei on Sept 11-13, 2017, and you can register now. Stay tuned for our call for papers. 

Once more, we'd like to thank our sponsors Dell/EMC, Alibaba Group, Google Cloud Platform, and Lightbend for helping us put on a successful event. We'll see all of you again in California next year!
