Intel’s distributed Model Inference platform presented at Flink Forward

Written by Dongjie Shi & Jiaming Song | 21 September 2020

Flink Forward Global Virtual Conference 2020 is kicking off next month, and the Flink community is getting ready to discuss the future of stream processing and Apache Flink. This time, the conference is free to attend (October 21-22, 2020), and the organizers have put together six hands-on, instructor-led training sessions on October 19 & 20 that cover multiple aspects of Apache Flink, such as:

  • Flink Development (2 days)

  • SQL Development (2 days)

  • Runtime & Operations (1 day)

  • Stateful Functions (1 day)

  • Tuning & Troubleshooting (introduction and advanced, 1 day each)

We feel very lucky to be presenting how the team at Intel utilizes Apache Flink in our Cluster Serving service inside Intel’s Analytics Zoo. Read on for a sneak preview of our Flink Forward session, Cluster Serving: Distributed Model Inference using Apache Flink in Analytics Zoo, on October 21, 2020. If you haven’t done so already, make sure to register and secure your spot before October 1, and get ready to hear some interesting talks about the latest technology developments and Flink use cases from companies across different industries, sizes, and locations!

Cluster Serving: Distributed Model Inference using Apache Flink in Analytics Zoo

As deep learning projects evolve from experimentation to production, there is increasing demand to deploy deep learning models for large-scale, real-time distributed inference. While many tools are available for individual tasks, such as model optimization, model serving, cluster scheduling, and cluster management, deep learning engineers and scientists still face the challenging process of deploying and managing distributed inference workflows that can scale out to large clusters in an intuitive and transparent way.

What we are going to cover

Our session is going to introduce the two major areas behind the successful delivery of model serving: Big Data coupled with Machine Learning. Once a model is trained, serving it naturally becomes an important task in building Machine Learning pipelines. In a model serving scenario, the two major benchmarks to look at are latency and throughput.

To address the demand for extremely low-latency model serving in Machine Learning pipelines, we developed Cluster Serving: an automatic, distributed serving solution in Intel’s Analytics Zoo. Cluster Serving takes advantage of Flink’s low-latency, continuous streaming runtime. Similarly, to address the demand for high throughput, Cluster Serving implements batch processing on Flink’s unified data processing engine. In addition, Cluster Serving supports models from a wide range of deep learning frameworks, such as TensorFlow, PyTorch, Caffe, BigDL, and OpenVINO. Our model serving solution exposes a simple publish-subscribe (pub/sub) API that allows users to easily send their inference requests to the input queue using a simple Python or HTTP API.
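To make the pub/sub flow concrete, below is a minimal sketch of what submitting a request and polling for its result can look like from the client side. This is an illustration only: it talks to the queue through the generic redis-py client, and the stream and key names (serving_stream, result:img-001, prediction) are hypothetical placeholders, not the exact identifiers of the Analytics Zoo Python API.

```python
import base64
import time

import redis  # generic redis-py client; assumes a Redis server is running

# NOTE: the stream and key names below are hypothetical placeholders used
# for illustration; they are not the exact names Cluster Serving uses.
r = redis.Redis(host="localhost", port=6379)

# Publish: serialize the input image and push it onto the input queue
# (a Redis stream) that the serving pipeline consumes.
with open("cat.jpg", "rb") as f:
    payload = base64.b64encode(f.read()).decode("utf-8")
r.xadd("serving_stream", {"uri": "img-001", "data": payload})

# Subscribe: the serving pipeline writes the prediction back to Redis;
# poll until the result for our request id appears.
result = None
while result is None:
    result = r.hget("result:img-001", "prediction")
    time.sleep(0.01)
print(result.decode("utf-8"))
```

The appeal of a pub/sub interface is decoupling: clients only need to know the queue, while the serving side can scale its consumers independently of how requests arrive.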

In our session we are going to introduce Cluster Serving and its architecture design to the Flink Forward audience, and discuss the underlying design patterns and tradeoffs of deploying and managing deep learning models on distributed, unified Big Data processing engines in production. We will additionally showcase some real-world use cases, experiences, and examples from users who have adopted Cluster Serving to develop and deploy their distributed inference workflows.

What you will learn

In our session you will get a deep understanding of Cluster Serving’s implementation design around Apache Flink and its core features. Additionally, we are going to cover some integrations, such as our Redis data pipeline, as well as different API designs, before we open the floor for discussion and opinion sharing with the audience.

By attending our session, you will get a hands-on introduction to Cluster Serving in Intel’s Analytics Zoo and how it addresses distributed, real-time model serving at large scale. You will also see real-world use cases of how companies utilize our platform, and you will see first-hand how Apache Flink works under the hood to support distributed model inference and serving. Some key learnings from our session will be how to:

  1. Parallelize expensive operations

  2. Use data pipelines like message queues for parallel sources (points 1 and 2 are sketched in the example after this list)

  3. Minimize data transfers in a Machine Learning pipeline with Apache Flink
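
To give a taste of the first two learnings, here is a minimal, framework-agnostic sketch of the pattern: several parallel workers consume a shared queue (the "parallel source"), group records into micro-batches, and run the expensive inference step once per batch rather than once per record. It is a plain-Python toy under stated assumptions (fake_model stands in for a real forward pass, and the sentinel-based shutdown exists only for the example); it is not Cluster Serving’s actual Flink code.

```python
import queue
import threading

SENTINEL = None            # shutdown marker, one per worker
REQUESTS = queue.Queue()   # stands in for a message queue such as Redis
BATCH_SIZE = 4             # illustrative knob: larger batches raise
                           # throughput at the cost of per-record latency

def fake_model(batch):
    # Placeholder for an expensive inference call (e.g. a deep learning
    # forward pass); batching amortizes its fixed per-call overhead.
    return [x * 2 for x in batch]

def worker(worker_id):
    while True:
        item = REQUESTS.get()              # block for the first record
        if item is SENTINEL:
            return
        batch = [item]
        while len(batch) < BATCH_SIZE:     # greedily fill a micro-batch
            try:
                nxt = REQUESTS.get_nowait()
            except queue.Empty:
                break
            if nxt is SENTINEL:
                REQUESTS.put(SENTINEL)     # put it back; stop after batch
                break
            batch.append(nxt)
        print(f"worker {worker_id} scored batch: {fake_model(batch)}")

# Two parallel "operator instances" consuming the same source queue.
workers = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in workers:
    t.start()
for request in range(10):
    REQUESTS.put(request)
for _ in workers:
    REQUESTS.put(SENTINEL)                 # one shutdown marker per worker
for t in workers:
    t.join()
```

Roughly speaking, Cluster Serving fills these roles with Flink’s parallel operator instances reading from the Redis input queue, with Flink’s runtime providing the scheduling, backpressure, and fault tolerance that this toy version ignores.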

Registration for the Flink Forward Global Virtual Conference 2020 is free, and you can join from anywhere in the world! We look forward to virtually meeting the Apache Flink community at the end of October!