Ecosystem Track  

Deploying ONNX models on Flink

The Open Neural Network Exchange (ONNX) format is a popular target for exporting models from a variety of frameworks. It supports widely used frameworks such as PyTorch and MXNet as well as lesser-known ones such as Chainer and PaddlePaddle. To date there have been few attempts to integrate deep learning models into the Flink ecosystem, and those that exist have focused entirely on TensorFlow models. However, the number of deep learning models written in PyTorch continues to grow, and many companies prefer to use other frameworks. This talk will focus on different strategies for using ONNX models in Flink applications for real-time inference. Specifically, it will compare using an external microservice with AsyncIO, Java Embedded Python, and Lantern (a new backend for deep learning in Scala). The talk will weigh these approaches, examining which setups run faster in practice and which are easier to set up. It will also feature a demonstration in which we take a recent PyTorch natural language processing model, convert it to ONNX, and integrate it into a Flink application. Finally, it will look at a set of open-source tools aimed at making it easy to take models to production and monitor their performance.
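
As a rough illustration of the first strategy (an external microservice called asynchronously), the pattern can be sketched in plain Python. This is not Flink code and not the speaker's implementation: `call_model_service` is a hypothetical stand-in for a network call to a service hosting an ONNX model, and the `capacity` and `timeout` parameters are assumptions that only loosely mirror the in-flight request limit and timeout found on Flink's Async I/O operator.

```python
import asyncio


async def call_model_service(record: str) -> dict:
    """Hypothetical stand-in for an HTTP/gRPC call to a model-serving microservice."""
    await asyncio.sleep(0.01)  # simulated network and inference latency
    # Placeholder "prediction": a real service would run the ONNX model here.
    return {"input": record, "score": len(record) % 2}


async def enrich_stream(records, capacity=4, timeout=1.0):
    """Score each record with at most `capacity` requests in flight at once."""
    semaphore = asyncio.Semaphore(capacity)

    async def infer(record):
        async with semaphore:
            try:
                return await asyncio.wait_for(call_model_service(record), timeout)
            except asyncio.TimeoutError:
                # Fall back to an empty score rather than stalling the stream.
                return {"input": record, "score": None}

    # gather returns results in input order even if calls finish out of order.
    return await asyncio.gather(*(infer(r) for r in records))


def run_demo():
    return asyncio.run(enrich_stream(["flink", "onnx", "pytorch"]))
```

The point of the sketch is that the stream is never blocked on a single slow inference call: requests overlap up to a fixed limit, and a timeout bounds the worst-case latency per record.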

Authors

Isaac Mckillen-Godfried
AI Stream

Isaac Mckillen-Godfried

I'm a data scientist at AIStream and frequently work with a variety of big data technologies, including Flink. Broadly speaking, my primary focus is making it easier to use deep learning models in real-world applications. Part of this involves integrating current deep learning frameworks (in Python) with big data frameworks (in Java) to make it easier to serve models at scale with low latency.