Stream Processing & Apache Flink - News and Best Practices

Singles Day 2018: data in a Flink of an eye

Written by Marta Paes | 22 November 2018

Sunday 11 November 2018 marked Singles Day, Alibaba’s 10th annual shopping festival. The Chinese e-commerce platform set a new record with a Gross Merchandise Value (GMV) of $30.8 billion. This represents 27 percent year-over-year growth in local-currency terms over 2017’s total of $25.3 billion. Singles Day sales hit the $1 billion mark just 1 minute and 25 seconds after the start of the shopping event, and delivery orders surpassed 1 billion for the day.

Millions of shoppers access the platform, and Alibaba must also support several peripheral scenarios at once, from the mega-media displays to real-time business intelligence for merchants. The result is a flood of record-breaking data traffic. So how does its underlying infrastructure stay afloat at this scale while delivering second-level latency and highly accurate computation? Since 2017, Alibaba has used Apache Flink as the core of its data pipeline architecture, increasing its stream processing capacity fivefold on average (1). Apache Flink powers several applications for the company’s Singles Day shopping event, including the real-time dashboards that supply aggregated results to the media displays and the search and recommendation engine, which adapts to each shopper’s activity. With Alibaba’s stream processing platform, data is processed within milliseconds of being generated. This creates a better user experience and maximizes the business insight that can be derived and broadcast across multiple channels.

Alibaba’s data pipeline architecture as of 2017 is shown in the diagram below. For more details on how Alibaba’s Department of Data Technology and Products uses stream processing to handle trillions of event logs per day, see this AliTech blog post.

Alibaba has developed Blink, its own internal flavor of Apache Flink, as the company’s core stream processing framework. Blink processes continuously produced incremental data to support real-time, data-driven applications and the data needs of multiple stakeholders. Xiaowei Jiang, who leads the StreamCompute Platform for AliCloud, gave a talk at Flink Forward Berlin 2018 on how Alibaba uses this approach to build a unified engine for streaming, batch and AI workloads. You can find out more here about how Alibaba has become one of the largest adopters of and contributors to Apache Flink, using the framework extensively for stream processing in large-scale production.

Alibaba’s success on Singles Day 2018 is one of many examples of enterprises that are moving to real time, investing in stream processing with Apache Flink as their data processing framework of choice, and taking their applications into the next era. Adoption of Apache Flink has grown enormously over the past few years, and with new developments on the platform expected soon, we are excited about what the future holds for the Flink community!

We encourage you to subscribe to the Apache Flink mailing list to get the latest updates about upcoming releases and to contribute your ideas to Flink. You can always contact us for more information or if you have any questions.

  1. Source: Fight Peak Data Traffic On 11.11: The Secrets of Alibaba Stream Computing