Session preview: Adobe's realtime identity graph with Flink


by Fakrudeen Ali Ahmed

Are you ready for the Virtual Flink Forward this week? As a first-time speaker, I am very excited to meet the Apache Flink community and share our experience using Flink at Adobe! This blog post gives a preview of my session “Real time identity graph”, scheduled for April 22, 2020.

If you haven’t done so already, go ahead and register for free to hear more about our use case with Apache Flink at Adobe and meet the Flink community. Here’s a preview of the session at Virtual Flink Forward this week:

Realtime identity graph

Adobe's identity graph links the multiple identities a user has across multiple devices into a single unified identity. As a use case, this graph can be used to transfer a "build your own vehicle" session on a car website from desktop to mobile, providing "personalization without a login".
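To make the idea of linking identities concrete, here is a minimal Python sketch that treats the graph as a union-find (disjoint-set) structure. This is an illustrative assumption, not a description of Adobe's implementation: the class IdentityGraph, its methods, and the identifier strings are all hypothetical, and this post does not say how links between devices are actually derived.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity graph: union-find over opaque identifier strings (hypothetical)."""

    def __init__(self):
        self._parent = {}

    def _find(self, identity):
        # Register unseen identifiers as their own root.
        self._parent.setdefault(identity, identity)
        root = identity
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[identity] != root:
            next_identity = self._parent[identity]
            self._parent[identity] = root
            identity = next_identity
        return root

    def link(self, a, b):
        """Record an observation that identifiers a and b belong to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a, b):
        """Return True if a and b resolve to the same unified identity."""
        return self._find(a) == self._find(b)

    def clusters(self):
        """Return the current unified identities as a list of sets."""
        groups = defaultdict(set)
        for identity in list(self._parent):
            groups[self._find(identity)].add(identity)
        return list(groups.values())


# Hypothetical identifiers: two devices that were each linked to one shared signal.
graph = IdentityGraph()
graph.link("desktop-cookie-123", "shared-signal-abc")
graph.link("mobile-device-456", "shared-signal-abc")
print(graph.same_user("desktop-cookie-123", "mobile-device-456"))
```

In this sketch, the desktop session and the mobile session end up in the same cluster, which is the property a site would need in order to restore a saved configuration on the second device.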

Motivation

As an example, Alice has been planning to buy a car for some time, and one afternoon during her lunch break she goes to Ford’s website and spends 20-30 minutes customizing her car.

[Figure: personalization example]

Now she is all excited. She goes to dinner with her husband at a restaurant and wants to show him her new car and discuss the buying decision, so she goes to the Ford site again, this time on her phone. If it loads up as shown in red below [see figure], she is going to have to spend another 20 minutes getting back to her favorite configuration, and her husband may quickly lose interest. If instead it comes back by default in her favorite configuration, as shown in green below [see figure], then:

  1. It is a much more pleasant experience for users like Alice and her husband.

  2. She may decide to buy a Ford car over a competitor's because of this experience.

Adobe’s identity graph enables this for users even if they don’t log in, while maintaining their complete privacy and safety online.

[Figure: personalization with Adobe's identity graph]

This talk describes the design of Adobe’s realtime identity graph on Flink, which builds and updates the graph in near real time while processing 25B+ events a day across 2B+ devices and serving 500M+ users. It also covers how we arrived at Flink itself after evaluating multiple systems for our use case.
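For a rough sense of what these numbers mean per second, the short Python calculation below converts the stated daily volume into an average rate. It uses the quoted figures as lower bounds and assumes traffic spread evenly over the day, which real traffic is not, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the post, treated as lower bounds.
events_per_day = 25_000_000_000  # "25B+ events a day"
devices = 2_000_000_000          # "2B+ devices"

# Average rate, assuming an even spread over the day (a simplification).
avg_events_per_second = events_per_day / SECONDS_PER_DAY
avg_events_per_device_per_day = events_per_day / devices

print(f"average events per second: {avg_events_per_second:,.0f}")
print(f"average events per device per day: {avg_events_per_device_per_day:.1f}")
```

That works out to roughly 289,000 events per second on average.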

Solution

Our solution at a high level looks as follows:

[Figure: high-level view of the solution]

Our scale is 25B+ events per day; what is shown here is one hour of data taken from our precision/recall studies with the old system we replaced.

[Figure: Flink application]

Evaluation of Frameworks

We objectively evaluated multiple streaming frameworks for our system, and the summary of our results is:

[Figure: summary of the evaluation of stream processing frameworks]
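This post does not spell out the evaluation methodology itself. As one common way to make such a comparison objective, the Python sketch below ranks candidates with a weighted scoring matrix. Everything in it is a placeholder for illustration: the function rank_frameworks, the criteria, the weights, the scores, and the framework names are hypothetical and are not the results of this evaluation.

```python
def rank_frameworks(scores, weights):
    """Return (name, weighted_score) pairs, best first.

    scores:  {framework: {criterion: score}}
    weights: {criterion: weight}
    """
    total_weight = sum(weights.values())
    ranked = []
    for name, per_criterion in scores.items():
        # Weighted average over all criteria, normalised by the total weight.
        weighted = sum(weights[c] * per_criterion[c] for c in weights) / total_weight
        ranked.append((name, weighted))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical criteria, weights and 1-5 scores, for illustration only.
weights = {"throughput": 3, "state_management": 3, "operability": 2, "community": 1}
scores = {
    "framework_a": {"throughput": 4, "state_management": 5, "operability": 3, "community": 4},
    "framework_b": {"throughput": 5, "state_management": 2, "operability": 4, "community": 3},
}

for name, score in rank_frameworks(scores, weights):
    print(f"{name}: {score:.2f}")
```

Writing the weights down before scoring any candidate is what keeps a comparison like this from drifting back into a matter of taste.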

Tune in to our talk to hear all the exciting details on this “beauty contest” for streaming frameworks!

In particular, our talk will cover:

  1. Adobe Experience Cloud, the non-Oscar-winning, non-Photoshop part of Adobe, which is a billion-dollar division in its own right.

  2. Identity graph and its motivation.

  3. Objectively evaluating a software framework, in particular a streaming framework.

  4. Architecture of the realtime identity graph on top of Flink, a complex system with 500M+ users and 25B+ events/day.

  5. Lessons learned in building this system in the context of Flink.

Summary

We turned the fuzzy decision of comparing multiple streaming frameworks into an objective decision-making methodology. For a sneak peek of our evaluation details, take a look at the AdobeTech Blog. We used that decision to successfully build a large-scale event streaming system based on Flink that handles 25B+ events per day. Tune in to our talk to hear more about this evaluation and our Flink-based system, the realtime identity graph.


About the author: 

Fakrudeen Ali Ahmed is an Architect in Digital Experience Cloud at Adobe with a focus on and expertise in big data and ML technologies. Formerly, he was a Senior Manager at Yahoo, managing the content ranking and personalization systems for the Yahoo front page, Yahoo Sports and Yahoo Finance.

Fakrudeen is a member of the Association for Computing Machinery (ACM) and the American Association for the Advancement of Science (AAAS).


