Session preview: Building Stateful Streaming Pipelines at GoDaddy with Flink

April 02, 2020 | by Ankit Jhalaria

Are you thinking of joining the Virtual Flink Forward on April 22-24? It’s the first time the GoDaddy team will present at a Flink Forward event, and we are excited to share our experience with Apache Flink: how we used the open source framework to move our pipelines from batch processing to real-time stream processing and make data available to our downstream consumers. Read on for a sneak preview of my session, Building Stateful Streaming Pipelines, which you can join remotely on April 23!

If you haven’t done so already, go ahead and register for the event to learn about the new developments around Apache Flink. Here’s what my talk will be about in a few weeks:



Building Stateful Streaming Pipelines 

Building a streaming platform from the ground up is an interesting problem to solve at scale. At GoDaddy, we adopted Apache Beam as the programming model for writing both batch and streaming pipelines, which we run on Flink on AWS. To also support batch jobs, which primarily run on Spark in our own data centers, we deploy the same Beam code on Spark. Internally at GoDaddy, microservices can store data in SQL Server and make that data available to teams via database replication; some microservices also make data available via RESTful services. The data platform team provides a unified view of our business to our teams, regardless of where and how data is stored or returned. To do that, we run streaming pipelines in Flink that combine data from multiple ingress sources, including SQL Server CDC (Change Data Capture) logs.
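To give a feel for what "stateful" means when consuming CDC logs, here is a minimal, framework-free sketch in Python. It is not our production code and does not use the Beam or Flink APIs; the `ChangeEvent` record, its fields, and the `apply_changes` helper are all hypothetical names invented for illustration. The idea it shows is simply that each change event is folded, in order, into per-key state so that the state always holds the latest version of every row.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical CDC record: an operation on a row identified by key."""
    op: str                      # "insert", "update", or "delete"
    key: str                     # primary key of the changed row
    row: Optional[dict] = None   # new row contents; None for deletes


def apply_changes(state: Dict[str, dict], events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Fold change events, in arrival order, into keyed state (latest row per key)."""
    for event in events:
        if event.op in ("insert", "update"):
            # Keep only the most recent version of the row for this key.
            state[event.key] = dict(event.row or {})
        elif event.op == "delete":
            # A delete removes the key; ignore deletes for unknown keys.
            state.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return state


# Illustrative events only; real CDC logs carry much more metadata.
events = [
    ChangeEvent("insert", "order-1", {"status": "created"}),
    ChangeEvent("insert", "order-2", {"status": "created"}),
    ChangeEvent("update", "order-1", {"status": "paid"}),
    ChangeEvent("delete", "order-2"),
]
current = apply_changes({}, events)
print(current)
```

In a real streaming engine this state would be partitioned by key, checkpointed, and updated continuously rather than held in a single in-memory dictionary.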

 

Topics covered

Some of the core topics my session will cover are the following: 

  • Building a data platform from scratch: Where should you start? What do you need to take into consideration before you start developing your platform, and how does that impact the development process later on? 

  • Learnings from our journey to deploying our e-commerce production pipelines on Flink: What do you need to be aware of? How do you successfully deploy your pipelines in production? What challenges did we face, and how did we overcome them? 

  • Future-proofing your pipeline architecture so that you can run anywhere (in the cloud or on-premises): How do you make sure that your architecture can scale at different levels and in different environments? How do you ensure a cloud-native infrastructure that can also be deployed on-premises? 

 

Key takeaways

  • Things to keep in mind when running pipelines at scale with Apache Flink

  • Streaming + Batch architecture review for someone starting the journey of consuming data from multiple sources

  • Common errors when multiple massive pipelines are deployed on Flink, along with settings that can be tuned and parameters to consider

 

Registration for the Virtual Flink Forward is free, and you can join from anywhere in the world through the Flink Forward website. I look forward to virtually meeting the Apache Flink community and learning about all the exciting developments around the technology!

 

About the author: 

Ankit Jhalaria is a Principal Software Engineer at GoDaddy, where he is responsible for building and maintaining the company’s streaming data platform using Apache Beam and Apache Flink to make data available to downstream customers. He previously worked at AtScale, building a BI platform, and at Yahoo, where he worked on large-scale applications with MapReduce and Hadoop. He holds a Master’s in Computer Science from USC. He is an Apache Beam contributor, and he enjoys spending time with his family.

