Session preview: Building Stateful Streaming Pipelines at GoDaddy with Flink


Are you thinking of joining the Virtual Flink Forward on April 22-24? It’s the first time the GoDaddy team will present at a Flink Forward event, and we are beyond excited to share our experience with Apache Flink: how we used the open source framework to move our pipelines from batch processing to real-time stream processing and make data available to our downstream consumers. Read on for a sneak preview of my session, Building Stateful Streaming Pipelines, which you can join remotely on April 23!

If you haven’t done so already, go ahead and register for the event to learn about the new developments around Apache Flink. Here’s what my talk will be about in a few weeks:

Building Stateful Streaming Pipelines

Building a streaming platform from the ground up is a very interesting problem to solve at scale. At GoDaddy, we use Apache Beam as the programming model for writing both batch and streaming pipelines, and we run them on Flink on AWS. Our batch jobs run primarily on Spark in our own data centers, so to keep supporting them we deploy the same Beam code on Spark. Internally at GoDaddy, microservices can store data in SQL Server and make that data available to teams via database replication. Some microservices make data available via RESTful services as well. The data platform team at GoDaddy provides a unified view of our business to our teams regardless of where and how data is stored or returned. To do that, we run streaming pipelines in Flink that combine data from multiple ingress sources, including SQL Server CDC (Change Data Capture) logs.
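To make the idea of a stateful pipeline over CDC logs a little more concrete, here is a minimal, framework-free Python sketch. It is an illustration only, not GoDaddy's implementation, and it deliberately uses no Beam or Flink APIs. The `ChangeEvent` schema, its field names, and the `apply_changes` helper are all hypothetical. It shows the core pattern: folding row-level change events into per-key state so that the latest version of each row is always available.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, loosely modelled on a CDC record (hypothetical schema)."""
    op: str               # "insert", "update", or "delete"
    key: str              # primary key of the changed row
    row: Optional[dict]   # new row image; None for deletes
    seq: int              # increasing log position, used to order events


def apply_changes(events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Fold change events into per-key state, keeping the latest row per key."""
    state: Dict[str, dict] = {}
    last_seq: Dict[str, int] = {}
    for ev in events:
        # Skip stale or duplicate events that arrive out of order.
        if ev.seq <= last_seq.get(ev.key, -1):
            continue
        last_seq[ev.key] = ev.seq
        if ev.op == "delete":
            state.pop(ev.key, None)
        elif ev.op in ("insert", "update"):
            state[ev.key] = dict(ev.row or {})
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return state


events = [
    ChangeEvent("insert", "o1", {"status": "new"}, 1),
    ChangeEvent("update", "o1", {"status": "paid"}, 3),
    ChangeEvent("update", "o1", {"status": "pending"}, 2),  # late, ignored
    ChangeEvent("insert", "o2", {"status": "new"}, 4),
    ChangeEvent("delete", "o2", None, 5),
]
current_view = apply_changes(events)
```

In a streaming engine, the two dictionaries would typically live in keyed state managed by the runner rather than in process memory, and the events would arrive continuously instead of as a finite list.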

Topics covered

Some of the core topics my session will cover are the following:

  • Building a data platform from scratch: Where should you start? What do you need to take into consideration before you start developing your platform, and how does that impact the development process later on?

  • Learnings from our journey to deploying our e-commerce production pipelines on Flink: What do you need to be aware of? How do you successfully deploy your pipelines in production? What challenges did we face, and how did we overcome them?

  • Future-proofing your pipeline architecture so that you can run anywhere (in the cloud or on-premises): How do you make sure that your architecture can scale at different levels and in different environments? How do you ensure a cloud-native infrastructure that can also be deployed on-premises?

Key takeaways

  • Things to keep in mind when running pipelines at scale with Apache Flink

  • A review of a combined streaming and batch architecture, for anyone starting the journey of consuming data from multiple sources

  • Common errors that appear when multiple massive pipelines are deployed on Flink, along with the settings that can be tuned and the parameters to consider

Registration for the Virtual Flink Forward is free, and you can join from anywhere in the world through the Flink Forward website. I look forward to virtually meeting the Apache Flink community and learning about all the exciting developments around the technology!

About the author:

Ankit Jhalaria is a Principal Software Engineer at GoDaddy, where he is responsible for building and maintaining the company’s streaming data platform using Apache Beam and Apache Flink to make data available to downstream customers. He previously worked at AtScale, building a BI platform, and at Yahoo, where he worked on large-scale applications with MapReduce and Hadoop. He holds a Master’s in Computer Science from USC. He is an Apache Beam contributor, and he enjoys spending time with his family.
