Flink Forward San Francisco Session Preview: Scaling a real-time streaming warehouse with Apache Flink, Parquet and Kubernetes


Authors: Ramesh Shanmugam & Aditi Verma

Flink Forward San Francisco is a couple of days away! In case you haven’t booked your tickets yet, here’s a sneak preview of our session, Scaling a real-time streaming warehouse with Apache Flink, Parquet and Kubernetes, on April 2, 2019, to give you more insight into what you can expect at the conference next week.

If you haven’t registered already, make sure to book your last-minute tickets while they last! Spots are limited, so hurry to secure your place at Flink Forward and learn more about the exciting world of Apache Flink!

Scaling a real-time streaming warehouse
with Apache Flink, Parquet and Kubernetes

Aditi, Ramesh

Background

Branch is the industry-leading mobile measurement and deep linking platform. To support this, we process more than 20 billion events and store several terabytes of data per day.

In this talk, we cover our learnings and challenges from running and scaling an Apache Flink Parquet warehouse on Kubernetes, particularly around memory management and failure recovery. We also talk in detail about our current Apache Flink infrastructure and its recovery and auto-scaling mechanisms.

Topics covered

This talk gives a detailed overview of our challenges around writing columnar file formats with Flink. We also discuss the decisions we took and what we learned while migrating Flink jobs from Mesos to Kubernetes. Finally, we talk about auto-scaling Flink jobs on Kubernetes and handling failure scenarios efficiently.

Key takeaways

  • Learnings from running Apache Flink clusters on Mesos and Kubernetes

  • Takeaways from writing Parquet files with Apache Flink

Make sure to secure your spot by registering on the Flink Forward website today. The event includes multiple tracks and is a unique opportunity to take your knowledge and stream processing expertise to the next level! Sessions cover, among other topics, Flink use cases, technology deep dives, and Apache Flink and stream processing ecosystem talks, so don’t miss out on the exciting conference schedule!


About the authors:

Aditi Verma

Aditi is a senior software engineer at Branch, working on developing and scaling their data platform, which processes tens of billions of events per day. Prior to Branch, she worked at Yahoo to develop data systems that provide actionable insights and audience targeting from petabytes of data. She has a wide range of experience in the data domain, from stream and batch processing to resource management, scaling and monitoring.

Ramesh Shanmugam

Ramesh Shanmugam is a Senior Data Platform Engineer at Branch Metrics. At Branch, he is currently building streaming and batch pipelines at a huge scale using Apache Flink, Spark, and Airflow. He has been creating distributed applications for more than 15 years and is passionate about building data-intensive applications.

