A deep dive on Change Data Capture with Flink SQL during Flink Forward

10 September 2020 by Jark Wu & Qingsheng Ren

Can you believe that Flink Forward Global Virtual Conference 2020 is only a few weeks away?

We are very excited to be presenting some of the latest, and most significant, Flink developments to the wider community, including the internals of performing Change Data Capture (CDC) and real-time data processing with Flink SQL. We are looking forward to welcoming you to our session, taking place on October 22. Make sure to register and secure your spot today!

This second virtual Flink Forward is packed with exciting deep-dive talks, technical sessions and Apache Flink use cases showcasing not only how far Flink has come as a unified data processing framework, but also where the technology is heading in the coming releases. We invite you to (virtually) join the Flink community and immerse yourself in the exciting world of stream processing, real-time data analytics and event-driven applications.

If you haven’t done so already, go ahead and check the full conference program to discover some exciting sessions from companies like Intel, Spotify, Alibaba, Uber and more.

Change Data Capture and Processing with Flink SQL

Change Data Capture (CDC) has become the standard method for capturing and propagating committed changes from a database to downstream consumers, for example to keep multiple datastores in sync while avoiding common pitfalls such as dual writes. CDC is one of the foundational building blocks of a data warehouse, because much business-related data is stored in databases. Consuming such changelogs with Apache Flink used to be rather adventurous, but with the introduction of CDC support in the latest Flink 1.11 release, developers can now implement Change Data Capture from the comfort of their SQL couch 😀.

What we are going to cover

In our session "Change Data Capture (CDC) and real time data processing with Flink SQL", we will introduce the new table source interface (FLIP-95) and discuss how it works and how it makes CDC possible. We will illustrate the advantages of using Flink SQL for CDC and the use cases that are now unlocked, such as data transfer, keeping caches and full-text indexes automatically in sync, and materializing real-time aggregate views on databases. We will show how to use Flink SQL to easily process database changelog data generated with Debezium. Furthermore, we will introduce a more lightweight architecture that captures changelogs with flink-cdc-connectors and removes the dependency on separate Debezium and Kafka services.
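To make the idea of "processing a changelog" more concrete, here is a minimal Python sketch (not Flink code, and not the session's material) of how a consumer could interpret Debezium-style change events. It relies only on the `before`, `after` and `op` fields of the Debezium event envelope; the `apply_change` helper, the `products` table and its columns are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change(table: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory table keyed by `key`."""
    op = event["op"]
    before: Optional[Row] = event.get("before")
    after: Optional[Row] = event.get("after")

    if op in ("c", "r"):
        # "c" = create, "r" = snapshot read: both add a row.
        table[after[key]] = after
    elif op == "u":
        # An update carries the old and the new row image. Conceptually this is
        # a retraction of the old row followed by an insertion of the new one.
        if before is not None:
            table.pop(before[key], None)
        table[after[key]] = after
    elif op == "d":
        # A delete only carries the old row image.
        table.pop(before[key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


# Hypothetical change events for a `products` table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "scooter", "price": 10.0}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "hammer", "price": 5.0}},
    {"op": "u", "before": {"id": 1, "name": "scooter", "price": 10.0},
     "after": {"id": 1, "name": "scooter", "price": 12.5}},
    {"op": "d", "before": {"id": 2, "name": "hammer", "price": 5.0}, "after": None},
]

products: Dict[Any, Row] = {}
for event in events:
    apply_change(products, event)

print(products)
```

After replaying all four events, the in-memory table holds only the updated row for product 1, which is the same state the source database would be in.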

In a live demo, we will show how to use Flink SQL to capture change data from upstream MySQL and PostgreSQL databases, join the change data together and stream the result out to Elasticsearch for indexing. The entire demo will be based purely on SQL, without a single line of Java/Scala code.
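The demo itself is pure SQL, but the underlying idea of joining two changelogs and keeping a search index in sync can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `orders` and `shipments` tables, their columns, and the helpers `upsert_or_delete` and `sync_document` are hypothetical, and a plain dictionary stands in for the Elasticsearch index.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def upsert_or_delete(table: Dict[int, Row], key: int, row: Optional[Row]) -> None:
    """Apply a change to a keyed table: a row upserts, None deletes."""
    if row is None:
        table.pop(key, None)
    else:
        table[key] = row


def sync_document(index: Dict[int, Row], orders: Dict[int, Row],
                  shipments: Dict[int, Row], order_id: int) -> None:
    """Recompute the joined document for one order id and upsert it into the index."""
    order = orders.get(order_id)
    if order is None:
        # The order no longer exists, so its document is removed.
        index.pop(order_id, None)
        return
    shipment = shipments.get(order_id)  # left join: the shipment may be missing
    index[order_id] = {**order, "shipped": shipment["shipped"] if shipment else None}


orders: Dict[int, Row] = {}      # stands in for the table captured from MySQL
shipments: Dict[int, Row] = {}   # stands in for the table captured from PostgreSQL
index: Dict[int, Row] = {}       # stands in for the Elasticsearch index

# Hypothetical stream of changes: (source table, key, new row or None for delete).
changes = [
    ("orders", 1, {"order_id": 1, "item": "scooter"}),
    ("shipments", 1, {"order_id": 1, "shipped": False}),
    ("shipments", 1, {"order_id": 1, "shipped": True}),
    ("orders", 2, {"order_id": 2, "item": "hammer"}),
    ("orders", 2, None),
]

for source, key, row in changes:
    upsert_or_delete(orders if source == "orders" else shipments, key, row)
    sync_document(index, orders, shipments, key)

print(index)
```

Every change on either side only touches the document for the affected key, which is why the index stays consistent with both databases without periodic full reloads.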

Lastly, we will close the session with an outlook on upcoming features around Flink SQL and Change Data Capture (CDC), as well as additional ecosystem connectors.

What you will learn

Through our session, you will get a clear understanding of the latest developments around Change Data Capture (CDC) with Apache Flink, and specifically Flink SQL. With our demo, participants will experience first-hand how easy it is to capture data changes from a database with Flink SQL.

Finally, you’ll learn some best practices around CDC and how to use Flink SQL as a powerful method to Extract, Transform and Load (ETL) data at the same time.

Make sure to secure your spot before October 1 by registering on the Flink Forward website. In addition to the conference, this Flink Forward event features six instructor-led training sessions covering both beginner and advanced Flink topics such as: