"Ververica Platform becomes an integrated platform for the development and operations of Flink SQL
Today, we are happy to announce the general availability of Flink SQL on Ververica Platform. With this release, Ververica Platform becomes an integrated, end-to-end solution for the development and operations of Flink SQL: from developing SQL scripts and managing user-defined functions (UDFs) all the way to autoscaling the resulting long-running queries.
The Rise of Flink SQL
In the last few years, SQL has been making a comeback as a way to lower the entry barrier to distributed data processing. First released in 2016, Flink SQL is a manifestation of this trend in the realm of stream processing. From the very beginning, the Apache Flink community has placed a strong emphasis on keeping its SQL syntax in line with the SQL standard. That is why anyone who has worked with relational database systems before can use Flink SQL to write a distributed stream processing application within minutes. Over time, the community has adopted and developed this area of Apache Flink at an increasing rate, and by now Flink SQL accounts for about fifty percent of the development in the Apache Flink project (excluding documentation, development infrastructure, and the like).
Apache Flink’s SQL API has managed to unify batch and stream processing. This means that you can run the same SQL query on a bounded table (a data set) as a batch job or on an unbounded table (a stream of data) as a long-running stream processing application, and, given the same input, the resulting table will be identical.
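As a quick illustration, consider the following sketch of a query that runs unchanged in both modes; the orders table and its columns are hypothetical names chosen for this example, not taken from any particular deployment:

-- The same aggregation works on a bounded orders table (batch) and
-- on an unbounded one (streaming); only the execution mode of the
-- surrounding job differs, not the SQL itself.
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id;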
Despite covering both playing fields — batch and stream processing — Flink SQL provides a particularly rich set of features for real-time use cases including:
- Temporal Table Joins, Interval Joins, Lookup Joins, OVER Windows
- A mature connector ecosystem for reading and writing data from the most popular external systems
- Rich support for user-defined functions (UDFs) in Java, Scala & Python
- Processing of Change Data Capture (CDC) records, complemented by out-of-the-box support for the Debezium & Canal formats (illustrated in the sketch below)
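To make the CDC support concrete, here is a sketch of a table backed by a Kafka topic carrying Debezium-encoded change records; the topic name, schema, and broker address are assumptions made for this example:

-- Hypothetical example: consuming Debezium change records from Kafka.
-- Topic, schema, and broker address are illustrative assumptions.
CREATE TABLE products (
  id BIGINT,
  name STRING,
  price DECIMAL(10, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'db.inventory.products',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'debezium-json'
);

Flink then interprets each INSERT, UPDATE, and DELETE event on the topic as a change to the products table.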
With its highly efficient and scalable runtime, Flink SQL is powering business-critical applications and business-wide platforms at companies like Alibaba, Yelp, Uber, and Huawei. Typical use cases range from real-time anomaly detection and analytics to materialized view maintenance and real-time data ingestion (ETL). In essence, Flink SQL allows a wider group of developers to write more streaming applications in less time, with fewer bugs and better performance.
Flink SQL on Ververica Platform
In recent months, the team at Ververica has worked hard on creating an environment that makes Flink SQL shine even brighter. With the introduction of Flink SQL on Ververica Platform, we have made this versatile framework more accessible, its features more visible, and its operations more manageable.
Ververica Platform SQL Clients
Ververica Platform 2.3 ships with a web-based editor purpose-built for Flink SQL. The latest release comes with a catalog explorer, auto-completion, continuous validation, schema auto-generation, and query previews, getting you up and running with Flink SQL right in your browser.
The SQL Editor is complemented by a Jupyter extension that allows you to develop and deploy Flink SQL queries directly from within your Jupyter IPython notebooks.
The Jupyter extension is published under the Apache License 2.0 and released separately from the core of Ververica Platform. It can be installed from PyPI into any Jupyter installation and connects to a Ververica Platform instance over HTTP. Please see the release notes for details.
As always, all features and functionality are made available via our public REST API making it simple to integrate Flink SQL into your own clients and services.
Function, Connector & Catalog Management
In Flink SQL, all metadata, like table schemas and properties, is stored in a catalog. With Ververica Platform, you can now manage this metadata, as well as the corresponding dependencies, centrally instead of locally on each client machine.
Out of the box, Ververica Platform comes with a catalog implementation that is persistent and generic, i.e. it can handle any Flink SQL database, table, or function definition. While vanilla Apache Flink only provides an in-memory catalog, Ververica Platform supports persisting tables and functions across sessions. Hence, tables and functions can be defined once, centrally, and reused across queries by different developers.
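For example, one developer might register a table once in the persistent catalog, and another developer might query it later from a fresh session. The table name and connector properties below are purely illustrative:

-- Defined once, e.g. by one developer in one session
-- (names and connector properties are illustrative):
CREATE TABLE pageviews (
  user_id BIGINT,
  url STRING,
  view_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'pageviews',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json'
);

-- Reused later, e.g. by a different developer in a new session,
-- without re-declaring the table:
SELECT url, COUNT(*) AS views FROM pageviews GROUP BY url;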
In addition, Ververica Platform supports external catalogs which provide access to data stored in external storage systems without the need to explicitly create tables. Not only does this reduce manual schema definitions, but it also keeps the table schema in Flink’s catalog in sync with the external system. For example, users can connect to Apache Hive’s metastore to continuously write data into an Apache Hive data lake with dynamic partition management.
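In plain Flink SQL terms, connecting to such an external catalog looks roughly like the following sketch; the catalog name and configuration directory are assumptions for this example:

-- A sketch of registering and using a Hive catalog; the catalog name
-- and hive-conf-dir value are assumptions.
CREATE CATALOG my_hive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/etc/hive/conf'
);

USE CATALOG my_hive;

-- Tables from the Hive metastore are now directly queryable,
-- without any explicit CREATE TABLE statements in Flink.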
Ververica Platform automatically scans uploaded JAR files for user-defined functions (UDFs) and updates the function metadata in its catalog accordingly. Ververica Platform only bundles the minimum required set of dependencies with each running SQL deployment — only the connectors and UDFs required by the query — thereby reducing the chance of class loading conflicts and increasing operational stability.
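For reference, this is what manually registering and using such a function looks like in plain Flink SQL; on Ververica Platform, the registration metadata is derived from the uploaded JAR instead. The function and class names here are hypothetical:

-- Hypothetical function and class names; the implementation is
-- assumed to live in an uploaded JAR.
CREATE FUNCTION parse_user_agent AS 'com.example.udf.ParseUserAgent';

SELECT parse_user_agent(agent_string) AS browser
FROM page_views;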
An End-to-End Platform for Flink SQL
So far, we have focused on developing SQL scripts and managing catalogs, including everything that comes with them. But once started, most of these queries become long-running, stateful stream processing applications. That’s why it’s crucial to also cover the operational part of the story.
Fortunately, managing stateful stream processing applications has been a core functionality of Ververica Platform since its inception, and we have made sure that Flink SQL ties into the existing features around deployment, application lifecycle and monitoring.
- Your SQL deployments will use the same highly available, fault-tolerant runtime as any other application.
- Ververica Platform will detect changes in your SQL deployment specification and automatically adjust your running deployment as configured by your upgrade and restore strategies.
- You can suspend a query and later resume it from where it left off.
- And last but not least, Ververica Platform Autopilot (recently released in Ververica Platform 2.2.0) can take care of cluster sizing and auto-scaling of your queries.
Community Edition and Stream Edition
Everything described so far is available in both Community Edition and Stream Edition, the enterprise edition of the platform. As a reminder: Community Edition is free of charge and free for commercial use, with no limit on the number or size of your Apache Flink applications.
Still, there is one feature that we have reserved for our Stream Edition customers: statement sets.
CREATE TEMPORARY VIEW results AS
SELECT ...
FROM ...;
BEGIN STATEMENT SET;
INSERT INTO kafka SELECT * FROM results;
INSERT INTO s3 SELECT * FROM results;
END;
Statement sets allow you to group multiple INSERT INTO statements. These statements are then holistically optimized and executed as a single Apache Flink application, which often significantly improves efficiency: in the example above, the results view only needs to be computed once, with its output fanned out to both sinks, instead of being recomputed for each query.
For a detailed description of the differences between Community Edition and Stream Edition please check the editions table on our website.
Ververica Platform 2.3 beyond Flink SQL
Besides Flink SQL, this release, of course, comes with a couple of additional improvements, features, and fixes throughout the different areas of the platform, all of which are listed in the release notes. In this blog post, I would like to highlight only one of them: the launch of maven.ververica.com.
As part of Ververica Platform, we provide our customers with our own distribution of Apache Flink. This allows us to deliver critical bug and vulnerability fixes within days and to support our customers over a longer period of time than is possible in the open source community.
So far, we have distributed our fork primarily via Docker images. These Docker images only contain the binary distribution of Apache Flink and, hence, do not cover all of the components that we support commercially: connectors, for example, are not part of the binary distribution.
In order to support our customers more efficiently, we will from now on publish all dependencies for each release of our distribution of Apache Flink in a public Maven repository hosted at maven.ververica.com. Please check our documentation for more information on how to integrate this repository into your build system.
What’s Next?
Of course, this release is only the beginning of our efforts around Flink SQL on Ververica Platform. We have a packed backlog of features that we are excited to ship, including support for more external catalogs (JDBC, Confluent Schema Registry), various improvements to our clients, support for (non-temporary) views, full-blown session cluster support, and much more. Stay tuned!
Interested?
If you would like to try out the platform yourself, our getting started guide and the corresponding playground repository are the best places to start. Make sure to follow the Ververica blog for more use cases and, soon, a detailed walkthrough of how to use Flink SQL in Ververica Platform. We are very much looking forward to your feedback and suggestions.