SQL now stands for Streaming Query Language

Senior Sales Engineer

July 1, 2026 · 13 min read

For four decades, SQL was the language for data at rest. Tables sat still. Queries swept across them. Results came back. Driver connection closed. Done. Great for analyzing what happened yesterday. Not so good for telling what is happening right now.

That era is over. Digital businesses need fresh data to trigger decisions and improve customer experience now. No more day-minus-one (D-1) refreshes; milliseconds matter in a fast-paced economy.

The most consequential shift in data engineering over the past few years is not a new framework, not a new format, not a new vendor. It is a redefinition of three letters everyone thought they understood. SQL is no longer just Structured Query Language. SQL is the Streaming Query Language: the friendliest way to express continuous computation over unbounded data.

DataStream APIs in Java remain the right tool for use cases that require direct management of streaming building blocks such as time, state, and dead-letter queues (DLQs). But they are no longer the only option. For other use cases, a language-based approach is faster to write, easier to use, and more elegant. SQL was designed with exactly that intent. The question is how much you can build on it. Ververica was built to answer this question.

Declarative Is The Unlock

SQL on streams enforces the same contract it has always used on tables: a declarative one. You declare what the result is. The engine executes the how: incrementally, with continuous correctness, as new events arrive. No delay, no buffer.

A column projection: SELECT customer_score FROM survey
A windowed aggregation: TUMBLE(TABLE stream, DESCRIPTOR(time), INTERVAL '10' SECOND)
A stream-to-table enrichment: JOIN ... FOR SYSTEM_TIME AS OF ...
Deduplication and top-N per key: ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)
Pattern matching: MATCH_RECOGNIZE for complex event processing (CEP)
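
As a sketch of the deduplication case, Flink SQL documents a ROW_NUMBER() idiom that keeps the first row seen per key. The example assumes an Orders table with an order_id column and an event_time time attribute, like the one defined in the patterns section of this article; using order_id as the deduplication key is an assumption made for illustration.

```sql
-- Keep only the first event per order_id, ordered by event time.
SELECT order_id, customer_id, total, event_time
FROM (
    SELECT
        order_id,
        customer_id,
        total,
        event_time,
        ROW_NUMBER() OVER (
            PARTITION BY order_id      -- assumed deduplication key
            ORDER BY event_time ASC    -- must be a time attribute
        ) AS row_num
    FROM Orders
)
WHERE row_num = 1;
```

Top-N per key follows the same shape: order by the ranking column and filter with row_num <= N instead of row_num = 1.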

The intent of the pipeline becomes its source code. Watermarks, state lifecycles, operator chaining, checkpoint coordination: the runtime owns all of it. Developers work in terms of data, semantics, and business vocabulary.

Agility, By Default

SQL pipelines are short, but no less powerful or scalable. A non-trivial enrichment-and-aggregation job can be written in a few lines. Short code is fast to write, review, understand, and change.

Developers use a declarative language to deliver on daily requests:

A new dimension to track means adding a column to the SELECT statement.
One-minute granularity instead of hourly means changing the window definition.
Late events arriving 15 minutes out instead of 5 means adjusting the watermark strategy with WATERMARK ... INTERVAL '15' MINUTE in the table DDL.
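
A minimal sketch of the last two changes, assuming an Orders table with a total column and an event_time watermark column (as in the connector DDL later in this article). ALTER TABLE ... MODIFY WATERMARK exists in recent Flink versions, but whether a given catalog accepts it is an assumption to verify in your environment; the same clause can otherwise be edited directly in the CREATE TABLE statement.

```sql
-- Granularity: switch the tumbling window from one hour to one minute.
SELECT window_start, window_end, SUM(total) AS revenue
FROM TABLE(
    -- previously: INTERVAL '1' HOUR
    TUMBLE(TABLE Orders, DESCRIPTOR(event_time), INTERVAL '1' MINUTE)
)
GROUP BY window_start, window_end;

-- Lateness: tolerate events up to 15 minutes out of order instead of 5.
ALTER TABLE Orders
    MODIFY WATERMARK FOR event_time AS event_time - INTERVAL '15' MINUTE;
```

In both cases the diff a reviewer sees is a single line.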

The change is straightforward and human-readable. The time from prototype to deployed pipeline collapses from sprints to hours. A declarative approach helps companies innovate quickly and stay flexible enough to capture business opportunities.

A new project follows a blueprint: connect to data sources, implement the business logic, and sink the output. This repeatable model helps companies standardize how they work.
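
The blueprint can be sketched end to end with Flink's built-in datagen and print connectors, which need no external systems. The table names, columns, and the threshold of 100 are hypothetical and only illustrate the three steps.

```sql
-- 1. Source: random test data from the built-in datagen connector.
CREATE TABLE orders_source (
    order_id STRING,
    customer_id BIGINT,
    total DOUBLE
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '5'
);

-- 2. Sink: the built-in print connector writes rows to the task logs.
CREATE TABLE large_orders_sink (
    order_id STRING,
    customer_id BIGINT,
    total DOUBLE
) WITH (
    'connector' = 'print'
);

-- 3. Business logic: a continuous query that feeds the sink.
INSERT INTO large_orders_sink
SELECT order_id, customer_id, total
FROM orders_source
WHERE total > 100;
```

Swapping datagen for Kafka or print for a database changes only the WITH clauses; the INSERT statement stays the same.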

This is what "low-code" means in a serious engineering context. Not a polished drag-and-drop UI. SQL provides a high-level abstraction that reduces complexity and leaves you in full control of the semantics, without sacrificing the capabilities of the core engine.

The Optimizer Carries The Weight

Flink SQL is not a thin parser on top of a weaker execution path. It runs through a real optimizer built on Apache Calcite, with rule-based and cost-based planning that applies pushdowns, pruning, and subquery rewriting, among other optimizations. The optimizer sees the full query and weighs CPU, memory, network, and I/O costs when it plans.
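
One way to inspect what the optimizer produced is the EXPLAIN statement, which prints the syntax tree and the optimized plans for a query without running it. The sketch below assumes the Orders table defined later in this article; the filter threshold is arbitrary.

```sql
-- Print the plan for a filtered aggregation instead of executing it.
EXPLAIN PLAN FOR
SELECT customer_id, SUM(total) AS revenue
FROM Orders
WHERE total > 100
GROUP BY customer_id;
```

Comparing the plan before and after a query change is a quick check on how a rewrite affects the operators that will actually run.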

That is why Flink SQL is not second-class next to Java-based DataStream jobs. SQL and Table API queries are optimized holistically, translated into a single program, and executed on the same Flink runtime. The difference is in the authoring model, not in execution power.

The conclusion is clean. Where it matters, Flink SQL sits on the same execution substrate as DataStream jobs: a single optimized program, regular DataStream execution after translation, and an optimizer that sees the whole relational plan rather than only the operators a developer wrote by hand. SQL removes the glue code while the engine keeps its full power.

Flink SQL compiles down to the same runtime that powers some of the most demanding streaming workloads in the world. Every property the platform is known for is intact:

Exactly-once semantics through distributed snapshotting and aligned checkpoints.
Stateful processing at scale with scalable state backends, incremental checkpoints, and configurable state retention.
Resilience through automatic recovery from the latest checkpoint after task or node failure.
Horizontal scalability and parallelism through the same distributed processing architecture.
Event-time processing in Flink SQL using DDL watermarks, late-event filtering, and source-idleness timeouts.
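
Several of these properties map to standard Flink configuration options. The fragment below is a sketch using SQL client SET statements; the values are illustrative assumptions, and on a managed platform the same keys are typically set in the deployment configuration instead.

```sql
-- Take a checkpoint every 30 seconds with exactly-once guarantees.
SET 'execution.checkpointing.interval' = '30 s';
SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';

-- Expire idle state (for example join or aggregation state) after one hour.
SET 'table.exec.state.ttl' = '1 h';

-- Mark a source partition as idle after 30 seconds without events,
-- so that it does not hold back the watermark.
SET 'table.exec.source.idle-timeout' = '30 s';
```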

Catalogs and SQL, The Winning Combination

Data democratization requires users and developers to find business information on their own. Catalogs are the starting point for building streaming applications: a navigation map of organization-wide data assets such as tables, views, and functions.

A topic registered as a table is discoverable, queryable, and governable. The same transaction table supports multiple use cases: real-time fraud scoring, nightly reconciliation, ad hoc analysis, and enriched push notifications. One schema means one source of truth.
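
A minimal sketch of that idea, using Flink's built-in in-memory catalog so it runs without external services. The catalog, database, table, and column names are hypothetical, and the datagen connector stands in for what would be a Kafka topic in practice.

```sql
-- Hypothetical catalog and database.
CREATE CATALOG company WITH ('type' = 'generic_in_memory');
CREATE DATABASE company.payments;

-- Register the stream once. A real topic would use 'connector' = 'kafka'.
CREATE TABLE company.payments.transactions (
    txn_id STRING,
    customer_id BIGINT,
    amount DOUBLE
) WITH (
    'connector' = 'datagen'
);

-- Any job can now discover the table and query it by its qualified name.
SHOW TABLES FROM company.payments;

SELECT customer_id, SUM(amount) AS total_amount
FROM company.payments.transactions
GROUP BY customer_id;
```

An in-memory catalog disappears with the session; a persistent catalog is what makes the registration reusable across jobs and teams.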

Catalogs for streaming enable governance across data assets:

Reusability becomes structural. Datasets defined once are consumed everywhere. The "rebuild the same join three times in three pipelines" anti-pattern dies.
Governance stops being a secondary priority. Access controls, masking policies, and retention rules attach to catalog objects, not to individual jobs. Business logic lives in the source code; governance lives in the Catalog.
Lineage becomes transversal. Every input and every output of every SQL job is a catalog-registered object. Lineage is a property of the system, not a documentation task afterward.

SQL on streams becomes architectural and foundational, while the catalog provides the entry point.

Practical Flink SQL Patterns

Flink SQL directly covers common streaming patterns: connector DDL, metadata columns, watermarks, lookup enrichment, window aggregation, continuous analytics, and model inference.

Connector configuration in SQL

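-- Orders source table backed by a Kafka topic.
CREATE TABLE Orders (
    order_id STRING,
    customer_id BIGINT,
    total DECIMAL(12, 2),
    -- Processing-time attribute, used by lookup joins.
    proc_time AS PROCTIME(),
    -- Event time taken from the Kafka record timestamp.
    event_time TIMESTAMP_LTZ(3) METADATA FROM 'timestamp',
    -- Tolerate events arriving up to 5 seconds out of order.
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'broker:9092',
    -- Assumed start position; adjust to your needs.
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);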

Connector options live in DDL. Source definition, watermarking, and format selection stay in one place.

Enrichment lookup joins

SELECT o.order_id, o.total, c.country, c.zip
FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c
    ON o.customer_id = c.id;

That is enrichment without a separate enrichment service.
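
The query above needs a Customers table that supports lookups. A minimal sketch, assuming the dimension data lives in a PostgreSQL table reachable through Flink's JDBC connector; the host, database, table name, and credentials are placeholders, not values from the original setup.

```sql
-- Hypothetical dimension table used as the lookup side of the join.
CREATE TABLE Customers (
    id BIGINT,
    country STRING,
    zip STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://db-host:5432/crm',  -- placeholder host and database
    'table-name' = 'customers',
    'username' = 'flink',                          -- placeholder credentials
    'password' = '...'
);
```

Without a cache, each incoming order triggers a query against the database, so lookup caching is usually worth configuring for high-volume streams.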

Aggregation over event time

SELECT
    window_start,
    window_end,
    customer_id,
    SUM(total) AS revenue_5m
FROM TABLE(
    TUMBLE(TABLE Orders, DESCRIPTOR(event_time), INTERVAL '5' MINUTES)
)
GROUP BY window_start, window_end, customer_id;

That is a streaming metric, emitted as soon as each five-minute window closes, not a nightly job.
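
If the metric should refresh more often than the window length, a hopping (sliding) window is one option. The sketch below reuses the Orders table and keeps a five-minute window that advances every minute; the one-minute slide is an assumption chosen for illustration.

```sql
-- Five-minute revenue per customer, recomputed every minute.
SELECT
    window_start,
    window_end,
    customer_id,
    SUM(total) AS revenue_5m
FROM TABLE(
    -- HOP arguments: table, time column, slide, window size
    HOP(TABLE Orders, DESCRIPTOR(event_time), INTERVAL '1' MINUTES, INTERVAL '5' MINUTES)
)
GROUP BY window_start, window_end, customer_id;
```

Each event now contributes to five overlapping windows, which trades extra state and output volume for fresher results.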

ML inference in SQL

/** MODEL DDL **/
CREATE MODEL fraud_model
INPUT (total DECIMAL(12, 2), revenue_1h DECIMAL(12, 2), txn_count_5m BIGINT)
OUTPUT (prediction STRING)
WITH (
    'provider' = 'openai',
    'task' = 'completions',
    'system_prompt' = 'Predict whether the transaction is fraudulent based on total, revenue_1h, and txn_count_5m. Return only the prediction.'
);


/** REAL-TIME MODEL CALL **/
SELECT *
FROM ML_PREDICT(
    TABLE FeatureStream,
    MODEL fraud_model,
    DESCRIPTOR(total, revenue_1h, txn_count_5m)
);

An LLM service scores events inside the same query surface as the rest of the pipeline.

CEP - Complex Event Processing

SELECT *
FROM Orders
MATCH_RECOGNIZE (
    PARTITION BY customer_id
    ORDER BY event_time
    MEASURES
        A.order_id AS first_order,
        C.order_id AS third_order,
        C.total - A.total AS escalation
    PATTERN (A B C) WITHIN INTERVAL '10' MINUTES
    DEFINE
        B AS B.total > A.total,
        C AS C.total > B.total
);

This detects, in real time, three consecutive escalating orders from the same customer within 10 minutes.

Developer Experience with Ververica Platform

SQL unlocks a new paradigm that spans from application development to running jobs in production. Ververica Platform was built to deliver enterprise-grade readiness for running Apache Flink jobs in highly demanding organizations. On top of the open-source code, it provides a unique development environment for SQL, allowing developers to build fast and securely, and SREs to run and monitor Flink applications.

Debug SQL against real data

Build, then verify. Developers can validate the logic against live streaming data before deployment. The SQL Editor, an IDE for real-time data, works with actual events, watermarks, state, and outputs right in the editor. The behavior of a query is transparent, and its correctness is confirmed at the source.

Catalog-native development

The built-in VVP Catalog ships with the platform. Packaged catalogs cover Apache Hive Metastore, JDBC, Schema Registry, Iceberg, and Fluss, among others. Ververica consumes metadata from the most common Catalogs and exposes it for testing and production environments. Every registered table is referenceable by name across catalogs and databases. Schemas, formats, and connection settings are resolved automatically, not in the application code, setting a clear separation between who manages the metadata and who manages the business logic.

Code assistance built for streaming SQL

The SQL Editor offers schema-aware auto-completion and surfaces errors inline. It suggests columns directly from the catalog and provides hints for time attributes and watermarks where necessary. The editor also distinguishes between regular and temporal joins and prompts accordingly. Code assistance improves speed and correctness for Flink jobs.

Lineage

Ververica provides an automated, transversal lineage feature that inherently captures the data flow of every SQL job, transforming lineage from a manual documentation task into a system property. Through an interactive lineage explorer, users can visualize data dependencies down to the column level, facilitating rigorous governance, compliance auditing, and rapid impact analysis. Visual graph elements, including connector information, modification timestamps, and author details, allow users to jump directly to specific catalog objects or deployments, thereby streamlining root cause analysis and troubleshooting.

The point is not that SQL on Flink is possible on Ververica. SQL on Flink has been possible for years. Ververica makes it the path of least resistance: for the engineer writing the query, the team reviewing it, and the platform running it in production.

When SQL Makes Sense

Flink SQL excels in declarative, standards-based stream and batch processing where logic is defined as continuous queries on dynamic tables. Primary applications include real-time aggregations, data enrichment through lookup or temporal joins, and ETL pipelines that use connectors for systems such as Kafka, JDBC databases, Paimon, and standard file systems. Removing the requirement for Java or Python code lowers the entry barrier for analysts and data engineers.

Workloads where Flink SQL is a good fit:

Streaming ETL and continuous transformations: reading from Kafka, CDC sources, or file systems and writing to downstream sinks with declarative SQL logic (filters, projections, joins, aggregations).
Real-time aggregations and dashboard feeds: windowed aggregations (tumble, hop, session, cumulate via window TVFs), GROUP BY with mini-batch optimization, and Top-N computations.
Data enrichment via joins: temporal joins against versioned dimension tables, lookup joins against external databases (JDBC, MongoDB, Redis), and interval joins between two streams.
Bounded batch jobs on a schedule: periodic ETL, historical backfills, and reconciliation runs where the same SQL runs in BATCH mode on VVP 3 with the Workflow Scheduler.
Democratizing access to streaming: enabling data engineers, analysts, and domain experts who know SQL but not Java/Python to build and operate streaming pipelines without a JVM development cycle.
Multi-system integration with catalogs: leveraging Kafka Schema Registry, Paimon, Fluss, Hive, PostgreSQL, Oracle, or Iceberg catalogs for zero-DDL table discovery and cross-system pipelines.
Pattern detection with MATCH_RECOGNIZE: CEP-style event pattern matching expressible in SQL (e.g., fraud sequences, session detection).
Rapid prototyping and iteration: using the SQL Editor with session clusters for preview, validation, and quick feedback loops before promoting to production.
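
As one example from the list, an interval join between two streams can be sketched as follows. It assumes the Orders table from earlier plus a hypothetical Shipments stream with order_id and a ship_time event-time attribute of the same type as event_time; the four-hour bound is arbitrary.

```sql
-- Match each order with a shipment that happens within four hours of it.
SELECT
    o.order_id,
    o.event_time AS ordered_at,
    s.ship_time AS shipped_at
FROM Orders AS o
JOIN Shipments AS s
    ON o.order_id = s.order_id
    AND s.ship_time BETWEEN o.event_time AND o.event_time + INTERVAL '4' HOUR;
```

Because the time bound is part of the join condition, the engine can discard state for rows that can no longer match, which keeps a stream-to-stream join from growing without limit.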

For workloads requiring complex custom state management, advanced event patterns beyond those covered by MATCH_RECOGNIZE, or fine-grained control over parallelism and resource isolation, the DataStream API remains the more appropriate choice.

The Shift

Streaming spent most of its history as a specialist discipline. Toolchains were powerful but exclusive, concentrated within small teams that became permanent bottlenecks between event producers and consumers.

SQL on streams, anchored to a shared catalog and metadata, dissolves the bottleneck. Pipelines become readable, reviewable, and evolvable artifacts. Flink remains as serious a runtime as ever. Governance and lineage stop being a "nice-to-have" and become byproducts of the platform's wiring.

The three letters did not change. What they mean did.

SQL is now the Streaming Query Language for businesses running in low-latency mode.

Ready to Build Production-Grade Flink SQL Pipelines?

This article covered the basics. Our free ebook walks through the complete journey from interactive SQL development to production deployments, including dynamic tables, watermarks, temporal joins, savepoints, autoscaling, and governance.

[Download the Ebook →]

Additional Resources

Start your Flink SQL journey today.

Join our hands-on X-Stream Labs workshops

The Dataflow Model
