Stream Processing & Apache Flink - News and Best Practices

An introduction to ACID guarantees and transaction processing

Written by Markos Sfikas | 09 October 2018

Last month we introduced Ververica Streaming Ledger, our new technology that brings serializable, distributed ACID transactions directly to data streams. In this blog post, we take a step back to introduce ACID semantics and transaction processing, and to lay the foundations on which Ververica Streaming Ledger operates.

Transactions are ubiquitous in modern enterprise systems, where they keep data consistent even in highly concurrent environments. But let’s first define the term “transaction” for companies dealing with data applications. In the context of a database, a transaction is a collection of read/write operations whose effects are committed only when every operation in it succeeds. A transaction, whether it retrieves or updates data, executes independently of other transactions and must be treated as a single unit: it either happens in full or not at all. Distributed systems engineers and application developers alike need to ensure that transactions are applied consistently and atomically, to avoid data loss or inconsistent results.

There is a long-standing debate in the industry about transactional processing guarantees. On one side, industry experts argue that ACID guarantees (see below) are not necessary for all types of applications and that the speed gained by relaxing them outweighs any potential losses. On the other side, IT professionals consider strong transactional guarantees the only way for application developers to focus on the application logic and write correct, scalable code without the experience of a distributed systems engineer.

What is ACID?

In this section, we briefly explain the ACID terminology as it relates to database transactions. The acronym names four properties of a database transaction:

  • Atomicity

Atomicity ensures that all changes made to data are executed as a single operation. The operation succeeds only when all of its changes are performed; if any change fails, none of them takes effect.

Using an example from the financial services industry, when executing a transfer of funds between two accounts, the transfer must change the balance of both accounts or of neither, so that no funds are “lost” or “created” and both accounts end up with the correct balances.
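To make the all-or-nothing behavior concrete, here is a minimal sketch of the funds transfer using Python's built-in sqlite3 module. The table layout, the account names, and the transfer helper are assumptions made for illustration; this shows atomicity in an ordinary relational database, not the Streaming Ledger API.

```python
import sqlite3

def transfer(conn, source, target, amount):
    """Move `amount` from `source` to `target` as one atomic unit."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, source),
        )
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, target),
        )
        if cur.rowcount != 1:
            # The credit matched no account: abort so the debit is undone too.
            raise LookupError(f"unknown account: {target}")

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

transfer(conn, "alice", "bob", 30)        # both updates are applied
after_success = balances(conn)

try:
    transfer(conn, "alice", "carol", 30)  # "carol" does not exist
except LookupError:
    pass
after_failure = balances(conn)            # the debit from alice was rolled back
```

After the failed transfer the balances are the same as before it started: the debit that had already been executed is undone, so no funds disappear.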

  • Consistency

The consistency attribute ensures that data changes preserve a consistent state between the start and the end of the transaction. In other words, the transaction brings the tables, keys, or any other data from one consistent state into another.

Returning to the example of the transfer of funds between accounts, the consistency attribute guarantees that the transfer only happens if the source account has sufficient funds.
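One common way to express such a rule is as a constraint that the database itself enforces. The sketch below again uses Python's sqlite3 module; the CHECK constraint, the table layout, and the transfer helper are illustrative assumptions, not part of Streaming Ledger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # invariant: no overdraft
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

def transfer(conn, source, target, amount):
    """Return True if the transfer committed, False if it was rejected."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would have violated the CHECK constraint.
        return False

accepted = transfer(conn, "alice", "bob", 60)  # 100 - 60 = 40, allowed
rejected = transfer(conn, "alice", "bob", 60)  # 40 - 60 < 0, refused
final = dict(conn.execute("SELECT id, balance FROM accounts"))
total = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
```

The second transfer is refused because it would leave the source account with a negative balance, and the total across both accounts is unchanged throughout.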

  • Isolation

The isolation attribute ensures that each transaction executes as if it were the only transaction operating on the tables or keys. This means that the intermediate state of a transaction is invisible to other transactions running concurrently. Databases provide different isolation levels with different guarantees. Ververica Streaming Ledger offers the strongest of the standard levels: serializability.
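The invisibility of intermediate state can be observed with two connections to the same database. The following sketch uses Python's sqlite3 module with a temporary database file; the file name, table layout, and helper are assumptions for illustration, and it demonstrates isolation in SQLite rather than in Streaming Ledger.

```python
import os
import sqlite3
import tempfile

def total(conn):
    # fetchall() exhausts the cursor, so no read lock lingers afterwards.
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchall()[0][0]

path = os.path.join(tempfile.mkdtemp(), "bank.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
writer.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
writer.commit()

# First half of a transfer: alice is debited, bob is not yet credited.
writer.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 'alice'")

inside_writer = total(writer)   # the writer sees its own intermediate state
seen_by_reader = total(reader)  # the reader still sees the last committed state

# Second half, then commit: the whole transfer becomes visible at once.
writer.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 'bob'")
writer.commit()
after_commit = dict(reader.execute("SELECT id, balance FROM accounts").fetchall())

writer.close()
reader.close()
```

While the transfer is half done, only the writing connection sees the missing 30; the other connection keeps seeing the committed total until the transaction commits.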

  • Durability

Durability guarantees that once a transaction is completed successfully, it will result in a permanent state change that will remain persistent even in the event of a system failure.

In the example of the transfer of funds between accounts, durability ensures that the updates made to each account are persistent and survive a potential system failure. With Ververica Streaming Ledger, durability is ensured in the same way as in other Flink applications: through persistent sources and checkpoints. Because stream processing is asynchronous, durability can only be assumed once a checkpoint has completed.
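The interplay of a replayable source and checkpoints can be pictured with a toy model. The sketch below is plain Python, not Flink code: the Job class, the JSON checkpoint file, and the in-memory SOURCE list are all hypothetical stand-ins chosen for illustration.

```python
import json
import os
import tempfile

# A replayable ("persistent") source: events can be re-read from any offset.
SOURCE = [("alice", -30), ("bob", +30), ("alice", -10), ("bob", +10)]

class Job:
    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.offset = 0
        self.balances = {"alice": 100, "bob": 50}

    def process(self, count):
        """Apply the next `count` events to in-memory state only."""
        for account, delta in SOURCE[self.offset:self.offset + count]:
            self.balances[account] += delta
            self.offset += 1

    def checkpoint(self):
        """Persist the state together with the source offset it reflects."""
        with open(self.checkpoint_path, "w") as f:
            json.dump({"offset": self.offset, "balances": self.balances}, f)

    @classmethod
    def recover(cls, checkpoint_path):
        """Rebuild a job from the last completed checkpoint."""
        job = cls(checkpoint_path)
        with open(checkpoint_path) as f:
            snapshot = json.load(f)
        job.offset, job.balances = snapshot["offset"], snapshot["balances"]
        return job

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
job = Job(path)
job.process(2)     # first transfer
job.checkpoint()   # the first transfer is now durable
job.process(2)     # second transfer, applied in memory only
del job            # simulated crash: in-memory state is gone

restored = Job.recover(path)
at_recovery = dict(restored.balances)  # reflects the first transfer only
restored.process(2)                    # replay from the checkpointed offset
after_replay = dict(restored.balances)
```

In this model the second transfer is not durable at the time of the crash, but because the source can be replayed from the checkpointed offset, it is applied again after recovery and nothing is lost.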

What is transaction processing and why is it important for the modern enterprise?

Transaction processing has emerged as a necessary technology for modern enterprises dealing with real-time data and real-time applications. It helps reduce errors, deliver a superior customer experience, and make reporting and operations more reliable, especially in regulated industries such as capital markets, investment banking, and financial services.

Practically every business dealing with partners, suppliers, staff, stock or customers should think about adopting a transaction processing system in order to become more service-oriented and software-operated.

How did Ververica Streaming Ledger come to fruition?

At Ververica, we have witnessed the exponential growth in the adoption of stream processing from its very early days. We have always believed that stream processing is the new paradigm in data processing, and we expect stream processing frameworks to keep evolving as more and more companies adopt them.

While partnering with many world-class financial services organizations to develop their stream processing architectures, we saw a gap: strong ACID guarantees were not available directly on data streams. This was, in fact, one of the reasons that prevented some of those organizations from moving more of their mission-critical applications to a streaming infrastructure. It made us start thinking about how we could enable distributed, serializable ACID transactions on streaming data.

Ververica Streaming Ledger does exactly that: it brings serializable ACID semantics to stream processing and opens the doors of stream processing to a new set of applications that previously relied on relational databases for ACID guarantees. It provides a fast and scalable implementation of transaction processing by storing tables in Apache Flink’s state and using a combination of timestamps, watermarks, and out-of-order scheduling to achieve serializable transaction isolation.
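To give a rough intuition for how timestamps and watermarks can yield a serial order, consider the toy model below. It is not the Streaming Ledger implementation and leaves out distribution and out-of-order scheduling entirely; the TimestampOrderedLedger class and its methods are hypothetical names invented for this sketch. It assumes the usual meaning of a watermark: a promise that no transaction with an earlier or equal timestamp will still arrive.

```python
import heapq
from itertools import count

class TimestampOrderedLedger:
    def __init__(self, balances):
        self.balances = dict(balances)
        self._pending = []       # min-heap ordered by (timestamp, arrival)
        self._arrival = count()  # tie-breaker for equal timestamps

    def submit(self, timestamp, transaction):
        """Buffer a transaction; nothing is applied yet."""
        heapq.heappush(self._pending, (timestamp, next(self._arrival), transaction))

    def advance_watermark(self, watermark):
        """Apply buffered transactions with timestamp <= watermark, oldest first."""
        while self._pending and self._pending[0][0] <= watermark:
            _, _, transaction = heapq.heappop(self._pending)
            transaction(self.balances)

def deposit(account, amount):
    def apply(balances):
        balances[account] += amount
    return apply

def transfer(source, target, amount):
    def apply(balances):
        if balances[source] >= amount:  # otherwise rejected, no change
            balances[source] -= amount
            balances[target] += amount
    return apply

ledger = TimestampOrderedLedger({"alice": 0, "bob": 0})
ledger.submit(2, transfer("alice", "bob", 50))  # arrives first, timestamp 2
ledger.submit(1, deposit("alice", 50))          # arrives second, timestamp 1
before = dict(ledger.balances)                  # nothing applied yet
ledger.advance_watermark(2)                     # deposit runs, then transfer
after = dict(ledger.balances)
```

Had the transfer been applied on arrival, it would have been rejected for insufficient funds. Applying transactions in timestamp order once the watermark has passed makes the outcome identical to a serial execution, regardless of arrival order.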

Ververica Streaming Ledger is a testament to the evolution of stream processing as the de facto framework for modern data processing and to Ververica’s continuous effort and dedication to move the technology forward. We encourage you to read more about Streaming Ledger by visiting our product editions or by contacting us for more information or a consultation.