Pravega is a novel storage system that exposes data streams as a first-class abstraction, as opposed to objects and files. With Pravega, a stream is a consistently ordered, durable, available and elastic series of data events. Pravega is designed to ingest, store ...
The application of Quantitative Analytics to trades for the generation of Risk and P&L metrics has traditionally followed a batch-based approach. Regulatory changes impose increasing compute demands on financial institutions, along with a growing demand for real time ...
Stream processing still evolves and changes at a speed that can make it hard to keep up with the developments. Being at the forefront of stream processing technology, Apache Flink has mirrored many of these developments in its evolution and continues to ...
Microsoft Cloud App Security provides organizations with enterprise-grade protection for cloud applications. One of the main capabilities of CAS is the real-time detection of threats such as compromised accounts, insider threats and ransomware, based on abnormal user activity. In this ...
Data analytics took off in its infancy with the development of SQL. Yet, at web scale, even simple analytics queries can prove challenging within (distributed) stream processing environments. Two such examples are Count and Count Distinct. Because of the key-oriented nature of ...
Nowadays, many companies are becoming data-rich and data-intensive. They have millions of users generating billions of interactions and events per day. These massive streams of complex events can be processed and reacted upon to, e.g., offer new products, next best actions, ...
Deploying Flink jobs while maintaining state requires a number of CLI tasks to be performed. As error-prone as that is when done manually, in any serious software project you'll rely on a continuous integration pipeline to automate it. Summing ...
One of the main characteristics of a good streaming pipeline is correctness for event-time processing. The real challenges arise when such a pipeline must be resilient to different types of failures. In this talk, we describe how Criteo runs Flink on one of ...
At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. Data at GO-JEK doesn’t grow linearly with the business, but exponentially, as people start building new products and logging new activities on top of the ...
One of the main sources of concern when switching to the container paradigm is security. When dealing with large amounts of sensitive customer data, it’s very important to be able to guarantee that the data is transported safely between the different ...
For a long time, complex event processing (CEP) and stream analytics have been treated as distinct classes of stream processing applications. While CEP workloads identify patterns from event streams in near real-time, stream analytics queries ingest, enrich, and aggregate high-volume streams. Both ...
Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and, therefore, cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not ...
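To make the overlap concrete, here is a minimal Flink sketch, not taken from the talk: the stream contents, window sizes and the sum aggregate are purely illustrative. With a 10-minute window sliding every minute, each event belongs to ten overlapping windows, which is exactly the redundancy that aggregate-sharing techniques target.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A 10-minute window that slides every minute: each event is assigned to
        // 10 overlapping windows, so a naive engine aggregates it 10 times.
        env.fromElements(Tuple2.of("sensor-1", 3L), Tuple2.of("sensor-1", 7L))
           .keyBy(0)
           .window(SlidingProcessingTimeWindows.of(Time.minutes(10), Time.minutes(1)))
           .sum(1)
           .print();

        env.execute("Sliding window aggregation sketch");
    }
}
```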
One of the big operational challenges when running streaming applications is to cope with varying workloads. Variations, e.g. daily cycles, seasonal spikes or sudden events, require that allocated resources are constantly adapted. Otherwise, service quality deteriorates or money is wasted. Apache ...
Flink’s stateful processing allows enriching the event data with data acquired from previous events. To achieve this, a KeyedStream is used to distribute state and its processing by key. Sometimes, though, an event contains not a single but multiple keys, requiring ...
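As a minimal sketch of the single-key case described above (the event shape, key and field names are hypothetical, not from the talk), per-key state can hold the previous event and enrich each new one with it:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class KeyedEnrichmentSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical (deviceId, value) events; a real job would read them from a source.
        env.fromElements(Tuple2.of("device-1", 10.0), Tuple2.of("device-1", 12.5))
           .keyBy(0)                          // distributes state and processing by deviceId
           .process(new EnrichWithPrevious())
           .print();

        env.execute("Keyed-state enrichment sketch");
    }

    /** Emits (deviceId, currentValue, previousValue) using per-key ValueState. */
    public static class EnrichWithPrevious
            extends KeyedProcessFunction<Tuple, Tuple2<String, Double>, Tuple3<String, Double, Double>> {

        private transient ValueState<Double> previous;

        @Override
        public void open(Configuration parameters) {
            previous = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("previous-value", Double.class));
        }

        @Override
        public void processElement(Tuple2<String, Double> event, Context ctx,
                                   Collector<Tuple3<String, Double, Double>> out) throws Exception {
            Double prev = previous.value();   // state is scoped to the current key only
            out.collect(Tuple3.of(event.f0, event.f1, prev == null ? Double.NaN : prev));
            previous.update(event.f1);
        }
    }
}
```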
Failures are inevitable. How can we recover a Flink job from outage? How do we reprocess data from outage period? What are the implications to downstream consumers? These are important questions that we need to answer when running Flink for critical data ...
Containerized deployments have taken the world by storm. Containers make your application portable across different machines and operating systems. They allow applications to be scaled in a matter of seconds. And they significantly simplify and speed up deployments, which decreases development and operating ...
So you're on the hook for millions of transactions per minute. You've already considered all the buzzwords? Blockchain? Machine learning? Microserverless? You're out of buzzwords? Desperate for something that just works? Bonus points for looking indie on Hacker News? ...
SQL is the lingua franca of data processing, and everybody working with data knows SQL. Apache Flink provides SQL support for querying and processing batch and streaming data. Flink's SQL support powers large-scale production systems at Alibaba, Huawei, and Uber. Based ...
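As a minimal illustration of that SQL support, the sketch below registers a stream as a table and runs a standard SQL aggregation over it. The Orders table, its fields and the query are assumptions made for this example, and the exact factory methods and package names vary across Flink versions.

```java
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class StreamingSqlSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical order events: (userId, product, amount).
        DataStream<Tuple3<String, String, Integer>> orders = env.fromElements(
                Tuple3.of("alice", "book", 12),
                Tuple3.of("bob", "pen", 3),
                Tuple3.of("alice", "pen", 5));

        // Register the stream as a table and query it with standard SQL.
        tEnv.registerDataStream("Orders", orders, "userId, product, amount");
        Table totals = tEnv.sqlQuery(
                "SELECT userId, SUM(amount) AS total FROM Orders GROUP BY userId");

        // A grouped aggregation over a stream produces an updating (retract) result.
        tEnv.toRetractStream(totals, Row.class).print();
        env.execute("Streaming SQL sketch");
    }
}
```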
In the era of big data and AI, many data-intensive applications, such as streaming, exhibit requirements that cannot be satisfied by traditional batch processing models. In response, distributed stream processing systems, such as Spark Streaming or Apache Flink, exploit the resources of ...
Data is at the very core of how Rovio builds and operates its games. What does data mean for Rovio: how is it processed, and how do we gain value from it? In this talk we take a deep dive into Rovio's analytics pipeline and ...
Flink's network stack is designed with two goals in mind: (a) having low latency for data passing through, and (b) achieving optimal throughput. It already achieves a good trade-off between these two, but we will continue to tune it further ...
In modern applications of streaming frameworks, stateful streaming is arguably one of the most important use cases. Flink, as a well-supported framework for stateful streaming, readily helps developers spend less effort on system deployment and focus more on the business logic. ...
At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. The Data Engineering team is responsible for creating a reliable data infrastructure across all of GO-JEK’s 18+ products. We use Flink extensively to provide real-time streaming aggregation ...
King's streaming platform processes over a hundred billion daily events to provide real-time analytics and personalization capabilities for some of the largest mobile games in the world. The platform’s newest experimental addition uses existing infrastructure, player state, machine learning and global windows ...
Sudden spikes in load can be a source of disaster for stream processors. These spikes can reveal latent bottlenecks in otherwise well-balanced configurations and through them introduce backpressure, increase latency and reduce overall throughput. This problem is far from being solved. While ...
Prometheus is a cloud-native monitoring system prioritizing reliability and simplicity – and Flink works really well with it! This session will show you how to leverage the Flink metrics system together with Prometheus to improve the observability of your jobs. There will be ...
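For instance, a user-defined metric registered through Flink's metric system is exposed to Prometheus once the PrometheusReporter is enabled in the Flink configuration. The sketch below is a minimal example; the metric and class names are illustrative, not from the talk.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

/**
 * Counts processed records via Flink's metric system. With the Prometheus
 * reporter configured in flink-conf.yaml (e.g. metrics.reporter.prom.class:
 * org.apache.flink.metrics.prometheus.PrometheusReporter), the counter is
 * scraped together with Flink's built-in metrics.
 */
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter eventsProcessed;

    @Override
    public void open(Configuration parameters) {
        eventsProcessed = getRuntimeContext()
                .getMetricGroup()
                .counter("eventsProcessed");   // metric name is an example
    }

    @Override
    public String map(String value) {
        eventsProcessed.inc();
        return value;
    }
}
```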
At Trackunit we have based our telematics IoT processing pipeline on Flink. We started out on version 1.2 and are now on 1.5. In this session I will share the lessons learned going from one giant Flink job to many small ones, and some of ...
Python is popular amongst data scientists and engineers for data processing tasks. The big data ecosystem has traditionally been rather JVM-centric. Often Java (or Scala) is the only viable option to implement data processing pipelines. That sometimes poses an adoption barrier ...
SK telecom presents how to build and operate a session-based streaming application using Flink. A driving score service essentially calculates a driving score of a user's driving session considering speeding, rapid acceleration and rapid deceleration during the session. At SK telecom, ...
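A session-based job of this kind typically relies on Flink's session windows. The sketch below is not SK telecom's actual logic: the gap, field names and the max aggregate are placeholders that merely show how a driver's readings are grouped into sessions separated by inactivity.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class DrivingSessionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical (driverId, speed) readings; a real job would consume them from a message bus.
        env.fromElements(Tuple2.of("driver-1", 83.0), Tuple2.of("driver-1", 97.5))
           .keyBy(0)                                                        // one session stream per driver
           .window(ProcessingTimeSessionWindows.withGap(Time.minutes(15)))  // session ends after 15 idle minutes
           .max(1)                                                          // stand-in for a real scoring aggregate
           .print();

        env.execute("Session-based driving score sketch");
    }
}
```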
Modern vehicles are capable of producing large volumes of data from dozens of sensors. We will demonstrate the use of map matching to deal with noisy GPS information, and how to develop and deploy real-time sensor data processing applications on Flink using ...
Data is a core part of our infrastructure here at Yelp, with tens of billions of messages per day flowing across our streaming pipelines, empowering us to solve core business problems. To reliably connect and route this massive amount of data across ...
Flink's original intention was to be a unified computing engine for both streaming and batch. Although the streaming mode is now widely used and considered the best streaming solution, the batch processing mode is still underdeveloped. We ...
As a distributed stream processing engine, Flink provides users with convenient operators to manipulate data on the fly. Among all these operators, join could be the most complicated one as it requires the capability to cross-analyze various sources simultaneously. In this talk, ...
You have probably heard that stream processing subsumes batch workloads, a valid but not yet fully implemented claim. Our lab research aims to fulfil this dream and delve further into the deep world of iterative processes, a fundamental building block for graph ...
Authorisation is typically associated with a single act of “logging in”. But nobody likes to log in too often, so most websites have a “remember me” option. But it’s not very safe to be constantly logged in. How to accommodate contradictory goals ...
At Intellify we have implemented a system where we can create Flink apps for streaming ETL into normalized datasets in Elasticsearch, with schemas specified in Avro. Our data comes in via a single Kafka topic, but in different shapes depending on the ...
Analysing streams of text data to extract topics is an important task for getting useful insights to be leveraged in subsequent workflows. For example, extracting topics from text to be continuously ingested into a search engine can be useful to ...
Streaming engines like Apache Flink are redefining ETL and data processing. Data can be extracted, transformed, filtered and written out in real time with an ease matching that of batch processing. However, the real challenge of matching the prowess of batch ETL remains ...
To quote http://www.apache.org/foundation - “The mission of the Apache Software Foundation (ASF) is to provide software for the public good. We do this by providing services and support for many like-minded software project communities of individuals who choose ...
Two of the main architectural trends in software development this decade have been the move to streaming data processing and the move to microservice architectures. Both of these architectures are driven by the needs of managing and mining knowledge ...
On Uber's Marketplace team we're tasked with efficiently matching our riders and driver partners in real time. To that end, we employ various systems within the ride sharing marketplace such as dynamic pricing (popularly known as surge), demand modeling ...
Flink's stateful stream processing engine presents a huge variety of optional features and configuration choices to the user. Figuring out the "optimal" choices for any production environment and use-case can therefore often be challenging. In this talk, ...
Flink started with the mission to unify batch and stream processing. We believe that Flink’s architecture is uniquely positioned to be a great engine for streaming, batch and AI workloads at the same time. We will talk about the work we ...
Apache Beam is a unified batch and streaming programming model. Apache Beam runs on various execution backends, such as Apache Flink, Apache Spark, Apache Samza, Apache Gearpump, Apache Hadoop, and Google Cloud Dataflow. Up until recently, Java was the predominant ...
Stream processing has helped to turn many monolithic database-centric applications into fast, scalable, and flexible real-time applications. However, there are still entire classes of applications that are built against databases, because today's stream processing model is not yet rich enough ...
Apache Flink streaming applications are typically designed to run indefinitely for long periods of time. As with all long-running services, the applications need to be maintained and upgraded, including improvements to adapt to changing business logic and bug fixes. With this in ...
Distributed tracing is used to analyze performance and error cases in service oriented architectures. The Observability team at Airbnb recently created Upshot, a data pipeline that uses Flink to analyze over 40 million trace events per minute. Summaries of the resulting data are ...
A common and reliable way to buffer streaming data in between Flink pipelines is a pair of Flink Kafka Source and Sink. However, in some low-latency streaming firehose use-cases this option is not the best choice: a) backlog will quickly accumulate in ...
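For reference, such a handoff usually looks like the sketch below: the upstream pipeline writes to a topic with a Flink Kafka sink and the downstream pipeline reads it back with the matching source. Both pipelines are shown in one job only for brevity; the topic, broker address and schema are placeholders, and connector class names vary with the Kafka connector version.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaHandoffSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");   // placeholder address
        props.setProperty("group.id", "downstream-pipeline");

        // Upstream pipeline writes its results to the handoff topic ...
        DataStream<String> upstreamResults = env.fromElements("event-1", "event-2");
        upstreamResults.addSink(
                new FlinkKafkaProducer<>("handoff-topic", new SimpleStringSchema(), props));

        // ... and the downstream pipeline (normally a separate job) reads the same topic.
        DataStream<String> downstreamInput = env.addSource(
                new FlinkKafkaConsumer<>("handoff-topic", new SimpleStringSchema(), props));
        downstreamInput.print();

        env.execute("Kafka-buffered pipeline handoff sketch");
    }
}
```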
Mining large streams of real-time data has recently grown into one of the key challenges for the Big Data community in both industry and academia. At the same time, the concept of the Smart City has gained significant acclaim by providing user-oriented services ...