Getting Started with Ververica Platform on Amazon AWS

It has been a while since we published our original blog post on how to set up Ververica Platform on Amazon Web Services (AWS). Since then, many aspects of Ververica Platform have improved and new features have been added, so it is time for an update that also covers topics the original post did not. This article guides you through deploying and running Ververica Platform on AWS. You will create an Amazon Elastic Kubernetes Service (EKS) cluster, then set up Ververica Platform with a database created with Amazon Relational Database Service (RDS) as the persistence layer and an S3 bucket as the storage provider. Finally, you will configure Ververica Platform authentication and authorization with Amazon Cognito.

Prerequisites

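Before getting started, please make sure you have the following:

  • An AWS account and the AWS CLI

  • eksctl and kubectl

  • Helm

  • jq, which is used to parse JSON output

  • A Ververica Platform license file (commercial or trial)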

Create an EKS Cluster

Ververica Platform runs on Kubernetes, and the easiest way to get a Kubernetes cluster on AWS is EKS, Amazon's managed Kubernetes offering. Our original blog post showed how to create an EKS cluster with the AWS Console and the AWS CLI. In this post, you will use eksctl, the relatively new official command-line tool for EKS.


$ aws configure
$ eksctl create cluster --name vvp-cluster --nodes 3 --nodes-max 3
$ kubectl get nodes

Wait until the command finishes and the cluster is successfully provisioned; this may take some time. Within this EKS cluster, you will create two Kubernetes namespaces: vvp, to run Ververica Platform, and flink, to run Flink jobs.


$ kubectl create namespace vvp
$ kubectl create namespace flink

We recommend running Ververica Platform in a separate namespace from your Flink jobs for better isolation between the control plane and data processing applications.
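Running kubectl create namespace a second time fails because the namespace already exists. If you want these setup steps to be safely repeatable, a small guard like the following can help. This is a minimal sketch: ensure_namespace is a hypothetical helper name, and it assumes kubectl already points at vvp-cluster.

```shell
#!/usr/bin/env bash
# Sketch: create a Kubernetes namespace only if it does not exist yet.
# ensure_namespace is a hypothetical helper; kubectl must target vvp-cluster.
ensure_namespace() {
  local ns="$1"
  # "kubectl get namespace" exits non-zero when the namespace is missing
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace/$ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}

# Usage:
#   for ns in vvp flink; do ensure_namespace "$ns"; done
```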

Create a MySQL database with Amazon RDS

Ververica Platform requires a minimal persistence layer to store the platform metadata. Out of the box it will use a SQLite database stored on a persistent volume. Users of Ververica Platform Stream Edition can configure persistence to an external relational database. In a production environment, we recommend using an external database for easy backup and maintenance.

For this installation, you will create a MySQL database with Amazon RDS. You can skip this section if you already have a MySQL instance (or another type of relational database supported by Ververica Platform) running somewhere, provided that (1) it contains a database named vvp and (2) it is accessible from the EKS cluster.


$ DB_JSON=`aws rds create-db-instance --engine mysql \
   --db-instance-class db.m5.large \
   --db-instance-identifier vvpdb \
   --allocated-storage 10 \
   --db-name vvp \
   --master-username USERNAME \
   --master-user-password PASSWORD`

This command creates a MySQL instance named vvpdb containing a database named vvp. Remember to replace USERNAME and PASSWORD with your desired values. The command returns JSON output immediately, while the database is created asynchronously in the background; the JSON output is assigned to the variable DB_JSON.
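Because the instance is created asynchronously, later steps that need its endpoint can fail if you run them too early. As a sketch, the AWS CLI's built-in waiter can block until the instance is available and then print the endpoint. wait_for_db is a hypothetical helper name, and its default identifier vvpdb is taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch: block until the RDS instance is available, then print its endpoint.
# Assumes a configured AWS CLI; wait_for_db is a hypothetical helper.
wait_for_db() {
  local db_id="${1:-vvpdb}"
  # The waiter polls until the instance status is "available" and exits
  # non-zero if it gives up after its fixed number of attempts
  aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1
  # --query/--output text extract the endpoint without needing jq
  aws rds describe-db-instances --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}

# Usage:
#   dbEndpoint=$(wait_for_db vvpdb)
```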

Next, add an inbound rule to the security group of the new MySQL instance so that it is accessible from the EKS cluster. For simplicity, this example opens the default MySQL port 3306 to all source IP addresses:


$ securityGroupID=`echo "$DB_JSON" \
   | jq -r '.DBInstance.VpcSecurityGroups[0].VpcSecurityGroupId'`
$ aws ec2 authorize-security-group-ingress --group-id $securityGroupID \
  --protocol tcp --port 3306 --cidr 0.0.0.0/0

Create an S3 Bucket

The Universal Blob Storage in Ververica Platform acts as centralized storage for job artifacts, checkpoints, savepoints, and high availability metadata. For Flink SQL jobs, it also holds the job graphs and the JAR files of User-defined Functions. Universal Blob Storage seamlessly supports the native file storage services of all major cloud providers, e.g., S3 on AWS.

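Create an S3 bucket (bucket names are globally unique across AWS, so you may need to choose a name other than vvp-bucket and use it consistently in the following steps):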


$ aws s3 mb s3://vvp-bucket

To grant Ververica Platform permissions to access S3, you can extract the node group role of the EKS cluster and attach an S3 access policy to it:


$ IAM=`eksctl get iamidentitymapping --cluster vvp-cluster --output json`
$ nodeGroupRole=`echo "$IAM" | jq -r '.[0].rolearn' | cut -d/ -f2`
$ aws iam attach-role-policy --role-name $nodeGroupRole \
   --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

For demonstration purposes, this example attaches the AmazonS3FullAccess policy, which means Ververica Platform will be able to access any S3 bucket in your AWS account. For better security, you can create your own policy that restricts permissions to a specific S3 bucket.

It is recommended to grant S3 access via a role policy instead of using accessKeyId and secretAccessKey, as it eliminates the need to distribute any access keys to Ververica Platform and Flink.
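As an illustration of the bucket-restricted policy mentioned above, the sketch below prints an inline policy document scoped to a single bucket, which can then be attached to the node group role with aws iam put-role-policy. The helper name bucket_policy_json, the policy name vvp-bucket-access, and in particular the list of S3 actions are assumptions; check which permissions your Ververica Platform and Flink versions actually require, and detach AmazonS3FullAccess if you use this instead.

```shell
#!/usr/bin/env bash
# Sketch: print an IAM policy document granting access to one bucket only.
# ASSUMPTION: the action list is illustrative, not an official minimum.
bucket_policy_json() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                 "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Usage (inline policy on the node group role extracted above):
#   aws iam put-role-policy --role-name "$nodeGroupRole" \
#     --policy-name vvp-bucket-access \
#     --policy-document "$(bucket_policy_json vvp-bucket)"
```

Bucket-level actions such as s3:ListBucket apply to the bucket ARN itself, while object-level actions apply to the objects under it, which is why the policy has two statements.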

Setup Ververica Platform

Ververica Platform is distributed as a Helm chart, available in our Helm Charts Repository. To add the repository to Helm, run:


$ helm repo add ververica https://charts.ververica.com 

Ververica Platform can now be installed with helm install. But first, let's prepare a few Helm Values files containing our custom configuration.

Platform Metadata

Check the AWS console to make sure the MySQL instance has been created and is active, then create a Values file values-db.yaml:


vvp:
 persistence:
  type: jdbc
  datasource:
   url: jdbc:mariadb://MYSQL-SERVER:3306/vvp
   username: USERNAME
   password: PASSWORD

where USERNAME and PASSWORD are the values you set when you created the MySQL instance, and MYSQL-SERVER is the database endpoint which you can get by clicking the MySQL instance name in the AWS console and checking the Connectivity & security tab or by running the following command:


$ aws rds describe-db-instances --db-instance-identifier vvpdb \
  | jq -r '.DBInstances[0].Endpoint.Address'

Make sure you use mariadb instead of mysql in the JDBC URL: Ververica Platform supports MySQL persistence through the MySQL-compatible MariaDB JDBC connector.

In a production environment, you should secure the database password.
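If you prefer not to edit values-db.yaml by hand, a small generator along these lines can write it from the endpoint. This is a sketch under assumptions: write_db_values is a hypothetical helper, values are written unquoted (quote the password if it contains characters that are special in YAML), and it uses two-space indentation, which is equivalent YAML.

```shell
#!/usr/bin/env bash
# Sketch: write values-db.yaml from the RDS endpoint and credentials.
# write_db_values is a hypothetical helper; values are written unquoted.
write_db_values() {
  local endpoint="$1" user="$2" password="$3" out="${4:-values-db.yaml}"
  # The URL scheme is "mariadb" even though the server runs MySQL
  cat > "$out" <<EOF
vvp:
  persistence:
    type: jdbc
    datasource:
      url: jdbc:mariadb://${endpoint}:3306/vvp
      username: ${user}
      password: ${password}
EOF
}

# Usage:
#   write_db_values "$(aws rds describe-db-instances \
#     --db-instance-identifier vvpdb \
#     --query 'DBInstances[0].Endpoint.Address' --output text)" USERNAME PASSWORD
```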

Universal Blob Storage

Create another Values file, values-blob.yaml, to back Universal Blob Storage with the S3 bucket you created:


vvp:
 blobStorage: 
  baseUri: s3://vvp-bucket

Kubernetes Namespace

Finally, create a Values file values-ns.yaml to allow Ververica Platform to create and manage Flink jobs in the flink Kubernetes namespace.


rbac:
 additionalNamespaces: 
  - flink

With this setting, a Kubernetes role with the required rules will be automatically created in the flink Kubernetes namespace and the Ververica Platform service account will be bound to this role.

Now, together with your commercial or trial license (provided here in the Values file values-license.yaml), you can install and start Ververica Platform in the EKS cluster by running:


$ helm install vvp ververica/ververica-platform --namespace vvp \
    --values values-db.yaml \
    --values values-blob.yaml \
    --values values-ns.yaml \
    --values values-license.yaml

Once all three containers in the Ververica Platform pod reach the Running status, Ververica Platform is ready.


$ kubectl get pod -n vvp
NAME                                      READY   STATUS    RESTARTS   AGE
vvp-ververica-platform-59c69c4c46-7jljt   3/3     Running   0          48s
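Instead of re-running kubectl get pod until the pod is ready, you can let kubectl block until the rollout has completed. A minimal sketch, assuming the Kubernetes Deployment is named vvp-ververica-platform (inferred from the pod name above for the Helm release vvp); wait_for_vvp and its default timeout are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: wait until the Ververica Platform Deployment is fully rolled out.
# ASSUMPTION: Deployment name inferred from the pod name for release "vvp".
wait_for_vvp() {
  local timeout="${1:-300s}"
  kubectl rollout status deployment/vvp-ververica-platform \
    --namespace vvp --timeout="$timeout"
}

# Usage:
#   wait_for_vvp 10m
```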

Deploy a Flink Job with Ververica Platform

Now that Ververica Platform is up and running, it is time to run your first Flink job. Begin by port-forwarding port 80 of the created Kubernetes service vvp-ververica-platform in the vvp Kubernetes namespace to local port 8080:


$ kubectl port-forward service/vvp-ververica-platform 8080:80 -n vvp

You can then access the Ververica Platform Web UI at http://localhost:8080.

[Screenshot: Ververica Platform Web UI, Deployments]

Please refer to the Ververica Platform documentation for other ways of accessing the Web UI.

On the left side of the Web UI is the menu bar (1), which gives you access to the different functionalities. Like Kubernetes, Ververica Platform uses namespaces to manage resources and support multi-tenancy. Although they serve a similar purpose, Ververica Platform namespaces are a concept independent of Kubernetes namespaces. The top of the menu bar shows that we are in the default namespace (2) of Ververica Platform.

Ververica Platform uses Deployments to manage the lifecycle of Flink jobs. Each Deployment is associated with a Deployment Target, which specifies the Kubernetes namespace in which the Flink job runs. In the menu bar, navigate to Administration | Deployment Targets (3), click + Add Deployment Target, and fill in:

  • Deployment Target Name: flink

  • Kubernetes Namespace: flink

With this deployment target, your deployment will be able to run in the flink Kubernetes namespace of the underlying EKS cluster.

To create a deployment, navigate to menu bar: Deployments | Deployments, click + Create Deployment (4), switch to the Standard editor (5) and fill in the following fields, then click Create Deployment (6):

  • Deployment Name: TopSpeedWindowing

  • Deployment Target: flink

  • JAR URI: drag & drop your job JAR file to this field, or use this example job that is distributed via Flink’s maven repository

You can also upload JAR files to S3 directly via the S3 console or AWS CLI. See the Ververica Platform documentation which specifies the exact path where you should place them.

[Screenshot: Ververica Platform, creating a Flink Deployment]

After the deployment is created, you can start it by clicking the Start button (7) and watch the Events tab (8) to see what is happening. Its status should eventually change to RUNNING (9). Now is a great time to explore the various operations and information available in Ververica Platform. For example, you can click the Flink UI button (10) to access the job's Flink UI. The Metrics and Logs buttons remain inactive because metrics and logging have not been configured. The Ververica Platform documentation and the Ververica Platform playground offer details and examples of such configurations.

[Screenshot: TopSpeedWindowing deployment in Ververica Platform]

You can reconfigure the deployment by clicking Configure (11). Look around the Standard/Advanced/YAML editors (5) to see what other options are available. If you change the configuration, Ververica Platform takes a savepoint of the Flink job, restarts the deployment with the new configuration, and resumes the job from that savepoint. To trigger a savepoint manually, click the Savepoint button (12).

To run Flink SQL jobs on Ververica Platform, please refer to our blog post Ververica Platform 2.3: Getting Started with Flink SQL on Ververica Platform.

Setup Authentication with Amazon Cognito

Ververica Platform does not manage user credentials itself; instead, it integrates with OpenID Connect (OIDC) identity providers (IdPs) for authentication and authorization. If you are not familiar with OIDC, this talk provides a good explanation. At a high level, this means Ververica Platform integrates with your existing identity management system. In the remainder of this section, we use Amazon Cognito as the IdP and show the steps to integrate it with Ververica Platform. If you use a different IdP, you will need to follow its own procedure.

First, create a user pool, i.e., the IdP:


$ USERPOOL_JSON=`aws cognito-idp create-user-pool --pool-name vvpusers` 
$ userPoolId=`echo $USERPOOL_JSON | jq -r '.UserPool.Id'`

The second command stores the user pool ID in the variable userPoolId, which is used in all of the following aws cognito-idp commands. Now create an app client in the user pool:


$ aws cognito-idp create-user-pool-client \
   --user-pool-id $userPoolId \
   --supported-identity-providers COGNITO \
   --client-name vvp --generate-secret \
   --callback-urls http://localhost:8080/login/oauth2/code/vvp \
   --logout-urls http://localhost:8080/logout \
   --allowed-o-auth-flows code \
   --allowed-o-auth-scopes profile openid \
   --allowed-o-auth-flows-user-pool-client

The callback and logout URLs point to Ververica Platform. Here it is assumed that the Web UI is accessible via port forwarding as described before; if you already have a dedicated DNS entry for Ververica Platform, use it in place of localhost. Note down the values of UserPoolClient.ClientId and UserPoolClient.ClientSecret in the output; you will need them in a Helm Values file later.
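Rather than copying the client ID and secret out of the JSON output, you can capture them directly with the AWS CLI's global --query option. The sketch below wraps the same create-user-pool-client call, so use it instead of the command above rather than in addition to it (each call creates a new app client). create_vvp_app_client is a hypothetical helper, and its optional second argument replaces http://localhost:8080 with your own base URL.

```shell
#!/usr/bin/env bash
# Sketch: create the app client and print "CLIENT_ID<TAB>CLIENT_SECRET".
# Same flags as the command above; the base URL is parameterized.
create_vvp_app_client() {
  local pool_id="$1" base_url="${2:-http://localhost:8080}"
  aws cognito-idp create-user-pool-client \
    --user-pool-id "$pool_id" \
    --supported-identity-providers COGNITO \
    --client-name vvp --generate-secret \
    --callback-urls "$base_url/login/oauth2/code/vvp" \
    --logout-urls "$base_url/logout" \
    --allowed-o-auth-flows code \
    --allowed-o-auth-scopes profile openid \
    --allowed-o-auth-flows-user-pool-client \
    --query 'UserPoolClient.[ClientId,ClientSecret]' --output text
}

# Usage:
#   read -r clientId clientSecret < <(create_vvp_app_client "$userPoolId")
```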

You also need to create an Amazon Cognito domain where Ververica Platform will redirect the login requests to:


$ aws cognito-idp create-user-pool-domain \
   --user-pool-id $userPoolId --domain vvp

This creates the domain https://vvp.auth.REGION.amazoncognito.com, where REGION is the AWS region you are in.

Next, create two users, vvpadmin and vvpuser, each with the temporary password My-Pass0, and one group, vvpviewer, in the user pool. Then add vvpuser to the group vvpviewer:


$ aws cognito-idp admin-create-user --user-pool-id $userPoolId \
   --username vvpadmin --temporary-password My-Pass0
$ aws cognito-idp admin-create-user --user-pool-id $userPoolId \
   --username vvpuser --temporary-password My-Pass0    
$ aws cognito-idp create-group --user-pool-id $userPoolId \
   --group-name vvpviewer 
$ aws cognito-idp admin-add-user-to-group --user-pool-id $userPoolId \
   --username vvpuser --group-name vvpviewer 

In addition to authenticating users created directly in the user pool, an Amazon Cognito user pool also supports federation through third-party identity providers. Check the Amazon Cognito documentation for more details.

Now that the IdP side is ready, you can configure Ververica Platform by creating the file values-auth.yaml:


vvp:
 auth:
  enabled: true
  admins:
  - user:vvpadmin
  oidc:
   groupsClaim: cognito:groups
   registrationId: vvp
   registration:
    clientId: CLIENT-ID
    clientSecret: CLIENT-SECRET
    redirectUri: "{baseUrl}/{action}/oauth2/code/{registrationId}"
    clientAuthenticationMethod: basic
    authorizationGrantType: authorization_code
    scope:
    - openid
    - profile
   provider:
    issuer-uri: https://cognito-idp.REGION.amazonaws.com/USER-POOL-ID
    userNameAttribute: username
   endSessionEndpoint: "https://vvp.auth.REGION.amazoncognito.com/logout?client_id=CLIENT-ID&logout_uri=http://localhost:8080/logout"

Do not forget to replace USER-POOL-ID, CLIENT-ID, CLIENT-SECRET, and REGION with the actual values you got when creating the user pool and the app client. In the YAML file, we configure user:vvpadmin as an administrator of Ververica Platform with full permissions. Group membership is passed from Cognito to Ververica Platform via the groups claim cognito:groups (groupsClaim). redirectUri contains placeholders that Ververica Platform replaces automatically. The logout_uri in endSessionEndpoint must be the same as the logout URL of the app client you created.

In a production environment, you should secure the client secret.
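To avoid replacing the placeholders by hand, the same Values file can be generated from shell variables. This is a minimal sketch: write_auth_values is a hypothetical helper, the region can be obtained with aws configure get region, and the domain prefix defaults to vvp as created above (pass a different prefix if vvp was already taken in your region). It produces equivalent YAML with two-space indentation and writes the client secret into a plain file, so the production caveat above still applies.

```shell
#!/usr/bin/env bash
# Sketch: render values-auth.yaml from the Cognito values collected above.
# write_auth_values is hypothetical; the secret ends up in a plain file.
write_auth_values() {
  local region="$1" pool_id="$2" client_id="$3" client_secret="$4"
  local base_url="${5:-http://localhost:8080}" domain="${6:-vvp}"
  local out="${7:-values-auth.yaml}"
  # Unquoted EOF: shell variables expand, {baseUrl}-style placeholders stay literal
  cat > "$out" <<EOF
vvp:
  auth:
    enabled: true
    admins:
      - user:vvpadmin
    oidc:
      groupsClaim: cognito:groups
      registrationId: vvp
      registration:
        clientId: ${client_id}
        clientSecret: ${client_secret}
        redirectUri: "{baseUrl}/{action}/oauth2/code/{registrationId}"
        clientAuthenticationMethod: basic
        authorizationGrantType: authorization_code
        scope:
          - openid
          - profile
      provider:
        issuer-uri: https://cognito-idp.${region}.amazonaws.com/${pool_id}
        userNameAttribute: username
      endSessionEndpoint: "https://${domain}.auth.${region}.amazoncognito.com/logout?client_id=${client_id}&logout_uri=${base_url}/logout"
EOF
}

# Usage:
#   write_auth_values "$(aws configure get region)" "$userPoolId" \
#     "$clientId" "$clientSecret"
```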

You can now upgrade Ververica Platform with the file values-auth.yaml:


$ helm upgrade vvp ververica/ververica-platform --namespace vvp \
    --values values-db.yaml \
    --values values-blob.yaml \
    --values values-ns.yaml \
    --values values-license.yaml \
    --values values-auth.yaml
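Since helm install and helm upgrade take the same list of Values files, you may find it convenient to keep that list in one place and use helm upgrade --install, which installs the release if it does not exist yet and upgrades it otherwise. A sketch, with VVP_VALUES and vvp_helm_apply as hypothetical names; leave values-auth.yaml out of the list until you have configured authentication.

```shell
#!/usr/bin/env bash
# Sketch: one command for both the initial install and later upgrades.
# VVP_VALUES and vvp_helm_apply are hypothetical names.
VVP_VALUES=(values-db.yaml values-blob.yaml values-ns.yaml
            values-license.yaml values-auth.yaml)

vvp_helm_apply() {
  local args=() f
  for f in "${VVP_VALUES[@]}"; do
    # Fail early with a clear message if a Values file is missing
    [ -f "$f" ] || { echo "missing values file: $f" >&2; return 1; }
    args+=(--values "$f")
  done
  # "upgrade --install" installs the release if it does not exist yet
  helm upgrade --install vvp ververica/ververica-platform \
    --namespace vvp "${args[@]}"
}
```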

After Ververica Platform restarts, open the Web UI and you will be asked to log in. Enter the username vvpadmin and the temporary password My-Pass0, then set a permanent password when prompted. After login, your username is shown at the bottom of the menu bar (13).

[Screenshot: Ververica Platform, adding a role binding]

In addition to authentication, you can configure Role-Based Access Control (RBAC) by binding different users to different roles. Apart from the Admin role, Ververica Platform supports three additional roles, Owner, Editor, and Viewer, each with different permissions. For example, you can bind the user vvpuser to the Viewer role, which is not allowed to create deployments or change anything in Ververica Platform.

To do so, log in to the Web UI as vvpadmin (the administrator) and go to Administration | Members in the menu bar. Delete the owner role binding for system:authenticated, then click + Add Member (14) to add two role bindings: bind user:vvpadmin to the owner role (15) and group:vvpviewer to the viewer role (16). Then open the Web UI in a private browser window and log in as vvpuser. You will see that some buttons and some sub-menu items in the menu bar are grayed out.

Summary

This blog post demonstrates how to install and configure the latest Ververica Platform on AWS. We covered how to create an EKS cluster, how to set up Ververica Platform with a MySQL database as the persistence layer and Universal Blob Storage backed by an S3 bucket. We also described the procedure to configure authentication & authorization with Amazon Cognito.

For more details on these topics or other features of Ververica Platform, please refer to the Ververica Platform documentation. Feel free to contact us or leave comments below for any other questions.

Don't have an existing Ververica Platform environment? You can still follow along by downloading the free Community Edition of Ververica Platform.

Article by: Jun Qin
