This workshop guides you through real-time data processing with the Apache Flink Table API using Kafka and Avro. You'll learn how to build streaming analytics applications that process flight data in real time. Whether you run locally for development or in Confluent Cloud for production, this workshop has you covered!
The workshop is built around a realistic flight data analytics scenario with these components:

- Flight Data Generator - Creates a continuous stream of flight events
- Reference Data Generator - Provides airlines and airports reference information
- Flink Analytics - Processes the data streams using the Flink Table API
- ☁️ Docker (preferably OrbStack on macOS) - For containerized services
- ☕ Java 21 (installed via SDKMAN) - For running Flink applications
- 🛠️ Gradle (via built-in wrapper) - For building the project
- 🧰 Make - For simplified command execution
1. Clone the repository:

   ```bash
   git clone https://github.com/gAmUssA/flink-for-java-workshop.git
   cd flink-for-java-workshop
   ```

2. Check prerequisites:

   ```bash
   make check-prereqs
   ```

3. Initialize the environment:

   ```bash
   # Create configuration directories
   make config-init
   # Generate local configuration files
   make config-local
   # Start local Kafka and Schema Registry
   make docker-up
   ```

4. Start data generators:

   ```bash
   # Generate flight data (Avro format)
   make run-data-generator-local
   # In a new terminal, generate reference data (JSON format)
   make run-ref-generator-local
   ```
1. Configure Confluent Cloud credentials:

   ```bash
   # Set up Terraform for Confluent Cloud
   make setup-terraform
   ```

2. Initialize cloud infrastructure:

   ```bash
   # Complete Confluent Cloud setup (init, plan, apply, output)
   make cc-setup
   ```

3. Generate cloud configuration:

   ```bash
   # Create cloud configuration files
   make config-cloud
   ```

4. Start data generators in cloud mode:

   ```bash
   # Generate flight data (Avro format)
   make run-data-generator-cloud
   # In a new terminal, generate reference data (JSON format)
   make run-ref-generator-cloud
   ```
This workshop includes three main analytics use cases that can be run either locally or in the cloud.
This module monitors real-time flight status updates, allowing you to track flights by status (Scheduled, Departed, Arrived, Delayed, etc.).

Key features:

- Real-time flight status monitoring
- Status distribution analytics
- Delay notifications
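At its core, the status-distribution analytic is a count of events per status value. A minimal plain-Java sketch of that aggregation (outside Flink, with a hypothetical, simplified event shape; the real job computes this continuously over the Kafka stream):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StatusDistribution {
    // Hypothetical, simplified flight event: only the fields this sketch needs.
    record FlightEvent(String flightNumber, String status) {}

    // Count events per status value.
    static Map<String, Long> distribution(List<FlightEvent> events) {
        return events.stream()
                .collect(Collectors.groupingBy(FlightEvent::status, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<FlightEvent> events = List.of(
                new FlightEvent("UA100", "Departed"),
                new FlightEvent("DL200", "Delayed"),
                new FlightEvent("UA101", "Departed"));
        System.out.println(distribution(events)); // e.g. {Delayed=1, Departed=2}
    }
}
```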
Run locally:

```bash
make run-sql-status-local
```

Run in the cloud:
```bash
make run-sql-status-cloud
```

This module analyzes flight routes and patterns, helping identify popular destinations and optimal routing.
Key features:

- Route popularity ranking
- Origin-destination pairs analysis
- Geographic distribution of flights
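Route popularity is essentially a count per origin-destination pair, ranked descending. A rough plain-Java illustration (not the Flink Table API job itself; the event shape is hypothetical):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RoutePopularity {
    // Hypothetical, minimal flight event: just the origin/destination pair.
    record Flight(String origin, String destination) {}

    // Rank origin-destination pairs by flight count, most popular first.
    static List<Map.Entry<String, Long>> topRoutes(List<Flight> flights, int limit) {
        return flights.stream()
                .collect(Collectors.groupingBy(
                        f -> f.origin() + "->" + f.destination(), Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .toList();
    }

    public static void main(String[] args) {
        List<Flight> flights = List.of(
                new Flight("SFO", "JFK"), new Flight("SFO", "JFK"), new Flight("LAX", "ORD"));
        System.out.println(topRoutes(flights, 2)); // [SFO->JFK=2, LAX->ORD=1]
    }
}
```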
Run locally:

```bash
make run-sql-routes-local
```

Run in the cloud:
```bash
make run-sql-routes-cloud
```

This module tracks and analyzes airline delays, helping identify patterns and potential issues.
Key features:

- Airline performance tracking
- Delay cause analysis
- Trend identification
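The core of the delay analytics is an average delay per airline. A plain-Java sketch of that aggregation (again a simplification of the Flink job, with hypothetical field names):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AirlineDelays {
    // Hypothetical, minimal event: airline code plus delay in minutes (0 if on time).
    record Flight(String airline, long delayMinutes) {}

    // Average delay per airline.
    static Map<String, Double> avgDelayByAirline(List<Flight> flights) {
        return flights.stream()
                .collect(Collectors.groupingBy(
                        Flight::airline,
                        Collectors.averagingLong(Flight::delayMinutes)));
    }

    public static void main(String[] args) {
        List<Flight> flights = List.of(
                new Flight("UA", 10), new Flight("UA", 30), new Flight("DL", 0));
        System.out.println(avgDelayByAirline(flights)); // e.g. {DL=0.0, UA=20.0}
    }
}
```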
Run locally:

```bash
make run-sql-delays-local
```

Run in the cloud:
```bash
make run-sql-delays-cloud
```

This workshop uses a multi-module Gradle project structure:
- Common modules:
  - `common:models` - Data models and schemas (Avro)
  - `common:utils` - Utility classes and configuration helpers
- Application modules:
  - `flink-data-generator` - Generates sample flight data
  - `data-generator` - Generates reference data (airlines/airports)
  - `flink-table-api` - Flink Table API implementation for all use cases
The project processes two main types of data:
- Flight events (Avro format):
  - Flight number, route information, timestamps
  - Status updates (scheduled, departed, arrived, etc.)
  - Delay information when applicable
- Reference data (JSON format):
  - Airlines: airline code, name, country, etc.
  - Airports: airport code, name, city, country, coordinates
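For orientation, an Avro schema for the flight events might look like the following. This is illustrative only; the field names and namespace are assumptions, and the real schema lives in the `common:models` module:

```json
{
  "type": "record",
  "name": "FlightEvent",
  "namespace": "io.workshop.flights",
  "fields": [
    {"name": "flightNumber", "type": "string"},
    {"name": "origin", "type": "string"},
    {"name": "destination", "type": "string"},
    {"name": "status", "type": "string"},
    {"name": "eventTime", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "delayMinutes", "type": ["null", "int"], "default": null}
  ]
}
```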
The workshop demonstrates these Flink Table API capabilities:
- Table creation from Kafka topics
- Schema management with Confluent Schema Registry
- Window aggregations (tumbling, sliding, session windows)
- Joins between streaming and static data
- UDFs (User-Defined Functions) for custom processing
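For intuition on window aggregations: a tumbling window assigns each event to exactly one fixed-size, non-overlapping bucket identified by its window start. A simplified sketch of that bucketing arithmetic in plain Java (epoch-aligned, as Flink's `TUMBLE` windows are by default):

```java
import java.time.Duration;
import java.time.Instant;

public class TumblingWindows {
    // Start of the tumbling window that contains the given event time.
    // Windows are aligned to the epoch and do not overlap.
    static Instant windowStart(Instant eventTime, Duration windowSize) {
        long sizeMs = windowSize.toMillis();
        long ts = eventTime.toEpochMilli();
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMs));
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2024-01-01T10:07:30Z");
        // 10:07:30 falls into the 5-minute window starting at 10:05:00.
        System.out.println(windowStart(t, Duration.ofMinutes(5))); // 2024-01-01T10:05:00Z
    }
}
```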
The application uses configuration files in two environments:
Local Environment (`config/local/`):

- `kafka.properties` - Local Kafka connection settings
- `tables.properties` - Flink table configurations
- `topics.properties` - Kafka topic mappings

Cloud Environment (`config/cloud/`):

- `kafka.properties` - Confluent Cloud connection settings (including authentication)
- `tables.properties` - Flink table configurations
- `topics.properties` - Kafka topic mappings
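As an example, the local `kafka.properties` would contain connection settings along these lines (illustrative fragment; regenerate the real file with `make config-local` rather than editing by hand):

```properties
# Local Kafka broker
bootstrap.servers=localhost:29092
# Local Confluent Schema Registry
schema.registry.url=http://localhost:8081
```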
- Flink Dashboard: http://localhost:8082
- Schema Registry: http://localhost:8081
- Kafka: localhost:29092
- Confluent Cloud Console: https://confluent.cloud
- Flink Operations: Available through the Confluent Cloud Console
- Docker issues:
  - Check container status with `make docker-ps`
  - View logs with `make docker-logs` or `make docker-logs SERVICE=kafka`
  - Restart services with `make docker-restart`
- Configuration issues:
  - Verify configuration files exist with `make config-list`
  - Regenerate configuration with `make config-local`
- Data generation issues:
  - If no data is flowing, restart the generators
  - Check Schema Registry for registered schemas
- Authentication issues:
  - Verify API keys in `cloud.properties`
  - Run `make terraform-output` to regenerate credentials
- Topic access issues:
  - Check ACLs and permissions in Confluent Cloud
  - Verify the service account has appropriate roles
- Schema Registry issues:
  - Check compatibility settings
  - Verify authentication for Schema Registry
- Project Repository: https://github.com/gAmUssA/flink-for-java-workshop
- Apache Flink Documentation: https://flink.apache.org/docs/stable/
- Confluent Cloud Documentation: https://docs.confluent.io/cloud/current/
- Flink Table API & SQL: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/overview/