A distributed analytics platform simulating real-world product analytics pipelines. Ingests user events via Kafka, processes with Spark, and enables real-time analytics. Features Java 21 compatibility with Apache Spark 3.5.0 and comprehensive monitoring tools.
- ✅ Java 21 Compatible - Upgraded to Spark 3.5.0 with full compatibility
- ✅ Real-time Streaming - Kafka → Spark Structured Streaming → JSON/Delta Lake
- ✅ Comprehensive Monitoring - Web UIs, shell scripts, and monitoring guides
- ✅ Docker Orchestration - All services containerized for easy deployment
- ✅ Production Ready - Proper error handling, checkpointing, and resilience
+-----------------+
| User Simulated  |
|  Event Stream   |
+--------+--------+
         |
         v
+---------------------+
|    Apache Kafka     |
+----------+----------+
           |
           v
+-------------------------------------+
| Spark Structured Streaming Job      |
| - Java 21 + Spark 3.5.0             |
| - Reads from Kafka                  |
| - Parses JSON events                |
| - Real-time processing              |
| - Writes to JSON/Delta Lake         |
+------------------+------------------+
                   |
         +----------+-----------+
         |                      |
         v                      v
+--------------------+ +--------------------+
|  Delta Lake (S3)   | | Apache Druid OLAP  |
|  (MinIO Storage)   | | (Real-time Queries)|
+--------------------+ +--------------------+
- Apache Kafka - Event streaming platform
- Apache Spark 3.5.0 - Stream processing (Java 21 compatible)
- Scala 2.12.18 - Programming language
- Delta Lake - Data lake storage format
- MinIO - S3-compatible object storage
- Apache Druid - Real-time OLAP database
- Docker Compose - Container orchestration
DAP/
├── README.md # This file
├── docker-compose.yml # All services configuration
├── kafka-producer/ # Scala app to generate events
├── spark-job/ # Spark streaming applications
│ ├── src/main/scala/Main.scala # Delta Lake streaming job
│ └── src/main/scala/SimpleMain.scala # JSON streaming job (Java 21 ready)
├── python-consumer/ # Python analysis tools
├── druid/ # Druid configuration
├── monitoring/ # Monitoring guides and scripts
├── quick-monitor.sh # Comprehensive monitoring script
├── restart-and-monitor.sh # Full platform restart script
├── spark-java21-compatibility-guide.md # Java 21 setup guide
└── spark-web-ui-guide.md # Spark monitoring guide
- Docker and Docker Compose
- Java 21 (OpenJDK or Oracle)
- SBT (Scala Build Tool)
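Before starting the platform, it can help to confirm that the prerequisites above are on the `PATH`. The following is a minimal sketch; `check_prereqs` is a hypothetical helper and not one of the repository's scripts.

```shell
# check_prereqs: report which of the given commands are missing from PATH.
# Hypothetical helper for illustration; not part of the repository.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  # Exit status is 0 only when every command was found.
  return "$missing"
}

# Usage: check_prereqs docker java sbt
```

The function only checks that each command exists; it does not verify versions.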
# Start the entire platform
docker-compose up -d
# Monitor startup
./quick-monitor.sh
cd kafka-producer
sbt run # Generates 100 sample events
cd spark-job
# Option 1: JSON output (Java 21 compatible)
sbt 'runMain SimpleMain'
# Option 2: Delta Lake output (when Delta 3.3.0+ becomes available)
sbt 'runMain Main'
- Spark Web UI: http://localhost:4040
- Druid Console: http://localhost:8888
- MinIO Console: http://localhost:9001
- Monitoring Script: ./quick-monitor.sh
- quick-monitor.sh - Comprehensive health checks
- restart-and-monitor.sh - Full platform restart
- monitoring/ - Detailed monitoring guides
- Web UIs for all components
- Kafka message throughput
- Spark streaming batch processing times
- Memory and CPU usage
- Data output locations: /tmp/kafka-events-json/
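To check that events are actually reaching the JSON output location, one option is to count the records written so far. This is a rough sketch: `count_events` is a hypothetical helper, and it assumes the job writes Spark's usual `part-*.json` files with one JSON object per line.

```shell
# count_events [DIR]: count lines across part-*.json files under DIR.
# Hypothetical helper; assumes one JSON event per line in each part file.
count_events() {
  dir=${1:-/tmp/kafka-events-json}
  # Concatenate every part file and count lines; tr strips wc's padding.
  find "$dir" -type f -name 'part-*.json' -exec cat {} + 2>/dev/null \
    | wc -l | tr -d ' '
}

# Usage: count_events /tmp/kafka-events-json
```

Running it twice a few seconds apart gives a crude view of throughput. The count can include files from a batch that is still in progress, so treat it as approximate.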
This project is fully compatible with Java 21 thanks to:
- Spark 3.5.0 upgrade (from 3.4.1)
- Scala 2.12.18 for better compatibility
- JVM compatibility flags in build.sbt
- Comprehensive testing on macOS with Java 21.0.5
See spark-java21-compatibility-guide.md for detailed setup instructions.
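Since the setup depends on Java 21, a quick check of the active JVM can rule out a common source of errors. The sketch below parses the banner printed by `java -version`; `java_major` is a hypothetical helper, and it assumes the `version "21.0.5"` banner format used by current OpenJDK builds.

```shell
# java_major: read a `java -version` banner on stdin, print the major version.
# Hypothetical helper. For legacy "1.8.0_x" style banners it prints 1.
java_major() {
  sed -n 's/.*version "\([0-9][0-9]*\)[."].*/\1/p' | head -n 1
}

# Usage (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | java_major
```

If the printed value is not 21, adjust `JAVA_HOME` or your `PATH` before running the sbt commands.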
- DirectByteBuffer errors → Use Spark 3.5.0+ with Java 21
- Delta Lake compatibility → Use SimpleMain with JSON output temporarily
- Port conflicts → Check docker-compose ps and restart services
- Memory issues → Adjust Docker resource limits
- Check monitoring/ directory for detailed guides
- Use monitoring scripts for health checks
- Review Spark Web UI for streaming job details
- ✅ Kafka event streaming
- ✅ Spark 3.5.0 + Java 21 compatibility
- ✅ JSON/Delta Lake storage
- ✅ Docker orchestration
- ✅ Comprehensive monitoring
- Druid real-time ingestion
- Advanced aggregations
- Dashboard integration
- Kubernetes deployment
- Auto-scaling
- Advanced monitoring & alerting
- Ensure Java 21 compatibility in all changes
- Update monitoring guides for new features
- Test with provided monitoring scripts
- Follow established project structure
This project is for educational and demonstration purposes.