A scalable distributed image processing system that uses Apache Kafka for message queuing, Redis for state management, and Flask for the web interface. The system splits images into tiles, processes them in parallel using distributed workers, and stitches them back together.
The system consists of four main components:
- Flask Web Application (
app.py) - Handles image uploads and job management - Worker Nodes (
worker.py) - Process image tiles and apply grayscale filters - Results Service (
results_service.py) - Collects processed tiles and stitches final images - Monitoring Service (
monitoring_service.py) - Tracks worker health via heartbeats
- Apache Kafka (installed at
/opt/kafka) - Apache ZooKeeper (comes with Kafka)
- Redis Server
- Python 3.8+
- pip (Python package manager)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtcd /opt/kafka
bin/zookeeper-server-start.sh ./config/zookeeper.propertiescd /opt/kafka
sudo bin/kafka-server-start.sh config/server.propertiescd /opt/kafka
# Create tasks topic (for distributing image tiles to workers)
bin/kafka-topics.sh --create --topic tasks --bootstrap-server localhost:9092 --partitions 2 --replication-factor 1
# Create results topic (for collecting processed tiles)
bin/kafka-topics.sh --create --topic results --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
# Create heartbeats topic (for worker health monitoring)
bin/kafka-topics.sh --create --topic heartbeats --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1Topic Explanation:
tasks- 2 partitions for parallel distribution of image processing tasksresults- Single partition to maintain tile order during collectionheartbeats- Worker status updates every 5 seconds
redis-serverOr if Redis is already running as a service:
brew services start redis # macOS with Homebrew
# or
sudo systemctl start redis # Linux with systemdUpdate the following IP addresses in the Python files to match your setup:
In app.py:
BOOTSTRAP_SERVERS = '172.27.247.209:9092' # Update to your Kafka broker IPIn worker.py:
BOOTSTRAP_SERVERS = '172.27.247.209:9092' # Update to your Kafka broker IPIn results_service.py:
BOOTSTRAP_SERVERS = '172.27.247.209:9092' # Update to your Kafka broker IP
REDIS_HOST = 'localhost' # Update if Redis is on a different hostIn monitoring_service.py:
BOOTSTRAP_SERVERS = '172.27.247.209:9092' # Update to your Kafka broker IP
REDIS_HOST = '172.27.111.128' # Update to your Redis server IPpython results_service.pyThis service listens for processed tiles and stitches them into final images.
python monitoring_service.pyThis service tracks worker health and updates Redis with worker status.
python worker.pyStart multiple workers for parallel processing. Each worker will automatically generate a unique ID.
To manually set a worker ID:
WORKER_ID=worker-1 python worker.pypython app.pyThe web interface will be available at http://localhost:5001
- Open your browser and navigate to
http://localhost:5001 - Click "Select Image File" and choose a PNG or JPG image
- Click "Process Image" to start the distributed processing
- Monitor the progress in real-time:
- Worker Status - Shows all active workers and their health
- Job Progress - Displays tile processing progress
- Final Result - Shows the processed grayscale image when complete
Tile Size (in app.py):
tile_size = 512 # Adjust tile size for different performance characteristics- Larger tiles = fewer messages, less overhead
- Smaller tiles = more parallelism, better load distribution
In worker.py:
time.sleep(5) # Worker sends heartbeat every 5 secondsIn monitoring_service.py:
WORKER_TTL_SECONDS = 15 # Worker considered dead after 15 seconds without heartbeat.
โโโ app.py # Flask web application
โโโ worker.py # Distributed worker nodes
โโโ results_service.py # Results collection and image stitching
โโโ monitoring_service.py # Worker health monitoring
โโโ requirements.txt # Python dependencies
โโโ LICENSE # Project license
โโโ README.md # This file (which links to assets/architecture.png)
โโโ assets/
โ โโโ architecture.png
โโโ templates/
โ โโโ index.html # Web interface
โโโ processed/ # Temporary processed tiles (auto-created)
โ โโโ <job-id>/
โ โโโ tile_*.jpg
โโโ final/ # Final stitched images (auto-created)
โโโ <job-id>_complete.jpg
This project is designed to run on a distributed, 4-node (PC) cluster. Here is the recommended mapping of services to systems:
redis-server (The central Redis database)
python app.py (The Flask web application)
zookeeper-server-start.sh (Kafka's ZooKeeper)
kafka-server-start.sh (The Kafka Broker)
python results_service.py (The results collector & image stitcher)
python worker.py (An instance of the processing worker)
python worker.py (A second instance of the processing worker)
python monitoring_service.py (The worker heartbeat monitor)
Note: All systems must be on the same network (e.g., connected via ZeroTier or on the same LAN) and all IP addresses in the scripts must be updated to point to the correct system's IP.
This 4-node setup is just an example. The architecture is horizontally scalable.
You can add more worker.py instances on new machines at any time. The Kafka consumer group (image-processor-group) will automatically discover and load-balance tasks to them.
Important: To scale beyond 2 workers, you must increase the partition count on the tasks topic. The number of partitions is the maximum number of parallel consumers you can have. If you want 10 workers, you must re-create the tasks topic with at least 10 partitions.
