Flash-Findr is a high-performance, real-time object detection microservice that leverages YOLO-Everything for zero-shot detection with custom vocabularies. It features a modern web interface with smooth video streaming and image captioning.
- Open-Vocabulary Detection: Detect any object by simply providing a list of class names
- Real-Time Streaming: Designed to run in real time on CPU
- Image Captioning: Optional scene captioning using vision-language models
- GPU Acceleration: Runs on CPU or CUDA, selected automatically at runtime
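How the CPU/CUDA choice is made is not spelled out here. The following is a minimal sketch of one common approach, assuming the models run on PyTorch (an assumption). `pick_device` is a hypothetical helper, not part of Flash-Findr.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: the project's models run on PyTorch
    except ImportError:
        # Without PyTorch there is nothing to accelerate, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can be passed wherever a model or tensor is moved to a device, so the same code path works on machines with and without a GPU.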
Flash-Findr/
├── app/
│   ├── api/        # FastAPI endpoints and streaming engine
│   ├── ml_core/    # Computer vision tools
│   ├── utils/      # Helper utilities
│   └── main.py     # Application entry point
├── frontend/       # Web UI (HTML, CSS, JavaScript)
- Python 3.10+
- (Optional) NVIDIA GPU with CUDA support for acceleration
- Clone the repository
  git clone <repository-url>
  cd Flash-Findr
- Create a virtual environment
  python3 -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies
  pip install -r requirements.txt
- Run the application
  python -m app.main
- Access the web interface
  Open your browser and navigate to:
  http://localhost:8008
- Build and run with Docker Compose
  docker-compose --profile cpu up --build
- Access the application
  Navigate to:
  http://localhost:8008
- Ensure NVIDIA Docker runtime is installed
  # Install nvidia-docker2
  distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  sudo apt-get update && sudo apt-get install -y nvidia-docker2
  sudo systemctl restart docker
- Build and run with GPU support
  docker-compose --profile gpu up --build
- Access the application
  Navigate to:
  http://localhost:8008
- app/api/engine.py: Video processing and streaming engine
- app/ml_core/tools/: Modular vision tools (detection, captioning)
- app/utils/tracking.py: Kalman Filter implementation
- frontend/static/main.js: Client-side UI logic with Konva.js
- Create a new tool class inheriting from BaseVisionTool
- Implement the required methods: _load_model, inference, postprocess
- Register it in AVAILABLE_TOOL_TYPES in pipeline.py
- Add a configuration YAML in app/ml_core/configs/
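The steps above can be sketched as follows. The real `BaseVisionTool` signature is not shown in this README, so the base class below is a hypothetical stand-in, and representing `AVAILABLE_TOOL_TYPES` as a dict is also an assumption. `BrightnessTool` is a toy example that needs no model weights.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseVisionTool(ABC):
    """Hypothetical stand-in for the project's base class; the real one may differ."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.config = config
        self.model = self._load_model()

    @abstractmethod
    def _load_model(self) -> Any: ...

    @abstractmethod
    def inference(self, frame: Any) -> Any: ...

    @abstractmethod
    def postprocess(self, raw: Any) -> dict[str, Any]: ...


class BrightnessTool(BaseVisionTool):
    """Toy tool: reports the mean brightness of a frame given as nested lists."""

    def _load_model(self) -> Any:
        # Nothing to load for this toy example.
        return None

    def inference(self, frame: list[list[float]]) -> float:
        pixels = [p for row in frame for p in row]
        return sum(pixels) / len(pixels) if pixels else 0.0

    def postprocess(self, raw: float) -> dict[str, Any]:
        threshold = self.config.get("threshold", 128)
        return {"brightness": raw, "is_bright": raw >= threshold}


# Assumed registry shape; in the project this lives in pipeline.py.
AVAILABLE_TOOL_TYPES = {"brightness": BrightnessTool}

tool = AVAILABLE_TOOL_TYPES["brightness"]({"threshold": 100})
result = tool.postprocess(tool.inference([[0, 200], [100, 100]]))
print(result)
```

In the actual project the `threshold` value would come from the tool's YAML file in app/ml_core/configs/ rather than an inline dict.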
Models not loading: Ensure the models/ directory exists and has write permissions
GPU not detected: Verify CUDA installation with nvidia-smi and check Docker GPU runtime
Slow streaming: Reduce detection stride or lower image resolution in tool settings
WebSocket disconnects: Check firewall settings and ensure port 8008 is accessible
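Several of the checks above can be scripted. The following is a rough diagnostic sketch using only the Python standard library. The function names are hypothetical, and finding `nvidia-smi` on the PATH only shows the driver tools are installed, not that CUDA works inside Docker.

```python
import os
import shutil
import socket


def check_models_dir(path: str = "models") -> bool:
    """True if the directory exists and is writable by the current user."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


def check_nvidia_smi() -> bool:
    """True if the nvidia-smi executable is on PATH."""
    return shutil.which("nvidia-smi") is not None


def check_port(host: str = "localhost", port: int = 8008, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("models/ writable:", check_models_dir())
print("nvidia-smi found:", check_nvidia_smi())
print("port 8008 reachable:", check_port())
```

Run it from the repository root while the application is running, so that the relative `models` path and the port check are meaningful.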
See LICENSE file for details.
- Ultralytics YOLO for object detection
- SORT for object tracking
- Konva.js for canvas rendering