Unified Object Detection and Gaze Intersection Tracker for Cognitive Sciences Research
MindSight is an open-source toolkit that detects where people look in video, images, and live camera feeds, then maps those gaze vectors onto detected objects to identify social-cognitive phenomena such as joint attention, mutual gaze, and gaze following -- all from a single configurable pipeline.
!!! warning "Beta Notice"

    MindSight v0.4.0-beta is a **beta** release. APIs, configuration formats, and output schemas may change between releases. Pin your version and check the [Changelog](changelog.md) before upgrading.
MindSight processes each frame through a four-stage pipeline:
```mermaid
flowchart LR
    A["Input\nCamera / Video / Image"] --> B["Object Detection\n(YOLO / YOLOE)"]
    A --> C["Face Detection\n(RetinaFace)"]
    B --> D["Object\nBounding Boxes"]
    C --> E["Face\nBounding Boxes"]
    E --> F["Gaze Estimation\n(MGaze / L2CS / UniGaze / Gazelle)"]
    F --> G["Pitch + Yaw\nper Face"]
    D --> H["Ray-BBox\nIntersection"]
    G --> H
    H --> I["Hit List"]
    I --> J["Phenomena Detection\n(JA, Mutual Gaze, etc.)"]
    J --> K["Data Collection\n(CSV, Heatmaps, Dashboard)"]
```
- **Object & Face Detection** -- locates people, faces, and objects of interest in every frame.
- **Gaze Estimation** -- predicts a 3-D gaze direction (pitch and yaw) for each detected face.
- **Ray-BBox Intersection** -- casts each gaze ray and determines which object bounding boxes it hits.
- **Phenomena & Data Collection** -- classifies social-gaze events and writes structured output.
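The ray-bbox intersection stage can be illustrated with a minimal 2-D sketch. This is not MindSight's actual implementation (which works from its gaze-estimation backends' full 3-D output); it only shows the idea of projecting pitch and yaw onto the image plane and testing the resulting ray against an axis-aligned bounding box with the classic slab method:

```python
import math

def gaze_ray_hits_bbox(origin, yaw, pitch, bbox):
    """Check whether a simplified 2-D gaze ray intersects an axis-aligned bbox.

    origin: (x, y) face centre in pixels; yaw/pitch in radians;
    bbox: (x_min, y_min, x_max, y_max). Image y grows downward.
    """
    # Project the gaze angles onto the image plane (illustrative simplification).
    dx = math.sin(yaw)
    dy = -math.sin(pitch)  # negative pitch = looking down = +y in image coords
    ox, oy = origin
    x_min, y_min, x_max, y_max = bbox

    # Slab test: intersect the ray's parameter interval with each axis slab.
    t_near, t_far = 0.0, float("inf")  # t_near = 0 keeps the ray forward-only
    for o, d, lo, hi in ((ox, dx, x_min, x_max), (oy, dy, y_min, y_max)):
        if abs(d) < 1e-9:  # ray parallel to this axis: origin must lie in slab
            if not (lo <= o <= hi):
                return False
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            t_near = max(t_near, min(t1, t2))
            t_far = min(t_far, max(t1, t2))
    return t_near <= t_far
```

A face at (100, 100) looking right and slightly down (`yaw=math.pi/2`, `pitch=-0.3`) hits a box at (200, 120, 250, 170); the mirrored leftward gaze misses it.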
- Frame-by-frame gaze-to-object intersection via ray casting
- Swappable object-detection backends (YOLO, YOLOE with visual prompts)
- Four swappable gaze-estimation backends (MGaze, L2CS-Net, UniGaze, Gazelle)
- Face anonymization for privacy-sensitive recordings
- Auxiliary video stream support for multi-camera setups
- CLI and GUI interfaces for flexible workflows
- YAML-driven pipeline configuration
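A YAML-driven pipeline might look roughly like the sketch below. The key names here are hypothetical and only convey the shape of such a config; the actual schema is documented in the User Guide and, per the beta notice, may change between releases:

```yaml
# Illustrative config sketch -- key names are hypothetical, not MindSight's schema.
pipeline:
  object_detector:
    backend: yolo          # or: yoloe (with visual prompts)
  face_detector:
    backend: retinaface
  gaze_estimator:
    backend: l2cs          # or: mgaze, unigaze, gazelle
  phenomena:
    - joint_attention
    - mutual_gaze
  output:
    csv: results/gaze.csv
    heatmap: true
```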
- **Joint Attention** -- two or more people attending to the same object
- **Mutual Gaze** -- two people looking at each other
- **Social Referencing** -- gaze shifts toward a reference person after an event
- **Gaze Following** -- one person's gaze directing another's
- **Gaze Aversion** -- active avoidance of eye contact
- **Scanpath Analysis** -- sequential fixation patterns over time
- **Gaze Leadership** -- identifying who initiates gaze shifts in a group
- **Attention Span** -- sustained fixation duration on targets
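To make the joint-attention definition concrete, here is a minimal sketch of the per-frame logic: given the hit list of (person, object) gaze intersections, an object with two or more lookers is a joint-attention candidate. MindSight's actual detector is more involved (real detection typically also requires the overlap to persist for a minimum duration), so treat this as an illustration of the concept, not the toolkit's algorithm:

```python
from collections import defaultdict

def find_joint_attention(hits):
    """Group one frame's gaze hits by object and keep objects with >= 2 lookers.

    hits: iterable of (person_id, object_id) gaze-ray intersections.
    Returns {object_id: sorted list of person_ids} for shared targets.
    """
    lookers = defaultdict(set)
    for person, obj in hits:
        lookers[obj].add(person)
    return {obj: sorted(people) for obj, people in lookers.items() if len(people) >= 2}
```

For example, `find_joint_attention([("p1", "cup"), ("p2", "cup"), ("p1", "book")])` flags only the cup.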
- Plugin architecture for custom gaze backends, detectors, and phenomena
- Drop-in plugin discovery -- add a folder, register in YAML, run
- Base classes and hooks for every pipeline stage
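The plugin pattern can be sketched as an abstract base class plus a registry that YAML config can reference by name. The class and function names below are hypothetical, chosen only to illustrate the "register, then reference in YAML" flow; MindSight's real base classes and discovery mechanism are described in the Plugin System docs:

```python
from abc import ABC, abstractmethod

class GazeBackend(ABC):
    """Hypothetical base class for gaze-estimation plugins (names are illustrative)."""
    name: str = "base"

    @abstractmethod
    def estimate(self, face_crop):
        """Return (pitch, yaw) in radians for one face crop."""

REGISTRY: dict[str, type] = {}

def register(cls):
    """Decorator mimicking drop-in discovery: registered names become YAML-selectable."""
    REGISTRY[cls.name] = cls
    return cls

@register
class CenterBiasGaze(GazeBackend):
    """Toy backend that always predicts a straight-ahead gaze."""
    name = "center_bias"

    def estimate(self, face_crop):
        return (0.0, 0.0)
```

Once registered, a config could select `center_bias` as its gaze backend without touching pipeline code.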
- Per-frame CSV export with full gaze and detection metadata
- Aggregated heatmap generation over configurable time windows
- Live dashboard with real-time gaze overlay
- Project mode for batch processing of multiple videos
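Since the per-frame CSV carries one row per detected face per frame, downstream analysis is straightforward with any CSV reader. The column names below are hypothetical (the real schema is in the output documentation and, per the beta notice, may change); the sketch just shows the kind of query the export enables:

```python
import csv
import io

# Hypothetical column layout -- MindSight's real CSV schema may differ.
SAMPLE = """frame,face_id,pitch,yaw,hit_object,phenomenon
0,0,-0.12,0.45,cup,joint_attention
0,1,-0.08,-0.51,cup,joint_attention
1,0,0.02,0.10,,
"""

def frames_with_phenomenon(csv_text, phenomenon):
    """Collect the sorted frame indices where a given phenomenon was logged."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({int(row["frame"]) for row in reader if row["phenomenon"] == phenomenon})
```

Here `frames_with_phenomenon(SAMPLE, "joint_attention")` returns `[0]`: only frame 0 has a joint-attention event.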
- **I'm a researcher**

    Get MindSight running, process your first video, and explore the phenomena it can detect.

    - Getting Started -- installation and first run
    - User Guide -- pipeline configuration and workflows
    - Phenomena -- detailed descriptions of each tracked phenomenon
- **I'm a developer**

    Understand the internals, write plugins, and extend the pipeline.

    - Architecture Deep Dive -- how the pipeline fits together
    - Plugin System -- extension points and base classes
    - Developer Guide -- module references and contribution guidelines