feat(cv): gaze detection (MediaPipe head pose) #14
Merged
Add the gaze detector for the CV worker: given a webcam frame, decide whether
the user is looking at the screen.
- gaze_detector.py: detect_gaze(frame) -> protocol event (focused/distracted);
analyze_gaze() for richer facts; calibrate()/reset_reference()
- Head pose (MediaPipe FaceLandmarker, Apache-2.0, local) — robust vs lighting,
glasses, distance; no torch/AGPL
- Auto-calibration: first face seen becomes the reference pose, later frames
judged as +/- deviation — works with any camera angle
- Multi-person: detects up to NUM_FACES, locks onto the intended user and
tracks them by position so others entering frame don't steal the signal
- gaze_detect_test.py: visual webcam test (FOCUSED/DISTRACTED + angles)
- setup.sh fetches face_landmarker.task; python/README documents it
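To illustrate the multi-person item above, here is a minimal sketch of lock-then-track selection. It assumes each detected face has already been reduced to a hypothetical (cx, cy, area) tuple in normalized image coordinates; FaceTracker, select, and max_jump are illustrative names and values, not taken from gaze_detector.py.

```python
import math


class FaceTracker:
    """Lock onto the largest face, then follow it by position (sketch)."""

    def __init__(self, max_jump=0.15):
        # max_jump is an assumed limit on how far the tracked face may move
        # between frames, in normalized image units.
        self.max_jump = max_jump
        self.last_center = None

    def select(self, faces):
        """faces: list of (cx, cy, area). Returns the tracked face or None."""
        if not faces:
            return None  # nobody in frame
        if self.last_center is None:
            # First lock: biggest face is assumed to be the user at the screen.
            chosen = max(faces, key=lambda f: f[2])
        else:
            # Afterwards: the face nearest to where the user was last seen.
            chosen = min(faces, key=lambda f: math.dist(f[:2], self.last_center))
            if math.dist(chosen[:2], self.last_center) > self.max_jump:
                return None  # tracked user left; treat as "no face"
        self.last_center = chosen[:2]
        return chosen
```

In this sketch the lock persists after the user leaves; when to re-lock onto a new face is a choice the commit message does not specify.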
Perception only ("looking at screen now?"); the look-away timer is policy for
the loop/state layer.
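For context, the split between perception and policy could look roughly like this on the loop/state side. This is only a sketch: LookAwayTimer, update, and the 5-second default are hypothetical and not part of this PR; the status strings are the ones the detector emits.

```python
class LookAwayTimer:
    """Turn per-frame gaze statuses into a 'looked away too long' flag (sketch)."""

    def __init__(self, threshold_s=5.0):
        # threshold_s stands in for the configurable look-away duration.
        self.threshold_s = threshold_s
        self.away_since = None

    def update(self, status, timestamp):
        """Return True once 'distracted' has persisted for threshold_s seconds."""
        if status == "focused":
            self.away_since = None  # any focused frame resets the timer
            return False
        if self.away_since is None:
            self.away_since = timestamp  # first distracted frame of this run
        return timestamp - self.away_since >= self.threshold_s
```

Keeping this state out of the detector means detect_gaze stays a pure per-frame question, and the threshold can change without touching the CV code.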
lukitasxue
approved these changes
Jun 14, 2026
Summary
Adds the gaze detector for the Python CV worker: given a webcam frame, decide whether the user is looking at the screen (focused) or away (distracted). Verified live: clean focused/distracted flips, calibration, and multi-person tracking.
What's in here
- python/cv/gaze_detector.py: the detector
  - detect_gaze(frame) → protocol event {type:"gaze", status, confidence, timestamp}
  - analyze_gaze(frame) → richer facts (angles, offsets, face count) for the test UI
  - calibrate(frame) / reset_reference() → manage the reference pose
- python/cv/gaze_detect_test.py: visual webcam test (FOCUSED/DISTRACTED + head angles + face count)
- setup.sh: now also downloads face_landmarker.task
- python/README.md: documents the gaze detector
Approach: reasoning
Head pose, not eye/iris gaze. We judge "looking at screen" by which way the face points (yaw/pitch from MediaPipe's facial transformation matrix), not where the eyes point. Head pose is far more robust to lighting, glasses, and distance — important for a tool that runs all day on any laptop.
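As a rough illustration of reading yaw/pitch from a transformation matrix, the sketch below takes a 4x4 row-major matrix and derives the two angles from the direction the face points. The function name, the +Z forward axis, and the sign conventions are assumptions for the example; the actual decomposition in gaze_detector.py may differ, and obtaining the matrix from MediaPipe is left out.

```python
import math


def head_yaw_pitch(matrix):
    """Return (yaw, pitch) in degrees from a 4x4 row-major transform (sketch).

    Assumption: the upper-left 3x3 block is a pure rotation and the face's
    forward axis is +Z, so the third column is the direction the face points.
    """
    fx, fy, fz = matrix[0][2], matrix[1][2], matrix[2][2]
    # Yaw: left/right turn of the forward vector in the horizontal plane.
    yaw = math.degrees(math.atan2(fx, fz))
    # Pitch: up/down tilt of the forward vector out of that plane.
    pitch = math.degrees(math.atan2(-fy, math.hypot(fx, fz)))
    return yaw, pitch


# Identity matrix: the face points straight along +Z, so both angles are zero.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
```

An identity matrix yielding zero yaw and zero pitch is the same kind of sanity check the Testing section mentions.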
Auto-calibration → camera-position independent. Head-pose angles are measured relative to the camera, so a side-mounted camera would make "facing the screen" read as a constant offset (and falsely flag distracted). The first frame with a face becomes the reference (0/0); every later frame is judged as +/- deviation from that pose. Works at any camera angle, no manual step.
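The reference-pose idea can be sketched as follows. This is a simplified illustration, not the PR's code: it works on (yaw, pitch) angles rather than frames, and the ReferencePose class and its threshold values are hypothetical.

```python
class ReferencePose:
    """Judge poses as deviation from the first pose seen (sketch)."""

    def __init__(self, max_yaw=25.0, max_pitch=20.0):
        # Thresholds in degrees are assumed values for illustration only.
        self.max_yaw = max_yaw
        self.max_pitch = max_pitch
        self.reference = None

    def calibrate(self, yaw, pitch):
        """Make the given pose the new 0/0 reference."""
        self.reference = (yaw, pitch)

    def reset_reference(self):
        """Forget the reference; the next pose seen becomes the baseline."""
        self.reference = None

    def status(self, yaw, pitch):
        if self.reference is None:
            self.calibrate(yaw, pitch)  # auto-calibrate on the first face
        ref_yaw, ref_pitch = self.reference
        within = (abs(yaw - ref_yaw) <= self.max_yaw
                  and abs(pitch - ref_pitch) <= self.max_pitch)
        return "focused" if within else "distracted"
```

With a side-mounted camera the first pose might read as a 30° yaw; because later frames are compared against that pose rather than against zero, facing the screen still counts as focused.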
calibrate() can re-baseline on demand.
Multi-person tracking. Detects up to NUM_FACES (3), locks onto the intended user (biggest/closest face = person at their own screen), then follows them by position each frame, so other people entering the frame don't steal the signal. If the tracked face leaves, that's treated as "no face".
License / shipping (consistent with the phone detector)
Design
detect_gaze answers "looking at screen right now?". Turning that into a distracted state (looked away for N seconds, the configurable look-away timer) is policy for the loop/state layer, deliberately kept out of the detector.
phone_detector.py: lazy-loaded model, protocol-shaped events, same structure. Emits exactly what python-bridge.ts will consume.
Status states
focused
distracted
distracted
Models / artifacts
models/face_landmarker.task (~3.6 MB) is gitignored and fetched by setup.sh.
Testing
None-frame and identity-matrix sanity checks pass; Pyright clean.
Follow-ups (not in this PR)
detection_loop.py alongside phone