feat(cv): gaze detection (MediaPipe head pose) #14

Merged
lukitasxue merged 2 commits into main from feat/gaze-module on Jun 14, 2026
@rxv801 (Owner) commented Jun 12, 2026

Summary

Adds the gaze detector for the Python CV worker: given a webcam frame, decide whether the user is looking at the screen (focused) or away (distracted). Verified live: the status flips cleanly between focused and distracted, and calibration and multi-person tracking both behave as intended.

What's in here

  • python/cv/gaze_detector.py — the detector
    • detect_gaze(frame) → protocol event {type:"gaze", status, confidence, timestamp}
    • analyze_gaze(frame) → richer facts (angles, offsets, face count) for the test UI
    • calibrate(frame) / reset_reference() → manage the reference pose
  • python/cv/gaze_detect_test.py — visual webcam test (FOCUSED/DISTRACTED + head angles + face count)
  • setup.sh — now also downloads face_landmarker.task
  • python/README.md — documents the gaze detector

Approach — reasoning

Head pose, not eye/iris gaze. We judge "looking at screen" by which way the face points (yaw/pitch from MediaPipe's facial transformation matrix), not where the eyes point. Head pose is far more robust to lighting, glasses, and distance — important for a tool that runs all day on any laptop.
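
To make the head-pose idea concrete, here is a minimal sketch of reading yaw and pitch from a 4x4 transformation matrix. The helper name `yaw_pitch_from_matrix`, the choice of local +Z as the face's forward axis, and the sign conventions are assumptions for illustration; the actual extraction in gaze_detector.py may differ.

```python
import math

import numpy as np


def yaw_pitch_from_matrix(matrix):
    """Return (yaw, pitch) in degrees from a 4x4 transformation matrix.

    Assumption: the third column of the rotation block is the direction
    the face points (its local +Z axis expressed in camera coordinates).
    """
    rotation = np.asarray(matrix, dtype=float)[:3, :3]
    fx, fy, fz = rotation[:, 2]
    # Yaw: left/right turn, measured in the horizontal (X-Z) plane.
    yaw = math.degrees(math.atan2(fx, fz))
    # Pitch: up/down tilt, measured against the horizontal plane.
    pitch = math.degrees(math.atan2(-fy, math.hypot(fx, fz)))
    return yaw, pitch


if __name__ == "__main__":
    # An identity matrix means "facing straight ahead": both angles are zero.
    print(yaw_pitch_from_matrix(np.eye(4)))
```

Because both angles come from ratios of the forward vector's components, a uniform scale in the matrix does not change the result.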

Auto-calibration → camera-position independent. Head-pose angles are measured relative to the camera, so a side-mounted camera would make "facing the screen" read as a constant offset (and falsely flag distracted). The first frame with a face becomes the reference (0/0); every later frame is judged as +/- deviation from that pose. Works at any camera angle, no manual step. calibrate() can re-baseline on demand.
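
The calibration logic described above can be sketched roughly as follows. The class `ReferencePose`, the function `is_focused`, and the threshold values (25 degrees of yaw, 20 of pitch) are hypothetical; the PR does not state the thresholds the detector uses.

```python
from typing import Optional, Tuple


class ReferencePose:
    """Stores a baseline (yaw, pitch) and reports deviation from it."""

    def __init__(self) -> None:
        self._ref: Optional[Tuple[float, float]] = None

    def calibrate(self, yaw: float, pitch: float) -> None:
        # Re-baseline on demand: the given pose becomes the new 0/0.
        self._ref = (yaw, pitch)

    def reset(self) -> None:
        # Forget the baseline; the next observed pose will become it.
        self._ref = None

    def deviation(self, yaw: float, pitch: float) -> Tuple[float, float]:
        # Auto-calibration: the first pose seen is adopted as the reference.
        if self._ref is None:
            self.calibrate(yaw, pitch)
        assert self._ref is not None
        ref_yaw, ref_pitch = self._ref
        return yaw - ref_yaw, pitch - ref_pitch


def is_focused(d_yaw: float, d_pitch: float,
               max_yaw: float = 25.0, max_pitch: float = 20.0) -> bool:
    # Hypothetical thresholds, in degrees of deviation from the reference.
    return abs(d_yaw) <= max_yaw and abs(d_pitch) <= max_pitch


if __name__ == "__main__":
    ref = ReferencePose()
    # A side-mounted camera sees the user at yaw=30 while they face the screen.
    print(ref.deviation(30.0, -5.0))   # first frame becomes the baseline
    print(is_focused(*ref.deviation(35.0, -5.0)))
```

With this shape, a constant camera offset cancels out because every later frame is compared with the stored baseline rather than with the camera axis.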

Multi-person tracking. Detects up to NUM_FACES (3), locks onto the intended user (biggest/closest face = person at their own screen), then follows them by position each frame — so other people entering the frame don't steal the signal. If the tracked face leaves, that's treated as "no face".
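
One way to picture the lock-and-follow behaviour is the sketch below. The `FaceTracker` class, the bounding-box representation, and the `max_jump` distance limit are assumptions made for illustration; the detector's own tracking code is not shown in this PR description.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in normalized coords.
Box = Tuple[float, float, float, float]


def _area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def _center(box: Box) -> Tuple[float, float]:
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0


class FaceTracker:
    def __init__(self, max_jump: float = 0.25) -> None:
        # max_jump is an assumed limit on how far the face may move per frame.
        self.max_jump = max_jump
        self._last: Optional[Tuple[float, float]] = None

    def pick(self, boxes: List[Box]) -> Optional[int]:
        """Return the index of the tracked user's face, or None for 'no face'."""
        if not boxes:
            return None
        if self._last is None:
            # Lock on: the biggest face is taken to be the intended user.
            idx = max(range(len(boxes)), key=lambda i: _area(boxes[i]))
        else:
            # Follow: choose the face nearest to the last known position.
            last = self._last

            def dist(i: int) -> float:
                cx, cy = _center(boxes[i])
                return math.hypot(cx - last[0], cy - last[1])

            idx = min(range(len(boxes)), key=dist)
            if dist(idx) > self.max_jump:
                # Only other people are in view; the tracked face has left.
                return None
        self._last = _center(boxes[idx])
        return idx


if __name__ == "__main__":
    tracker = FaceTracker()
    print(tracker.pick([(0.3, 0.3, 0.7, 0.7), (0.0, 0.0, 0.1, 0.1)]))
    print(tracker.pick([(0.0, 0.0, 0.5, 0.5), (0.32, 0.3, 0.72, 0.7)]))
```

In this sketch the last known position is kept when the user leaves, so a face reappearing nearby is picked up again; whether the real detector does the same is not stated here.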

License / shipping (consistent with the phone detector)

  • MediaPipe FaceLandmarker — Apache-2.0, runs fully local, no PyTorch, no AGPL, no cloud. Safe for a closed/shipped app and the privacy-first design.

Design

  • Perception only: detect_gaze answers "looking at screen right now?". Turning that into a distracted state (looked away for N seconds, the configurable look-away timer) is policy for the loop/state layer, deliberately kept out of the detector.
  • Mirrors phone_detector.py: lazy-loaded model, protocol-shaped events, same structure. Emits exactly what python-bridge.ts will consume.
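
For context on the perception/policy split, the look-away timer that the loop/state layer would own might look something like this. It is not part of this PR; the `LookAwayTimer` name, the 5-second default, and timestamps in seconds are all assumptions.

```python
from typing import Optional


class LookAwayTimer:
    """Turns per-frame gaze status into a sustained 'distracted' decision."""

    def __init__(self, threshold_s: float = 5.0) -> None:
        # Hypothetical default; the PR calls this value configurable.
        self.threshold_s = threshold_s
        self._away_since: Optional[float] = None

    def update(self, status: str, timestamp: float) -> bool:
        """Return True once the user has been away for threshold_s seconds."""
        if status == "focused":
            # Any focused frame resets the timer.
            self._away_since = None
            return False
        if self._away_since is None:
            # First distracted frame of this stretch: start timing.
            self._away_since = timestamp
        return timestamp - self._away_since >= self.threshold_s


if __name__ == "__main__":
    timer = LookAwayTimer(threshold_s=3.0)
    for status, ts in [("distracted", 0.0), ("distracted", 2.0),
                       ("distracted", 3.0), ("focused", 3.5)]:
        print(status, ts, timer.update(status, ts))
```

Keeping this state outside the detector means detect_gaze stays a pure per-frame check, and the timer can be tuned or replaced without touching the CV code.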

Status states

| Situation | status | confidence |
| --- | --- | --- |
| Face, looking at screen | focused | 1.0 |
| Face, turned away | distracted | 1.0 |
| No face in view | distracted | 0.0 (lets policy tell "looked away" from "walked away") |
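
The three status rows above could be mapped onto the protocol event roughly as follows. The field names come from the PR description; the `GazeEvent` type, the `make_gaze_event` helper, and the use of epoch seconds for `timestamp` are assumptions.

```python
import time
from typing import Literal, Optional, TypedDict


class GazeEvent(TypedDict):
    type: Literal["gaze"]
    status: Literal["focused", "distracted"]
    confidence: float
    timestamp: float  # assumed to be seconds since the epoch


def make_gaze_event(face_found: bool, looking_at_screen: bool,
                    now: Optional[float] = None) -> GazeEvent:
    """Build an event following the status table."""
    ts = time.time() if now is None else now
    if not face_found:
        # No face: distracted with confidence 0.0 ("walked away").
        return {"type": "gaze", "status": "distracted",
                "confidence": 0.0, "timestamp": ts}
    if looking_at_screen:
        return {"type": "gaze", "status": "focused",
                "confidence": 1.0, "timestamp": ts}
    # Face present but turned away ("looked away").
    return {"type": "gaze", "status": "distracted",
            "confidence": 1.0, "timestamp": ts}


if __name__ == "__main__":
    print(make_gaze_event(face_found=True, looking_at_screen=True, now=0.0))
    print(make_gaze_event(face_found=False, looking_at_screen=False, now=0.0))
```

A consumer can then separate the two distracted cases by checking `confidence` alone.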

Models / artifacts

models/face_landmarker.task (~3.6 MB) is gitignored and fetched by setup.sh.

Testing

  • Verified live: auto-calibrate on first face, focused↔distracted flips at sensible head angles, off-axis camera handled, stays locked on the user when a second face appears.
  • None-frame and identity-matrix sanity checks pass; Pyright clean.

Follow-ups (not in this PR)

  • Wire gaze into detection_loop.py alongside phone
  • Distracted-state policy + configurable look-away timer (loop/state layer)
  • Pass the onboarding-selected camera through to the worker (camera-id integration)
  • Optional: face re-identification (embeddings) for robust identity across exits/returns

rxv801 added 2 commits June 12, 2026 14:58
Add the gaze detector for the CV worker: given a webcam frame, decide whether
the user is looking at the screen.

- gaze_detector.py: detect_gaze(frame) -> protocol event (focused/distracted);
  analyze_gaze() for richer facts; calibrate()/reset_reference()
- Head pose (MediaPipe FaceLandmarker, Apache-2.0, local) — robust vs lighting,
  glasses, distance; no torch/AGPL
- Auto-calibration: first face seen becomes the reference pose, later frames
  judged as +/- deviation — works with any camera angle
- Multi-person: detects up to NUM_FACES, locks onto the intended user and
  tracks them by position so others entering frame don't steal the signal
- gaze_detect_test.py: visual webcam test (FOCUSED/DISTRACTED + angles)
- setup.sh fetches face_landmarker.task; python/README documents it

Perception only ("looking at screen now?"); the look-away timer is policy for
the loop/state layer.
@lukitasxue lukitasxue merged commit d69b62b into main Jun 14, 2026
5 checks passed