This project develops technology for interacting with devices through gaze tracking and facial gesture recognition. Users can control a computer without their hands, which is useful for accessibility, entertainment, and writing applications, among others.

fa8i/Gaze-Tracking

Research-Grade Gaze Tracking and Facial Gesture Interaction System

A high-precision, appearance-based gaze tracking system enabling real-time, hands-free human-computer interaction using deep learning, geometric modeling, and personalized calibration.

This project implements a complete end-to-end gaze estimation pipeline, from camera calibration and gaze vector prediction to personalized screen mapping and gesture-based interaction. The system is compared against three generations of appearance-based gaze estimation benchmarks: MPIIGaze (2015), MPIIFaceGaze (2017), and modern systems such as RT-GENE and Gaze360. It performs competitively with both the historical and the modern models.

The system operates using standard RGB cameras under real-world conditions.


Key Features

• Appearance-based gaze estimation using a custom CNN architecture
• End-to-end gaze tracking pipeline
• Personalized calibration via regression mapping
• Real-time inference and interaction
• Facial gesture recognition for hands-free control
• Gaze-controlled virtual keyboard
• Dynamic gaze heatmap visualization
• Modular and extensible architecture


System Overview

The pipeline consists of four main stages:

  1. Camera Calibration
    Computes intrinsic camera parameters for accurate geometric modeling.

  2. Gaze Vector Estimation
    A convolutional neural network predicts the 3D gaze vector from facial appearance.

  3. Personalized Screen Mapping
    A regression model maps gaze vectors to screen-space coordinates.

  4. Interaction Layer
    Enables real-time hands-free interaction using gaze and facial gestures.

Pipeline summary:

Camera → Face Detection → Eye Extraction → CNN → Gaze Vector → Regression Mapping → Screen Coordinates → Interaction
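
As a rough illustration of what the intrinsic parameters from stage 1 provide, the sketch below applies the standard pinhole camera model to project a 3D point in camera coordinates onto the image plane. The Intrinsics container, the project function, and the numeric values are hypothetical and are not taken from this repository's calibration script. Lens distortion is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical container, not from the repo)."""
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def project(point_cam, k):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = k.fx * x / z + k.cx
    v = k.fy * y / z + k.cy
    return u, v


# Illustrative values for a 640x480 webcam; real values come from calibration.
K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
u, v = project((0.05, -0.02, 0.5), K)
```

A point on the optical axis projects to the principal point (cx, cy), which is a quick sanity check for any calibration result.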


Model Architecture

The gaze estimation model is a custom convolutional neural network designed for appearance-based gaze prediction under unconstrained real-world conditions.

Architecture visualization:

[Figure: Model Architecture]

The architecture was designed to:

• Extract robust gaze features from eye and facial appearance
• Generalize across users and lighting conditions
• Enable stable and accurate gaze vector prediction

Architecture diagrams were generated using PlotNeuralNet.


Training Convergence and Angular Error

The following figure shows the angular error evolution during training for multiple CNN architectures and configurations evaluated on the MPIIFaceGaze dataset.

Each curve represents a different model configuration. The minimum validation and test errors achieved by each model are highlighted.

[Figure: Training Convergence]

Key observations:

• Best validation angular error: 1.119°
• Best test angular error: 1.190°
• Consistent convergence across multiple architectures
• Stable training behavior and smooth error reduction
• Small validation-to-test gap (0.071°), suggesting good generalization to held-out data

These results indicate stable convergence and effective gaze representation learning on MPIIFaceGaze.
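
For reference, angular error is usually computed as the angle between the predicted and ground-truth 3D gaze vectors, that is, the arccosine of their normalized dot product. The sketch below shows that common definition. The function names are illustrative, and the repository's evaluation code may compute the metric differently (for example, from pitch and yaw angles).

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0 or norm_t == 0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground-truth vectors."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```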


Benchmark Evaluation Across Generations of Gaze Estimation

This project evaluates performance across three major generations of appearance-based gaze estimation systems.


First Generation: MPIIGaze (2015)

MPIIGaze established the first realistic benchmark for appearance-based gaze estimation under unconstrained real-world conditions.

[Figure: MPIIGaze Comparison]

This benchmark marked the transition from geometric methods to deep learning-based appearance models.


Second Generation: MPIIFaceGaze (2017)

MPIIFaceGaze introduced full-face gaze estimation and improved performance compared to earlier architectures.

[Figure: MPIIFaceGaze Comparison]

The proposed architecture demonstrates competitive performance compared to established full-face CNN-based gaze estimation models.


Third Generation: Modern Gaze Estimation Systems

This comparison includes modern appearance-based gaze estimation systems such as RT-GENE, Gaze360, and attention-based CNN architectures.

The proposed model is shown in yellow.

[Figure: State-of-the-Art Comparison]

The convergence analysis presented earlier indicates that these results come from stable training rather than from a single favorable run.


Personalized Calibration Performance

A regression-based calibration stage improves screen-space accuracy by adapting the model to individual users.

Pixel error decreases as calibration sample size increases:

[Figures: Regression 1, Regression 2, Regression 3]

Parity plots:

[Figure: Regression Parity]

This calibration stage enables accurate gaze-based interaction using standard hardware.
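
To make the idea of regression-based calibration concrete, here is a minimal sketch that fits an affine least-squares map from gaze angles (yaw, pitch) to one screen axis. It is an assumption-laden illustration: the repository's regression.py may use a different model, different features, or a library such as scikit-learn. The function names and the synthetic calibration samples are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration samples are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine(gaze_angles, screen_values):
    """Least-squares fit of s = w0*yaw + w1*pitch + w2 for one screen axis."""
    feats = [(yaw, pitch, 1.0) for yaw, pitch in gaze_angles]
    # Normal equations: (F^T F) w = F^T s
    ftf = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    fts = [sum(f[i] * s for f, s in zip(feats, screen_values)) for i in range(3)]
    return solve_linear(ftf, fts)


def predict(weights, yaw, pitch):
    """Apply the fitted affine map to one gaze sample."""
    return weights[0] * yaw + weights[1] * pitch + weights[2]


# Synthetic calibration samples: (yaw, pitch) in radians and pixel targets.
angles = [(-0.2, -0.1), (0.2, -0.1), (-0.2, 0.1), (0.2, 0.1), (0.0, 0.0)]
xs = [960 + 2400 * yaw for yaw, _ in angles]
ys = [540 - 2700 * pitch for _, pitch in angles]
wx = fit_affine(angles, xs)
wy = fit_affine(angles, ys)
```

With noisy real samples, adding more calibration points averages out per-sample error in the fit, which is consistent with the decreasing pixel error reported above.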


Real-Time Interaction

The system supports real-time interaction using gaze and facial gestures.

Example:

[Figure: Demo]
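
One way gaze points could drive an on-screen keyboard is sketched below: raw screen-space estimates are smoothed with an exponential moving average, and the smoothed point is hit-tested against key rectangles. This is a hypothetical illustration only. The GazeSmoother class, the key_at function, the key layout, and the smoothing factor are assumptions, and the way the demo actually confirms a selection (for example, through a facial gesture) is not shown.

```python
class GazeSmoother:
    """Exponential moving average over screen-space gaze points."""

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None  # last smoothed (x, y), or None before the first sample

    def update(self, x, y):
        """Blend a new gaze sample into the running estimate and return it."""
        if self.state is None:
            self.state = (x, y)
        else:
            sx, sy = self.state
            a = self.alpha
            self.state = (a * x + (1 - a) * sx, a * y + (1 - a) * sy)
        return self.state


def key_at(point, keys):
    """Return the label of the key rectangle containing point, or None."""
    px, py = point
    for label, (left, top, width, height) in keys.items():
        if left <= px < left + width and top <= py < top + height:
            return label
    return None


# Hypothetical two-key layout: label -> (left, top, width, height) in pixels.
KEYS = {"A": (0, 0, 100, 100), "B": (100, 0, 100, 100)}
smoother = GazeSmoother(alpha=0.5)
smoother.update(40, 50)
focused = key_at(smoother.update(60, 50), KEYS)
```

A smaller alpha gives a steadier cursor at the cost of more lag, so the value would need tuning against the frame rate of the camera.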


Installation

Install dependencies:

pip install -r requirements.txt


Usage

Camera calibration:

python src/data_collection/camera_calibration.py

Optional training:

python src/train/preprocess_mpii_dataset.py
python src/train/training.py

Personalized calibration:

python src/data_collection/data_collection.py
python src/regressor/gaze_csv.py
python src/regressor/regression.py

Run demo:

python src/demo/main_demo.py


Applications

• Assistive technologies
• Accessibility interfaces
• Human-computer interaction
• Hands-free computer control
• AR/VR interaction
• Attention tracking
• Behavioral analysis


Technical Summary

This project demonstrates a complete appearance-based gaze estimation system combining deep learning, geometric modeling, personalized calibration, and real-time interaction.

On MPIIFaceGaze, the best configuration reaches a validation angular error of 1.119° and a test angular error of 1.190°, with stable convergence across the architectures evaluated.


License

MIT License
