This project uses a Bidirectional LSTM (BiLSTM) model to classify desktop activities (e.g., Read, Browse, Debug, Watch, Write) from eye-tracking time-series data.
We build a preprocessing pipeline to derive kinematic features, create sliding windows, and train a neural network to predict activity labels.
We use the Eye Movement Dataset for Desktop Activities.
- Participants: 24
- Activities: 8
- Browse, Debug, Interpret, Play, Read, Search, Watch, Write
- Files: 192 CSVs (Participant × Activity)
- Columns:
  - participant (ID)
  - set (A/B)
  - activity (class label)
  - x, y (gaze coordinates in pixels)
  - timestamp (ms)
- Derived kinematic features:
  - vx, vy = gaze velocity (pixels/sec)
  - speed = magnitude of the velocity vector (pixels/sec)
  - ax, ay = gaze acceleration
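The kinematic features above can be derived from the raw columns with simple finite differences. A minimal sketch, assuming the column names `x`, `y`, and `timestamp` (ms) from the dataset description (the function name is illustrative):

```python
import numpy as np
import pandas as pd

def add_kinematic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive vx, vy, speed, ax, ay from raw gaze samples."""
    # Time delta in seconds; NaN out zero/first deltas to avoid division errors
    dt = (df["timestamp"].diff() / 1000.0).replace(0, np.nan)

    # First derivatives: gaze velocity in pixels/sec
    df["vx"] = (df["x"].diff() / dt).fillna(0)
    df["vy"] = (df["y"].diff() / dt).fillna(0)

    # Speed = magnitude of the velocity vector
    df["speed"] = np.sqrt(df["vx"] ** 2 + df["vy"] ** 2)

    # Second derivatives: gaze acceleration
    df["ax"] = (df["vx"].diff() / dt).fillna(0)
    df["ay"] = (df["vy"].diff() / dt).fillna(0)
    return df
```

In practice the derivatives would be computed per participant/activity file, before windowing, so deltas never span file boundaries.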
- Sliding windows:
- Length = 200 samples
- Stride = 50
- Labels: majority activity label per window
- Train/validation split by participant (to prevent identity leakage).
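The windowing step above can be sketched as follows, assuming a per-file `[T, 7]` feature array and a parallel label array (function and variable names are illustrative):

```python
import numpy as np
from collections import Counter

WINDOW = 200  # samples per window
STRIDE = 50   # step between consecutive window starts

def make_windows(features: np.ndarray, labels: np.ndarray):
    """Slice a [T, 7] feature array into overlapping [WINDOW, 7] windows.

    Each window is labeled with the majority activity label it contains.
    """
    X, y = [], []
    for start in range(0, len(features) - WINDOW + 1, STRIDE):
        end = start + WINDOW
        X.append(features[start:end])
        # Majority vote over the per-sample labels in this window
        y.append(Counter(labels[start:end]).most_common(1)[0][0])
    return np.stack(X), np.array(y)
```

To prevent identity leakage, windows are grouped by participant and whole participants (not individual windows) are assigned to the train or validation set.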
The classifier processes sequential gaze data and predicts one of 8 activities.
- Input: [batch, 200 timesteps, 7 features]
  - Features: x, y, vx, vy, speed, ax, ay
- BiLSTM Layer
  - Hidden size = 128 per direction → 256 output dims per timestep
  - Output shape: [batch, 200, 256]
- Pooling
  - Mean pooling across timesteps → [batch, 256]
- Fully Connected Head
  - Linear(256 → 256) → ReLU → Dropout
  - Linear(256 → 8) → logits
- Softmax + cross-entropy loss (softmax is applied implicitly inside CrossEntropyLoss)
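The architecture above can be expressed as a compact PyTorch module. This is a sketch matching the listed shapes; the class name and defaults are illustrative:

```python
import torch
import torch.nn as nn

class GazeBiLSTM(nn.Module):
    """BiLSTM classifier following the architecture described above."""

    def __init__(self, n_features: int = 7, hidden: int = 128,
                 n_classes: int = 8, dropout: float = 0.2):
        super().__init__()
        # Bidirectional LSTM: 128 hidden units per direction -> 256 per timestep
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, 2 * hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(2 * hidden, n_classes),  # logits; softmax lives in the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)      # [batch, 200, 256]
        pooled = out.mean(dim=1)   # mean pooling over timesteps -> [batch, 256]
        return self.head(pooled)   # [batch, 8] logits
```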
- Loss: CrossEntropyLoss (with optional class weights to handle imbalance)
- Optimizer: AdamW (lr=3e-4, weight_decay=1e-4)
- Scheduler: CosineAnnealingLR
- Regularization: Dropout (0.2), gradient clipping (1.0)
- Early Stopping: patience = 6 epochs (monitored on Macro-F1)
- Batch Size: 64
- Epochs: 30
- Accuracy: overall fraction of correctly classified windows.
- Macro F1-Score: ensures fair performance across all classes, even if imbalanced.
- Confusion Matrix: shows which activities are confused with one another.
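The three metrics above can be computed directly from window-level predictions with scikit-learn (the `evaluate` helper name is illustrative):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

def evaluate(y_true, y_pred, labels=None):
    """Compute accuracy, macro F1, and the confusion matrix."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        # Rows = true activity, columns = predicted activity
        "confusion": confusion_matrix(y_true, y_pred, labels=labels),
    }
```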