ASL GISLR — PyTorch Baseline (Transformer) + Gemini Sentence Builder

This is a PyTorch implementation for isolated ASL word recognition on Kaggle's GISLR dataset, including an inference demo that turns a stream of recognized words into a simple sentence via Google Generative AI (Gemini).

Highlights

Landmark-based pipeline (MediaPipe schema: 543 points).
Preprocess step that: (1) selects lips + dominant hand + small pose subset, (2) flips coordinates to a left-dominant canonical form, (3) filters frames with no hand, (4) downsamples/pads to 64 frames using edge padding + uniform average pooling.
Transformer encoder (2 blocks, 8 heads, 384 dim) with GELU MLP.
Label smoothing (0.25), AdamW with cosine schedule and optional warmup, weight decay tied to LR (wd = wd_ratio * lr).
Balanced per-class sampling during training.
Webcam demo using MediaPipe Holistic and Gemini to re-order recognized words into a simple sentence.
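The preprocessing step above can be sketched in NumPy roughly as follows. This is not the repository's code, and it rests on several assumptions. It assumes the Kaggle GISLR landmark order (face 0-467, left hand 468-488, pose 489-521, right hand 522-542). It treats the hand detected in more frames as dominant, mirrors x as `1 - x`, and zero-fills missing points. The `extra_idx` argument is a hypothetical stand-in for the lip and pose indices, which the README does not list.

```python
import numpy as np

# Assumed GISLR landmark order (543 points per frame):
# face 0-467, left hand 468-488, pose 489-521, right hand 522-542.
LEFT_HAND = np.arange(468, 489)
RIGHT_HAND = np.arange(522, 543)
N_TARGET = 64


def resample_frames(x: np.ndarray, n_target: int = N_TARGET) -> np.ndarray:
    """Edge-pad and/or average-pool the time axis of x to exactly n_target frames."""
    t = x.shape[0]
    rest = [(0, 0)] * (x.ndim - 1)
    if t <= n_target:
        pad = n_target - t
        # Short clips: repeat the first/last frame until n_target is reached.
        return np.pad(x, [(pad // 2, pad - pad // 2)] + rest, mode="edge")
    # Long clips: edge-pad to a multiple of n_target, then average
    # non-overlapping windows of equal size (uniform average pooling).
    pool = -(-t // n_target)  # ceil(t / n_target)
    pad = pool * n_target - t
    x = np.pad(x, [(pad // 2, pad - pad // 2)] + rest, mode="edge")
    return x.reshape(n_target, pool, *x.shape[1:]).mean(axis=1)


def preprocess(frames: np.ndarray, extra_idx=(), n_target: int = N_TARGET) -> np.ndarray:
    """frames: (T, 543, 3) with NaN for undetected points -> (n_target, 21 + len(extra_idx), 3)."""
    extra_idx = np.asarray(extra_idx, dtype=int)  # hypothetical lip/pose subset
    left, right = frames[:, LEFT_HAND], frames[:, RIGHT_HAND]
    # A frame contains a hand if at least one of its 21 x-coordinates is present.
    left_seen = ~np.isnan(left[..., 0]).all(axis=1)
    right_seen = ~np.isnan(right[..., 0]).all(axis=1)
    # Assumption: the dominant hand is the one detected in more frames.
    use_right = right_seen.sum() > left_seen.sum()
    hand, seen = (right, right_seen) if use_right else (left, left_seen)

    x = np.concatenate([hand, frames[:, extra_idx]], axis=1)
    if use_right:
        x[..., 0] = 1.0 - x[..., 0]  # mirror x so every clip looks left-dominant
    x = x[seen]  # drop frames where the dominant hand is missing
    if x.shape[0] == 0:
        return np.zeros((n_target, *x.shape[1:]), dtype=np.float32)
    x = np.nan_to_num(x, nan=0.0)  # zero-fill remaining missing points (assumption)
    return resample_frames(x, n_target).astype(np.float32)
```

Padding before pooling keeps every window the same size, so each output frame averages the same number of input frames.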

Quickstart

```bash
# 0) Create env (Python >=3.10 recommended)
python -m venv .venv && source .venv/bin/activate  # Linux/macOS
# or: .venv\Scripts\activate  (Windows)

# 1) Install deps
pip install -r requirements.txt
# Make sure the installed torch build matches your NVIDIA driver (CUDA version).

# 2) Train
python scripts/train.py

# 3) Webcam inference (+ Gemini sentence)
# Requires GOOGLE_API_KEY in a .env file
python inference/webcam_demo.py --checkpoint ./checkpoints/best.pt
```
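The sentence-building step can be sketched as below. This is a hypothetical sketch, assuming the `google-generativeai` and `python-dotenv` packages. The smoothing helper `collapse_predictions`, the prompt wording, and the default model name `gemini-1.5-flash` are illustrative choices, not taken from the demo script.

```python
import os


def collapse_predictions(words: list[str], min_repeats: int = 3) -> list[str]:
    """Accept a word once it has been predicted `min_repeats` times in a row,
    skipping it if it equals the previously accepted word (hypothetical smoothing)."""
    accepted: list[str] = []
    current, run = None, 0
    for w in words:
        run = run + 1 if w == current else 1
        current = w
        if run == min_repeats and (not accepted or accepted[-1] != w):
            accepted.append(w)
    return accepted


def build_prompt(words: list[str]) -> str:
    """Ask the model to turn an ordered list of recognized signs into one sentence."""
    return (
        "These English words were recognized, in order, from American Sign Language: "
        + ", ".join(words)
        + ". Reorder them if needed and add only the small function words required "
        "to form one simple, grammatical English sentence. Reply with the sentence only."
    )


def words_to_sentence(words: list[str], model_name: str = "gemini-1.5-flash") -> str:
    """Call Gemini; needs GOOGLE_API_KEY in the environment or a local .env file."""
    # Imported lazily so the helpers above run without these packages installed.
    import google.generativeai as genai
    from dotenv import load_dotenv

    load_dotenv()  # loads GOOGLE_API_KEY from .env into os.environ
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(build_prompt(words)).text.strip()
```

Requiring several consecutive identical predictions before accepting a word is one simple way to suppress flicker from per-window classification before the words reach the language model.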

See scripts/config.py for all hyperparameters.
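A minimal PyTorch sketch of the model and training recipe listed under Highlights follows. It covers a 2-block, 8-head, 384-dim encoder with GELU, label smoothing 0.25, AdamW with warmup plus cosine decay, `wd = wd_ratio * lr`, and balanced per-class sampling. The learned positional embedding, 2x MLP width, mean pooling over time, and the default `lr`/`wd_ratio` values are assumptions; the real values live in scripts/config.py.

```python
import math
from collections import Counter

import torch
from torch import nn
from torch.utils.data import WeightedRandomSampler


class SignTransformer(nn.Module):
    """Transformer encoder classifier: 2 blocks, 8 heads, 384 dims, GELU MLP."""

    def __init__(self, in_features: int, num_classes: int, n_frames: int = 64,
                 dim: int = 384, heads: int = 8, depth: int = 2):
        super().__init__()
        self.embed = nn.Linear(in_features, dim)
        # Learned positional embedding and 2x MLP width are assumptions.
        self.pos = nn.Parameter(torch.zeros(1, n_frames, dim))
        block = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=2 * dim,
                                           activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_frames, in_features) -> logits: (batch, num_classes)
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h.mean(dim=1))  # average over time (assumption)


def balanced_sampler(labels: list[int]) -> WeightedRandomSampler:
    """Sample each class with equal probability by weighting samples 1 / class_count."""
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)


def tie_weight_decay(opt: torch.optim.Optimizer, wd_ratio: float) -> None:
    # Keep wd = wd_ratio * lr for the current learning rate.
    for group in opt.param_groups:
        group["weight_decay"] = wd_ratio * group["lr"]


def make_optimizer(model: nn.Module, lr: float = 1e-3, wd_ratio: float = 0.05,
                   warmup_steps: int = 0, total_steps: int = 1000):
    """AdamW plus linear warmup and cosine decay (hypothetical default values)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd_ratio * lr)

    def lr_factor(step: int) -> float:
        if step < warmup_steps:
            return (step + 1) / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_factor)
    tie_weight_decay(opt, wd_ratio)  # LambdaLR already applied lr_factor(0)
    return opt, sched


def train_step(model, x, y, opt, sched, criterion, wd_ratio: float) -> float:
    loss = criterion(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
    tie_weight_decay(opt, wd_ratio)
    return loss.item()


# Label smoothing of 0.25, as stated in the README.
criterion = nn.CrossEntropyLoss(label_smoothing=0.25)
```

`torch.optim.AdamW` already multiplies `weight_decay` by the learning rate when it decays parameters, so tying `weight_decay = wd_ratio * lr` on top of that makes the effective decay shrink quadratically as the schedule lowers `lr`.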

Usage

Video demo is available at: YouTube

Detailed reports on data preprocessing and postprocessing, the model architecture, and the overall system are available at: Drive

Trained models and data for the Reverse Translation pipeline (Sign Language Production) are available at: Drive

Folders and files

data/
scripts/
.gitignore
README.md
app.py
app_optimized.py
mean_std.json
requirements.txt
test_multi.gif
vis_gislr_8frames.png
