Sinha-Sign is an end-to-end computer vision system for real-time Sinhala Sign Language recognition. It detects the hand in a webcam frame or uploaded image, classifies the gesture with a fine-tuned InceptionV3 model, and instantly returns:
| Output | Example |
|---|---|
| ✋ Sign label (English) | three |
| 🔤 Sinhala script | තුන |
| 🗣️ Phonetic pronunciation | thuna |
| 🔊 Spoken audio | .mp3 via gTTS |
Everything — dataset, training, inference, and the Gradio UI — runs inside Google Colab. No local GPU or install required.
┌─────────────────────────────────────────────────────────────────────┐
│ TRAINING │
│ │
│ 📦 Kaggle Dataset ──► Sinhala Sign Language (multi-class) │
│ │ │
│ ▼ │
│ 🧠 InceptionV3 (ImageNet pretrained · TF Hub) │
│ │ Top layer retrained · 1000 steps · lr=0.01 │
│ ▼ │
│ 💾 retrained_graph.pb + retrained_labels.txt │
│ │ │
│ ▼ │
│ 📊 Accuracy & Loss curves plotted per training step │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ INFERENCE │
│ │
│ 📷 Webcam / Upload │
│ │ │
│ ▼ │
│ 🖐️ Hand Detection ── OpenCV DNN · ResNet-10 SSD (Caffe) │
│ │ confidence > 0.5 → crop + 20 px padding │
│ ▼ │
│ 🔬 Preprocess ── resize 299×299 · normalise [0, 1] │
│ │ │
│ ▼ │
│ 🧠 InceptionV3 Classifier ── softmax over N sign classes │
│ │ confidence < 15 % → flagged low confidence │
│ ▼ │
│ 🌐 CSV Lookup ── english-sinhala-script.csv │
│ sinhala-textscript.csv │
│ │ │
│ ▼ │
│ 🔊 gTTS ── spoken English + Sinhala phonetic → .mp3 │
│ │ │
│ ▼ │
│ 🖥️ Gradio UI ── label · script · phonetic · confidence · audio │
└─────────────────────────────────────────────────────────────────────┘
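The crop-and-preprocess stage above can be sketched in a few lines of numpy. This is a minimal illustration, not the notebook's exact code: the function name is ours, detection itself is done by OpenCV's ResNet-10 SSD as shown in the diagram, and nearest-neighbour resizing stands in for OpenCV's resize.

```python
import numpy as np

def crop_and_preprocess(frame, box, pad=20, size=299):
    """Crop the detected hand box with 20 px padding, resize to
    299x299 (nearest neighbour), and normalise pixels to [0, 1]."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    # Expand the box by the padding, clamped to the frame bounds.
    x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
    x2, y2 = min(w, x2 + pad), min(h, y2 + pad)
    crop = frame[y1:y2, x1:x2]
    # Nearest-neighbour resize via index sampling.
    ys = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    resized = crop[ys][:, xs]
    return resized.astype(np.float32) / 255.0

frame = np.zeros((480, 640, 3), dtype=np.uint8)
tensor = crop_and_preprocess(frame, (100, 100, 200, 200))
print(tensor.shape)  # (299, 299, 3)
```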
The model was trained for 1 000 steps using InceptionV3 transfer learning on the Sinhala sign dataset. Training and validation accuracy converge to roughly 95% and 93% respectively, while cross-entropy loss drops from 0.85 to 0.13.
| Metric | Value |
|---|---|
| Final train accuracy | ~95% |
| Final validation accuracy | ~93% |
| Final cross-entropy loss | ~0.13 |
| Training steps | 1 000 |
| Learning rate | 0.01 |
| Batch size | 32 |
| Input resolution | 299 × 299 |
| Inference confidence cutoff | 15% |
Both curves track closely throughout training with no signs of significant overfitting — the model generalises well to unseen examples of each sign.
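The 15% inference cutoff from the table above is straightforward to sketch: softmax the classifier's logits, take the top class, and flag the prediction when its probability falls below the threshold. The function name and return shape here are illustrative, not the notebook's exact API.

```python
import numpy as np

def classify(logits, labels, cutoff=0.15):
    """Softmax over the N sign classes; returns (label, confidence,
    low_confidence_flag) using the 15% cutoff from the table above."""
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    i = int(probs.argmax())
    return labels[i], float(probs[i]), bool(probs[i] < cutoff)

label, conf, low = classify(np.array([2.0, 0.1, 0.1]),
                            ["one", "two", "three"])
print(label, low)  # one False
```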
☁️ No local setup. Open the notebook directly in Google Colab.
Upload sstst_sinhala_sign_translator.ipynb to colab.research.google.com, then set:
Runtime → Change runtime type → Hardware accelerator → T4 GPU
```python
from google.colab import files
files.upload()  # select your kaggle.json

import os
os.makedirs('/root/.kaggle', exist_ok=True)
!cp kaggle.json /root/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json
```

| Cell | Action |
|---|---|
| Cell 0 | Install dependencies — tensorflow, gradio, gTTS, imutils … |
| Cell 1 | Download & extract the Sinhala sign dataset from Kaggle |
| Cell 2–3 | Fine-tune InceptionV3 — ~5–10 min on T4 GPU |
| Cell 4–5 | Install gTTS · verify output files |
| Cell 6–9 | Load models, hand detector, and Sinhala CSV mappings |
| Cell 8 | Plot training accuracy & loss curves |
| Cell 10–12 | Launch the Gradio web UI (generates a public shareable link) |
| Cell 13–14 | (Optional) Test on a static uploaded image |
Once the Gradio UI launches, Colab prints a public link. Open it, go to the Translator tab, allow camera access, hold up a hand sign, and click Capture & Translate.
The UI has three tabs accessible from the shareable Colab link:
╔══════════════════════════════════════════════════════════════════╗
║ 🤟 Sinhala Sign Language Translator ║
╠══════════════════╦═══════════════════╦═══════════════════════════╣
║ Translator ║ Training Graphs ║ Dataset Info ║
╠══════════════════╩═══════════════════╩═══════════════════════════╣
║ ║
║ [ Webcam / Upload ] [ 🔍 Detected hand ] ║
║ ║
║ [ Capture & Translate ▶ ] Sign (EN) | සිංහල text ║
║ Phonetic pronunciation ║
║ ████████░░ Confidence: 91% ║
║ 🔊 Speech output (.mp3) ║
║ ║
║ ▾ Supported signs one · two · three · what · when … ║
╠══════════════════════════════════════════════════════════════════╣
║ Training Graphs → Accuracy & loss curves · live reload ║
║ Dataset Info → Per-class image count & distribution chart ║
╚══════════════════════════════════════════════════════════════════╝
Numbers
| one | two | three | four | five | six |
|---|---|---|---|---|---|
| seven | eight | nine | ten | eleven | thirteen |
| fourteen | twenty | thirty | fifty | | |
Question Words
| what | when | who | why |
|---|---|---|---|
Each class maps to its Sinhala script and phonetic romanisation via CSV lookup tables bundled with the project.
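The CSV lookup can be sketched as a pair of dictionaries keyed by the English label. The two-column layout shown here (English label in the first column, Sinhala script or phonetic in the second) is an assumption for illustration; the bundled files' actual headers may differ.

```python
import csv
import io

# Hypothetical layouts assumed for the bundled lookup files:
#   english-sinhala-script.csv -> english,sinhala
#   sinhala-textscript.csv     -> english,phonetic
sample_script = "english,sinhala\nthree,තුන\n"
sample_phonetic = "english,phonetic\nthree,thuna\n"

def load_map(csv_text):
    """Build a {english_label: value} dict from a two-column CSV."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header row
    return {english: value for english, value in rows}

to_script = load_map(sample_script)
to_phonetic = load_map(sample_phonetic)
print(to_script["three"], to_phonetic["three"])  # තුන thuna
```

In the notebook the same dictionaries would be built from the real files with `open(...)` instead of the in-memory samples used here.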
Input Image
│
▼
OpenCV DNN Hand Detector (ResNet-10 SSD · Caffe · confidence ≥ 0.5)
│
│ crop → resize 299×299 → normalise [0, 1]
▼
InceptionV3 Feature Extractor (ImageNet pretrained · TF Hub · frozen)
│
▼
Retrained Classification Head (Dense · Softmax · N classes)
│
├──► english-sinhala-script.csv → Sinhala script
└──► sinhala-textscript.csv → Phonetic · gTTS audio
| Library | Role |
|---|---|
| tensorflow / tf.compat.v1 | InceptionV3 transfer learning, frozen graph inference |
| tensorflow-hub | Pretrained InceptionV3 feature extractor |
| opencv-python | Hand detection (DNN SSD), bounding boxes, image I/O |
| Pillow | Image loading and preprocessing |
| gradio | Interactive web UI — webcam, upload, tabs, audio player |
| gTTS | Google Text-to-Speech audio generation |
| numpy | Array ops and tensor manipulation |
| matplotlib | Training curves and dataset distribution charts |
| kaggle | Dataset download via API |
| imutils | Image resize utilities |
- 📸 Good lighting — even, shadow-free illumination improves detection
- 🤚 Plain background — high contrast between hand and background helps
- 🎯 Centre your hand — keep the sign clearly within frame
- 🧘 Hold still — steady the sign for 1–2 seconds before capturing
- 🔁 Retake if confidence < 50% — a slightly different angle often helps
Sri Lanka is home to an estimated 50,000 Deaf and hard-of-hearing individuals. Despite the existence of a rich and distinct Sinhala Sign Language, digital tools for its automatic recognition remain scarce. Sinha-Sign is a step toward making technology more accessible for the Sinhala-speaking Deaf community — bridging gesture to script and speech with open, reproducible deep learning.
Released under the MIT License. The Sinhala Sign Language dataset is sourced from Kaggle and is subject to its original data licence.