Sinha-Sign


🤟 Show a hand sign. Get Sinhala script, phonetics, and spoken audio — instantly.


Overview  ·  Pipeline  ·  Results  ·  Quickstart  ·  Supported Signs  ·  Tech Stack



🌿 Overview

Sinha-Sign is an end-to-end computer vision system for real-time Sinhala Sign Language recognition. It detects the hand in a webcam frame or uploaded image, classifies the gesture with a fine-tuned InceptionV3 model, and instantly returns:

Output                        Example
✋ Sign label (English)        three
🔤 Sinhala script              තුන
🗣️ Phonetic pronunciation      thuna
🔊 Spoken audio                .mp3 via gTTS

Everything — dataset, training, inference, and the Gradio UI — runs inside Google Colab. No local GPU or install required.


🏗️ Pipeline

┌─────────────────────────────────────────────────────────────────────┐
│                        TRAINING                                     │
│                                                                     │
│  📦  Kaggle Dataset  ──►  Sinhala Sign Language (multi-class)      │
│            │                                                        │
│            ▼                                                        │
│  🧠  InceptionV3  (ImageNet pretrained · TF Hub)                   │
│            │   Top layer retrained · 1000 steps · lr=0.01          │
│            ▼                                                        │
│  💾  retrained_graph.pb  +  retrained_labels.txt                   │
│            │                                                        │
│            ▼                                                        │
│  📊  Accuracy & Loss curves plotted per training step              │
└─────────────────────────────────────────────────────────────────────┘
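The final training step above plots accuracy and loss per step. A minimal sketch of that plotting cell, using synthetic stand-in curves shaped to match the reported numbers (0.85 → 0.13 loss, ~95% accuracy); the real notebook plots the metrics actually logged during retraining:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for Colab cell output
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the per-step metrics logged during retraining.
steps = np.arange(0, 1000, 10)
accuracy = 0.95 - 0.45 * np.exp(-steps / 200.0)
loss = 0.13 + 0.72 * np.exp(-steps / 200.0)

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(steps, accuracy)
ax_acc.set(xlabel="training step", ylabel="accuracy", title="Accuracy")
ax_loss.plot(steps, loss, color="tab:red")
ax_loss.set(xlabel="training step", ylabel="loss", title="Cross-entropy loss")
fig.tight_layout()
fig.savefig("training_curves.png")
```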

┌─────────────────────────────────────────────────────────────────────┐
│                        INFERENCE                                    │
│                                                                     │
│  📷  Webcam / Upload                                                │
│            │                                                        │
│            ▼                                                        │
│  🖐️  Hand Detection  ──  OpenCV DNN · ResNet-10 SSD (Caffe)       │
│            │   confidence > 0.5  →  crop + 20 px padding           │
│            ▼                                                        │
│  🔬  Preprocess  ──  resize 299×299 · normalise [0, 1]             │
│            │                                                        │
│            ▼                                                        │
│  🧠  InceptionV3 Classifier  ──  softmax over N sign classes       │
│            │   confidence < 15 %  →  flagged low confidence        │
│            ▼                                                        │
│  🌐  CSV Lookup  ──  english-sinhala-script.csv                    │
│                      sinhala-textscript.csv                        │
│            │                                                        │
│            ▼                                                        │
│  🔊  gTTS  ──  spoken English + Sinhala phonetic  →  .mp3          │
│            │                                                        │
│            ▼                                                        │
│  🖥️  Gradio UI  ──  label · script · phonetic · confidence · audio │
└─────────────────────────────────────────────────────────────────────┘
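The hand-detection crop step can be sketched in isolation. A minimal sketch, assuming the OpenCV DNN SSD output layout (shape `(1, 1, N, 7)`, confidence at index 2, normalised box at indices 3–6); the function name is illustrative, while the 0.5 threshold and 20 px padding come from the pipeline above:

```python
import numpy as np

def crop_hand(frame, detections, conf_threshold=0.5, pad=20):
    """Return the padded crop of the highest-confidence detection, or None.

    `detections` uses the OpenCV DNN SSD layout: shape (1, 1, N, 7), where
    each row is [_, _, confidence, x1, y1, x2, y2] with coords in [0, 1].
    """
    h, w = frame.shape[:2]
    best, best_conf = None, conf_threshold
    for i in range(detections.shape[2]):
        conf = float(detections[0, 0, i, 2])
        if conf > best_conf:
            best_conf = conf
            best = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    if best is None:
        return None
    x1, y1, x2, y2 = best.astype(int)
    # Expand the box by `pad` pixels, clamped to the frame bounds.
    x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
    x2, y2 = min(w, x2 + pad), min(h, y2 + pad)
    return frame[y1:y2, x1:x2]
```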

📊 Results

The model was trained for 1,000 steps using InceptionV3 transfer learning on the Sinhala sign dataset. Training and validation accuracy both converge to roughly 93–95%, while cross-entropy loss drops from 0.85 to 0.13.

Metric                       Value
Final train accuracy         ~95%
Final validation accuracy    ~93%
Final cross-entropy loss     ~0.13
Training steps               1,000
Learning rate                0.01
Batch size                   32
Input resolution             299 × 299
Inference confidence cutoff  15%

Both curves track closely throughout training with no sign of significant overfitting — the model generalises well to unseen examples of each sign.
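The 15% inference cutoff can be sketched as a softmax over the classifier's outputs followed by a threshold check. A minimal sketch — the function and field names are illustrative; only the 15% value comes from the project:

```python
import numpy as np

LOW_CONFIDENCE_CUTOFF = 0.15  # inference confidence cutoff from the pipeline

def classify(logits, labels, cutoff=LOW_CONFIDENCE_CUTOFF):
    """Softmax the logits and flag predictions below the confidence cutoff."""
    exp = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs = exp / exp.sum()
    idx = int(np.argmax(probs))
    return {
        "label": labels[idx],
        "confidence": float(probs[idx]),
        "low_confidence": bool(probs[idx] < cutoff),
    }
```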


🚀 Quickstart

☁️ No local setup. Open the notebook directly in Google Colab.

1 · Open the notebook

Upload sstst_sinhala_sign_translator.ipynb to colab.research.google.com, then set:

Runtime → Change runtime type → Hardware accelerator → T4 GPU

2 · Configure Kaggle API

from google.colab import files
files.upload()  # select your kaggle.json (Kaggle → Account → Create New API Token)

import os
os.makedirs('/root/.kaggle', exist_ok=True)
!cp kaggle.json /root/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json  # Kaggle CLI requires owner-only permissions

3 · Run cells in order

Cell         Action
Cell 0       Install dependencies — tensorflow, gradio, gTTS, imutils …
Cell 1       Download & extract the Sinhala sign dataset from Kaggle
Cells 2–3    Fine-tune InceptionV3 — ~5–10 min on a T4 GPU
Cells 4–5    Install gTTS · verify output files
Cells 6–9    Load models, hand detector, and Sinhala CSV mappings
Cell 8       Plot training accuracy & loss curves
Cells 10–12  Launch the Gradio web UI (generates a public shareable link)
Cells 13–14  (Optional) Test on a static uploaded image

4 · Translate a sign

Once the Gradio UI launches, Colab prints a public link. Open it, go to the Translator tab, allow camera access, hold up a hand sign, and click Capture & Translate.


🖥️ Gradio Interface

The UI has three tabs accessible from the shareable Colab link:

╔══════════════════════════════════════════════════════════════════╗
║  🤟  Sinhala Sign Language Translator                           ║
╠══════════════════╦═══════════════════╦═══════════════════════════╣
║   Translator     ║  Training Graphs  ║     Dataset Info          ║
╠══════════════════╩═══════════════════╩═══════════════════════════╣
║                                                                  ║
║  [ Webcam / Upload ]         [ 🔍 Detected hand ]               ║
║                                                                  ║
║  [ Capture & Translate ▶ ]   Sign (EN)   |   සිංහල text         ║
║                              Phonetic pronunciation              ║
║                              ████████░░  Confidence: 91%        ║
║                              🔊 Speech output (.mp3)            ║
║                                                                  ║
║  ▾ Supported signs  one · two · three · what · when …           ║
╠══════════════════════════════════════════════════════════════════╣
║  Training Graphs  →  Accuracy & loss curves · live reload        ║
║  Dataset Info     →  Per-class image count & distribution chart  ║
╚══════════════════════════════════════════════════════════════════╝

🤟 Supported Signs

Numbers

one two three four five six
seven eight nine ten eleven thirteen
fourteen twenty thirty fifty

Question Words

what when who why

Each class maps to its Sinhala script and phonetic romanisation via CSV lookup tables bundled with the project.
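The CSV lookup can be sketched with the standard library. A minimal sketch — the column headers and inline contents here are hypothetical stand-ins, since the actual schemas of english-sinhala-script.csv and sinhala-textscript.csv are not shown:

```python
import csv
import io

def load_mapping(csv_text, key_col, value_col):
    """Build a dict mapping one CSV column to another."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_col]: row[value_col] for row in reader}

# Hypothetical contents standing in for english-sinhala-script.csv
# and sinhala-textscript.csv; the real headers may differ.
script_csv = "english,sinhala\nthree,තුන\n"
phonetic_csv = "english,phonetic\nthree,thuna\n"

sinhala = load_mapping(script_csv, "english", "sinhala")
phonetic = load_mapping(phonetic_csv, "english", "phonetic")
```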


🔬 Model Architecture

Input Image
     │
     ▼
OpenCV DNN Hand Detector  (ResNet-10 SSD · Caffe · confidence ≥ 0.5)
     │
     │  crop → resize 299×299 → normalise [0, 1]
     ▼
InceptionV3 Feature Extractor  (ImageNet pretrained · TF Hub · frozen)
     │
     ▼
Retrained Classification Head  (Dense · Softmax · N classes)
     │
     ├──►  english-sinhala-script.csv  →  Sinhala script
     └──►  sinhala-textscript.csv      →  Phonetic · gTTS audio
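The crop → resize 299×299 → normalise [0, 1] step can be sketched with Pillow and NumPy. A sketch under assumptions — the project's actual preprocessing may differ in interpolation mode or channel order:

```python
import numpy as np
from PIL import Image

def preprocess(crop, size=299):
    """Resize a uint8 image crop to size×size and scale to [0, 1] floats."""
    img = Image.fromarray(crop).resize((size, size), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr[np.newaxis, ...]  # add batch dimension: (1, size, size, 3)
```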

🛠️ Tech Stack

Library                    Role
tensorflow / tf.compat.v1  InceptionV3 transfer learning, frozen-graph inference
tensorflow-hub             Pretrained InceptionV3 feature extractor
opencv-python              Hand detection (DNN SSD), bounding boxes, image I/O
Pillow                     Image loading and preprocessing
gradio                     Interactive web UI — webcam, upload, tabs, audio player
gTTS                       Google Text-to-Speech audio generation
numpy                      Array ops and tensor manipulation
matplotlib                 Training curves and dataset distribution charts
kaggle                     Dataset download via API
imutils                    Image resize utilities

💡 Tips for Best Accuracy

  • 📸 Good lighting — even, shadow-free illumination improves detection
  • 🤚 Plain background — high contrast between hand and background helps
  • 🎯 Centre your hand — keep the sign clearly within frame
  • 🧘 Hold still — steady the sign for 1–2 seconds before capturing
  • 🔁 Retake if confidence < 50% — a slightly different angle often helps

🌏 Why Sinha-Sign?

Sri Lanka is home to an estimated 50,000 Deaf and hard-of-hearing individuals. Despite a rich and distinct Sinhala Sign Language, digital tools for automatic recognition remain scarce. Sinha-Sign is a step toward making technology more accessible for the Sinhala-speaking Deaf community — bridging gesture to script and speech with open, reproducible deep learning.


📄 License

Released under the MIT License. The Sinhala Sign Language dataset is sourced from Kaggle and is subject to its original data licence.


Built with ❤️ for the Sinhala-speaking Deaf community  ·  Runs entirely on Google Colab
