
Gestura -- AI-Powered Gesture-Based Medical Screening

HackHive 2026 · Track: Applied AI

Don't just say it. Show it.
Gestura bridges the gap between patient pain and clinical understanding using real-time Computer Vision, 3D visualization, and Generative AI.


🩺 The Problem

Telehealth and traditional medical intake systems suffer from a critical communication gap, especially for vulnerable populations:

  • Language barriers prevent accurate symptom descriptions
  • Anatomical ambiguity ("My arm hurts") lacks clinical precision
  • Loss of physical interaction in remote care removes intuitive pointing and localization
  • Clinical burden forces doctors to spend time documenting instead of diagnosing

These challenges disproportionately affect:

  • Non-native speakers
  • Elderly and pediatric patients
  • People with speech, hearing, or cognitive impairments
  • Anxious patients in high-stress medical settings


💡 The Solution

Gestura is a gesture-based medical screening platform that allows patients to communicate pain non-verbally.

Patients simply stand in front of a camera, point to areas of concern on their own body, and use natural gestures to confirm selections. Gestura translates these gestures into structured, clinically meaningful data, enhanced with AI-generated medical summaries.


🔄 User Workflow

  1. Non-Verbal Input
    The patient stands in front of a webcam. MediaPipe tracks full-body and hand landmarks in real time.

  2. Gesture-to-Anatomy Mapping
    Pointing gestures are mapped to anatomical regions on a 3D digital twin.

  3. Confirmation via Pinch Gesture
    A pinch gesture locks and saves a body part. Multiple areas can be saved in one session.

  4. Multilingual Guidance
    Voice prompts (ElevenLabs) guide the patient through the process in their native language.

  5. AI Medical Synthesis
    Saved data is processed by Google Gemini to generate structured, clinician-ready reports.

  6. Clinical Handoff
    Doctors receive a clear pain map and AI-generated summary, improving speed and clarity of care.


🧠 Key Features

🎯 Gesture-Based Body Part Selection

  • Full-body pose tracking via MediaPipe Holistic
  • High-precision fingertip tracking
  • Scale-aware pinch detection (works at varying distances)
  • Each body part can only be saved once (no duplicates)
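The scale-aware pinch check can be sketched as follows. This is an illustrative reduction, not the project's actual detector: it assumes MediaPipe Hands' fixed landmark indices (wrist 0, thumb tip 4, index tip 8, middle-finger MCP 9) and normalizes the thumb-index gap by the wrist-to-MCP span so one threshold works at any distance from the camera; the `ratio_threshold` value is a made-up example.

```python
import math

# MediaPipe Hands landmark indices (fixed by the library)
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9

def _dist(a, b):
    """Euclidean distance between two (x, y) landmark tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_pinching(landmarks, ratio_threshold=0.35):
    """Scale-aware pinch check (sketch).

    landmarks: mapping of landmark index -> (x, y) in normalized image
    coordinates. The thumb-index gap is divided by the wrist-to-MCP hand
    span, so the same threshold fires whether the hand is near or far.
    """
    hand_scale = _dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if hand_scale == 0:  # degenerate frame; never report a pinch
        return False
    pinch_gap = _dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP])
    return (pinch_gap / hand_scale) < ratio_threshold
```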

🧍 Real-Time Visual Feedback

  • Color-coded pointer states:
    • Blue: pointing
    • Yellow: locking in progress
    • Red: saved / confirmed
  • Hover highlights on anatomical regions
  • "Saved" indicators when revisiting selected areas

🧩 Multi-Part Injury Tracking

  • Save multiple body parts per session
  • Undo last saved part
  • Clear all saved parts
  • Persistent visual confirmation
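The session behaviour above (save once, undo last, clear all) amounts to a small piece of state that could be sketched like this; the class name and method names are hypothetical, not the project's API:

```python
class InjurySession:
    """Tracks body parts saved during one screening session (sketch).

    Mirrors the toolbar behaviour: each part is saved at most once, the
    most recent save can be undone, and the session can be cleared.
    """

    def __init__(self):
        self._parts = []  # insertion order doubles as save order

    def save(self, part):
        """Save a body part; ignore duplicates. Returns True if newly saved."""
        if part in self._parts:
            return False
        self._parts.append(part)
        return True

    def undo(self):
        """Remove and return the most recently saved part, or None if empty."""
        return self._parts.pop() if self._parts else None

    def clear(self):
        """Forget every saved part."""
        self._parts.clear()

    @property
    def parts(self):
        """Saved parts in save order, as an immutable snapshot."""
        return tuple(self._parts)
```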

🧍‍♂️ 3D Digital Twin Visualization

  • Body parts map to a neutral 3D human model
  • Provides spatial clarity for clinicians
  • Designed for future annotation and heatmap overlays

🗣️ Multilingual Audio Guidance

  • Powered by ElevenLabs
  • Supports English, French, Spanish, Mandarin, and Japanese
  • Improves accessibility for illiterate or visually impaired users

🤖 AI-Generated Medical Reports

  • Powered by Google Gemini
  • Converts raw gesture data into:
    • Structured clinical summaries
    • SOAP-style notes
    • Non-diagnostic triage insights
  • Supports patient-language → doctor-language translation
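The raw-gesture-to-report step hinges on turning saved selections into a structured prompt before calling Gemini. A minimal sketch of that prompt assembly is below; the dict shape (`region`, `severity`) and the prompt wording are assumptions for illustration, and the resulting string would be passed to the Gemini API (e.g. `GenerativeModel.generate_content` in the `google-generativeai` client) rather than used directly.

```python
def build_report_prompt(saved_parts, patient_language="en"):
    """Assemble a clinician-report prompt from gesture selections (sketch).

    saved_parts: list of dicts like {"region": "left_knee", "severity": 7}.
    Returns the prompt text that would be sent to Gemini for synthesis.
    """
    lines = [
        "You are assisting with non-diagnostic medical intake.",
        "Summarize the patient's gesture-reported pain as a SOAP-style note.",
        f"Patient's language: {patient_language}. Write the note in clinical English.",
        "Reported areas:",
    ]
    for part in saved_parts:
        severity = part.get("severity", "unspecified")
        lines.append(f"- {part['region']} (severity {severity}/10)")
    lines.append("Do not propose a diagnosis; flag red-flag patterns only.")
    return "\n".join(lines)
```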

🏗️ Technical Architecture

Backend

  • Python + Flask
  • Threaded computer vision processing
  • REST APIs for CV state, UI, and AI services

Computer Vision

  • MediaPipe Holistic
    • Pose landmarks → body region mapping
    • Hand landmarks → pointer & pinch detection
  • Custom geometric regions for:
    • Limbs
    • Torso
    • Head and neck
  • Optimized for real-time performance (≈15--20 FPS)
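The pose-landmarks-to-body-region mapping can be illustrated with a coarse geometric classifier. This is a deliberately simplified stand-in for the project's custom regions: it assumes MediaPipe Pose's fixed landmark indices (shoulders 11/12, hips 23/24), works in normalized image coordinates, and labels left/right in subject terms assuming an unmirrored frame (the subject's right side sits at lower image x). The real code uses finer per-limb geometry.

```python
# MediaPipe Pose landmark indices (fixed by the library)
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 11, 12, 23, 24

def region_for_point(point, pose):
    """Map a pointed-at location to a coarse body region (sketch).

    point: (x, y) fingertip position in normalized image coordinates.
    pose: mapping of pose-landmark index -> (x, y).
    Region bounds are derived from the live pose, so the mapping
    follows the patient as they move in frame.
    """
    shoulder_y = (pose[L_SHOULDER][1] + pose[R_SHOULDER][1]) / 2
    hip_y = (pose[L_HIP][1] + pose[R_HIP][1]) / 2
    torso_xs = [pose[i][0] for i in (L_SHOULDER, R_SHOULDER, L_HIP, R_HIP)]
    band_left, band_right = min(torso_xs), max(torso_xs)

    x, y = point
    if y < shoulder_y:
        return "head_neck"
    if y > hip_y:
        return "legs"
    if x < band_left:
        return "right_arm"  # unmirrored frame: subject's right at lower x
    if x > band_right:
        return "left_arm"
    return "torso"
```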

Frontend

  • HTML5 + Tailwind CSS
  • Live MJPEG video streaming
  • Interactive gesture toolbar
  • Three.js-powered 3D model rendering

AI & Audio

  • Google Gemini for report generation
  • ElevenLabs for multilingual voice synthesis
  • Session-based issue storage

🧍‍♂️ 3D Digital Twin: Technical Breakdown

  • The base model (male_lopoly.glb) is loaded using Three.js GLTFLoader
  • Original textures are overridden with a neutral MeshPhysicalMaterial
    • Privacy-first design
    • High contrast for pain indicators

Interaction Layer

  • Anatomical nodes are represented as programmatically generated spheres
  • Each node has predefined (x, y, z) coordinates
  • Spheres:
    • Glow blue when hovered
    • Turn red when locked
    • Change color based on pain severity in clinician view

CV ↔ 3D Mapping

  • 2D hand landmark coordinates are projected into screen space
  • Overlap with 3D node projections registers a "touch"
  • Enables touchless interaction with medical data
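Once fingertip and node both live in screen space, registering a "touch" reduces to a circle-hit test. A minimal sketch, with `node_radius_px` as an assumed tolerance (the actual hit zone size is a tuning choice):

```python
def landmark_to_px(norm_xy, frame_w, frame_h):
    """Project a normalized (x, y) landmark into screen pixels."""
    return (norm_xy[0] * frame_w, norm_xy[1] * frame_h)

def projects_to_touch(finger_px, node_px, node_radius_px=24):
    """Register a 'touch' when the fingertip overlaps a projected node.

    finger_px: fingertip position in screen pixels (from the 2D hand landmarks).
    node_px: the anatomical node's center after its 3D -> screen projection.
    Both points share one screen space, so overlap is a squared-distance
    test against the node's hit radius (no sqrt needed).
    """
    dx = finger_px[0] - node_px[0]
    dy = finger_px[1] - node_px[1]
    return dx * dx + dy * dy <= node_radius_px * node_radius_px
```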

🛠 Tech Stack

  • Backend: Python 3.10+, Flask
  • Frontend: HTML5, Tailwind CSS, Three.js
  • Computer Vision: MediaPipe, OpenCV
  • AI: Google Gemini API
  • Audio: ElevenLabs API
  • Data Processing: NumPy

🚀 Setup & Installation

Prerequisites

  • Python 3.10+
  • Webcam
  • Google Gemini API key
  • ElevenLabs API key

Installation

```bash
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

pip install -r requirements.txt
```

Environment Configuration

Create a .env file:

```
GEMINI_API_KEY=your_key
GEMINI_MODEL=gemini-1.5-flash
ELEVENLABS_API_KEY=your_key
DEBUG=True
```

Run

```bash
python app.py
```

Access at: http://localhost:5050


```
HackHive2026/
├── gestura_flask/
│   ├── app.py
│   ├── templates/
│   └── static/
│       ├── male_boning.glb
│       └── male_lopoly.glb
├── cv_adrian/
│   ├── body/
│   ├── interaction/
│   ├── paint/
│   ├── UI/
│   └── vision/
├── .env            # create this
├── .env.example
├── requirements.txt
└── README.md
```


🖼️ Screenshots & Demo

Add screenshots or GIFs here.


🚀 Future Improvements

  • Temporal pain tracking
  • Mobile device support
  • EHR / FHIR integration
  • PDF and EMR export
  • Clinical usability testing

👥 Team -- HackHive 2026

  • AL Muqshith Shifan --- Frontend & Full Stack
  • Kevin Christopher Chua --- Frontend & 3D Visualization
  • Adrian Fudge --- Computer Vision & Backend
  • Alex --- AI Integration & Debugging

Built with ❤️ and ☕ for HackHive 2026. Applying AI to make healthcare more human.