Don't just say it. Show it.
Gestura bridges the gap between patient pain and clinical understanding using real-time Computer Vision, 3D visualization, and Generative AI.
Telehealth and traditional medical intake systems suffer from a critical communication gap, especially for vulnerable populations:
- Language barriers prevent accurate symptom descriptions
- Anatomical ambiguity ("My arm hurts") lacks clinical precision
- Loss of physical interaction in remote care removes intuitive pointing and localization
- Clinical burden forces doctors to spend time documenting instead of diagnosing
These challenges disproportionately affect:
- Non-native speakers
- Elderly and pediatric patients
- People with speech, hearing, or cognitive impairments
- Anxious patients in high-stress medical settings
Gestura is a gesture-based medical screening platform that allows patients to communicate pain non-verbally.
Patients simply stand in front of a camera, point to areas of concern on their own body, and use natural gestures to confirm selections. Gestura translates these gestures into structured, clinically meaningful data, enhanced with AI-generated medical summaries.
- **Non-Verbal Input:** The patient stands in front of a webcam while MediaPipe tracks full-body and hand landmarks in real time.
- **Gesture-to-Anatomy Mapping:** Pointing gestures are mapped to anatomical regions on a 3D digital twin.
- **Confirmation via Pinch Gesture:** A pinch gesture locks and saves a body part. Multiple areas can be saved in one session.
- **Multilingual Guidance:** Voice prompts (ElevenLabs) guide the patient through the process in their native language.
- **AI Medical Synthesis:** Saved data is processed by Google Gemini to generate structured, clinician-ready reports.
- **Clinical Handoff:** Doctors receive a clear pain map and an AI-generated summary, improving the speed and clarity of care.
- Full-body pose tracking via MediaPipe Holistic
- High-precision fingertip tracking
- Scale-aware pinch detection (works at varying distances)
- Each body part can only be saved once (no duplicates)
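Scale-aware pinch detection can be sketched by normalizing the thumb-index gap against a hand-size reference, so one threshold works at varying distances from the camera. This is a minimal sketch: the landmark indices follow MediaPipe's hand model, but the threshold value and function names are illustrative assumptions.

```python
import math

# MediaPipe hand landmark indices: 4 = thumb tip, 8 = index tip,
# 0 = wrist, 9 = middle-finger MCP.
THUMB_TIP, INDEX_TIP, WRIST, MIDDLE_MCP = 4, 8, 0, 9

def _dist(a, b):
    # 2D distance in normalized image coordinates.
    return math.dist(a[:2], b[:2])

def is_pinching(landmarks, ratio_threshold=0.35):
    """Scale-aware pinch: the thumb-index gap is divided by a hand-size
    reference (wrist to middle MCP), so the ratio is stable whether the
    hand is near or far from the camera."""
    gap = _dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP])
    hand_size = _dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if hand_size == 0:
        return False
    return gap / hand_size < ratio_threshold
```

A fixed pixel threshold would fire too easily up close and never at a distance; the ratio sidesteps that.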
- Color-coded pointer states:
- Blue: pointing
- Yellow: locking in progress
- Red: saved / confirmed
- Hover highlights on anatomical regions
- "Saved" indicators when revisiting selected areas
- Save multiple body parts per session
- Undo last saved part
- Clear all saved parts
- Persistent visual confirmation
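The session rules above (save each part once, undo the last save, clear all) amount to a small insertion-ordered store. A minimal sketch, with illustrative class and method names:

```python
class PainSession:
    """Session-scoped store of confirmed body parts."""

    def __init__(self):
        self._parts = []  # insertion order preserved so undo() pops the latest

    def save(self, part):
        # Each body part can only be saved once (no duplicates).
        if part in self._parts:
            return False
        self._parts.append(part)
        return True

    def undo(self):
        # Remove and return the most recently saved part, if any.
        return self._parts.pop() if self._parts else None

    def clear(self):
        self._parts.clear()

    @property
    def parts(self):
        return list(self._parts)
```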
- Body parts map to a neutral 3D human model
- Provides spatial clarity for clinicians
- Designed for future annotation and heatmap overlays
- Powered by ElevenLabs
- Supports English, French, Spanish, Mandarin, and Japanese
- Improves accessibility for illiterate or visually impaired users
- Powered by Google Gemini
- Converts raw gesture data into:
- Structured clinical summaries
- SOAP-style notes
- Non-diagnostic triage insights
- Supports patient-language to doctor-language translation
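The synthesis step reduces to turning the saved regions into a constrained prompt and calling the model. The prompt wording and function names below are assumptions for illustration; the model call itself follows the documented `google-generativeai` API.

```python
import os

def build_report_prompt(saved_parts, patient_language="English"):
    """Turn raw gesture selections into a prompt asking for a structured,
    non-diagnostic, SOAP-style summary."""
    regions = ", ".join(saved_parts)
    return (
        "You are assisting with medical intake. The patient indicated pain "
        f"in: {regions}. Patient language: {patient_language}.\n"
        "Produce: (1) a structured clinical summary, (2) SOAP-style notes, "
        "(3) non-diagnostic triage insights. Do not diagnose."
    )

def generate_report(saved_parts):
    # Requires GEMINI_API_KEY in the environment (see setup below).
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(
        os.environ.get("GEMINI_MODEL", "gemini-1.5-flash"))
    return model.generate_content(build_report_prompt(saved_parts)).text
```

Keeping the "do not diagnose" constraint in the prompt matters: the output is intake support, not a clinical decision.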
- Python + Flask
- Threaded computer vision processing
- REST APIs for CV state, UI, and AI services
- MediaPipe Holistic
- Pose landmarks → body region mapping
- Hand landmarks → pointer & pinch detection
- Custom geometric regions for:
- Limbs
- Torso
- Head and neck
- Optimized for real-time performance (≈15–20 FPS)
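The landmark-to-region mapping can be illustrated with circular regions derived from pose landmarks. The region names, radii, and circle approximation here are assumptions; the project uses its own custom geometry for limbs, torso, and head/neck.

```python
import math

def build_regions(pose):
    """pose: dict of landmark name -> (x, y) in normalized image coords.
    Returns region name -> (center, radius) circles derived from the pose."""
    def mid(a, b):
        return ((pose[a][0] + pose[b][0]) / 2, (pose[a][1] + pose[b][1]) / 2)
    return {
        "chest": (mid("left_shoulder", "right_shoulder"), 0.10),
        "abdomen": (mid("left_hip", "right_hip"), 0.10),
        "left_shoulder": (pose["left_shoulder"], 0.06),
        "right_shoulder": (pose["right_shoulder"], 0.06),
    }

def region_at(point, regions):
    """Return the first region whose circle contains the fingertip, or None."""
    for name, (center, radius) in regions.items():
        if math.dist(point, center) <= radius:
            return name
    return None
```

Because the circles are rebuilt from the live pose each frame, the mapping follows the patient as they move.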
- HTML5 + Tailwind CSS
- Live MJPEG video streaming
- Interactive gesture toolbar
- Three.js-powered 3D model rendering
- Google Gemini for report generation
- ElevenLabs for multilingual voice synthesis
- Session-based issue storage
- The base model (`male_lopoly.glb`) is loaded using the Three.js `GLTFLoader`
- Original textures are overridden with a neutral `MeshPhysicalMaterial`
- Privacy-first design
- High contrast for pain indicators
- Anatomical nodes are represented as programmatically generated spheres
- Each node has predefined `(x, y, z)` coordinates
- Spheres:
- Glow blue when hovered
- Turn red when locked
- Change color based on pain severity in clinician view
- 2D hand landmark coordinates are projected into screen space
- Overlap with 3D node projections registers a "touch"
- Enables touchless interaction with medical data
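A sketch of that overlap test, assuming MediaPipe's normalized landmark coordinates are scaled into the same pixel space as the projected node; the radius and function names are illustrative.

```python
def landmark_to_screen(lm, width, height):
    """MediaPipe landmarks are normalized to [0, 1]; scale into pixel space."""
    return (lm[0] * width, lm[1] * height)

def touches(finger_px, node_px, radius_px=30):
    """A 'touch' registers when the fingertip falls inside the node's
    projected screen-space radius (squared distances avoid a sqrt)."""
    dx = finger_px[0] - node_px[0]
    dy = finger_px[1] - node_px[1]
    return dx * dx + dy * dy <= radius_px * radius_px
```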
- Backend: Python 3.10+, Flask
- Frontend: HTML5, Tailwind CSS, Three.js
- Computer Vision: MediaPipe, OpenCV
- AI: Google Gemini API
- Audio: ElevenLabs API
- Data Processing: NumPy
- Python 3.10+
- Webcam
- Google Gemini API key
- ElevenLabs API key
```bash
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows
pip install -r requirements.txt
```

Create a `.env` file:

```
GEMINI_API_KEY=your_key
GEMINI_MODEL=gemini-1.5-flash
ELEVENLABS_API_KEY=your_key
DEBUG=True
```

Run the app:

```bash
python app.py
```

Access at: http://localhost:5050
```
HackHive2026/
├── gestura_flask/
│   ├── app.py
│   ├── templates/
│   └── static/
│       ├── male_boning.glb
│       └── male_lopoly.glb
├── cv_adrian/
│   ├── body/
│   ├── interaction/
│   ├── paint/
│   ├── UI/
│   └── vision/
├── .env              # create this
├── .env.example
├── requirements.txt
└── README.md
```
Add screenshots or GIFs here.
- Temporal pain tracking
- Mobile device support
- EHR / FHIR integration
- PDF and EMR export
- Clinical usability testing
- AL Muqshith Shifan – Frontend & Full Stack
- Kevin Christopher Chua – Frontend & 3D Visualization
- Adrian Fudge – Computer Vision & Backend
- Alex – AI Integration & Debugging
Built with ❤️ and ☕ for HackHive 2026. Applying AI to make healthcare more human.