NJIT GirlHacks 2025 Project
https://www.youtube.com/watch?v=I1YlobrLi8g&t=1s
A mobile app where blind or low-vision users can capture photos to understand their environment. The app will announce the information detected in the image, and it could be used for a generic analysis of the surrounding, or help users get textual information from materials like flyers.
- Capture a single image from the phone camera
- Send image to backend via MCP server
- Backend runs OCR via Gen AI → extracts descriptions
- Return extracted text in JSON
- App reads text aloud via TTS
- Camera: lets user take a photo
- Networking: sends photo to backend MCP server
- Speech: converts backend JSON output → audio
- Accessibility: announce new screen
Input: user’s photo
Output: spoken text
- MCP Portal: exposed to frontend as a standard tool
- Gemini API: converts image to text and output a high-level and a detailed description