Open
37 commits
7edcb9e
Updated Gemini prompt to give short and concise scene descriptions
Jul 22, 2025
270adde
Updated Gemini prompt to give short and concise scene descriptions
Jul 22, 2025
e10d123
Merge pull request #7 from khanak0509/feature/concise-and-short-gemin…
kaushav07 Jul 23, 2025
cba8b98
Add configurable TTS engine support (gTTS/pyttsx3) with offline fallb…
kaurroopak Jul 25, 2025
a2439ad
Create LICENSE
SK8-infi Jul 29, 2025
304d840
Update README.md
SK8-infi Jul 29, 2025
14c0b8b
Add scan_logger utility for logging scans (#19)
krithika-pixel Aug 2, 2025
2cbc45a
Feature: Add face recognition and FastAPI integration
Zehen-249 Aug 4, 2025
d006b29
Update requirements.txt file
Zehen-249 Aug 4, 2025
e654fbc
Integrate LangChain with Gemini for image understanding
Aug 4, 2025
26a8dcd
Integrate LangChain with Gemini for image understanding
Aug 4, 2025
4784a24
feat: Implement speech-to-text module
vibhuti970 Aug 5, 2025
264672e
Create CODE_OF_CONDUCT.md
AnushkaChanda Aug 5, 2025
216b233
Merge pull request #27 from SK8-infi/main
shreeraksha2112 Aug 5, 2025
6ebd5c4
Add load_model feature for face_detection model
Zehen-249 Aug 5, 2025
c218c7e
added langchain
Aug 5, 2025
df59247
added langchain
Aug 5, 2025
8f8a333
Updated CODE_OF_CONDUCT.md
AnushkaChanda Aug 5, 2025
6435903
Create Updated Readme
Jai-76 Aug 5, 2025
9a314a1
Merge pull request #36 from AnushkaChanda/main
shreeraksha2112 Aug 6, 2025
8d8f512
Merge pull request #39 from vibhuti970/main
AMISHA2004-devgeek Aug 10, 2025
efe127a
Merge pull request #38 from Jai-76/patch-2
AMISHA2004-devgeek Aug 10, 2025
c2df2e9
Merge pull request #37 from khanak0509/feature/added-langchain
AMISHA2004-devgeek Aug 10, 2025
d3052fd
Merge pull request #34 from Zehen-249/face-recog
AMISHA2004-devgeek Aug 10, 2025
b9f045b
Revert "Feature: Add face recognition and FastAPI integration #24" (#43)
kaushav07 Aug 10, 2025
7957a8d
Revert "Create Updated Readme" (#44)
kaushav07 Aug 10, 2025
ca8b654
Revert "feat: Implement speech-to-text module" (#45)
kaushav07 Aug 10, 2025
caa4c26
enhance README.md file
payalpatel1208 Aug 10, 2025
d65d08b
updated requirements.txt
Aug 10, 2025
8a28f82
Refactor to include face_utils only (#48)
Zehen-249 Aug 10, 2025
d943c6c
Merge pull request #46 from khanak0509/feature/updated-requirements.txt
shreeraksha2112 Aug 12, 2025
25d1552
Merge branch 'main' into feature/langchain-for-Clean-modular-code
shreeraksha2112 Aug 13, 2025
a73fc2e
Merge pull request #33 from khanak0509/feature/langchain-for-Clean-mo…
shreeraksha2112 Aug 13, 2025
5cb2b39
Add mising insightface missing depency in requirements.txt #52
Zehen-249 Aug 13, 2025
1870875
Merge pull request #47 from payalpatel1208/update-readme
shreeraksha2112 Aug 14, 2025
a8fa898
Add app directory structure (#54) (#55)
Zehen-249 Aug 23, 2025
a0e9848
Merge pull request #53 from Zehen-249/add-missing-dependency
shreeraksha2112 Aug 25, 2025
70 changes: 70 additions & 0 deletions .gitignore
@@ -0,0 +1,70 @@

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Virtual environment
.venv/
venv/
env/
ENV/
visionmate/

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Model downloads & cache
app/models/
*.onnx
*.pth
*.pb
*.tflite

# Temporary files
*.log
*.tmp
*.bak
*.swp
*.DS_Store
Thumbs.db

# Jupyter/IPython
.ipynb_checkpoints

# Environment variables
.env
.env.*

# VSCode/IDE
.vscode/
.idea/

# macOS
.DS_Store

# Windows
*.lnk
desktop.ini
88 changes: 88 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,88 @@
# 📜 Code of Conduct

## 👋 Welcome to VisionMate

**VisionMate** is an open-source initiative committed to empowering visually impaired individuals through inclusive, innovative assistive technology. We believe in creating a **respectful, collaborative, and safe environment** for everyone involved in this mission—regardless of background, identity, or skill level.

This Code of Conduct outlines the behavior expected of all contributors and participants in VisionMate spaces.

---

## 💡 Our Values

We expect all members of the VisionMate community to:

- 🤝 Treat others with **respect, kindness, and empathy**
- 🌍 Embrace **diversity and inclusion**
- 📣 Communicate **clearly and constructively**
- 🧠 Encourage **learning, sharing, and collaboration**
- 🎯 Focus on **problem-solving** and positive contributions

---

## 🗣️ Feedback Process

We welcome input from all contributors to help improve our community, processes, and codebase.

- 🛠 Share suggestions through GitHub discussions, issues, or pull requests
- 🧩 Be open to differing viewpoints and respectful debate
- ✅ Encourage reviews that are kind, specific, and constructive
- 📝 Feedback will be considered carefully by maintainers and incorporated when appropriate

---

## 🚫 Unacceptable Behavior

To ensure a supportive space, the following will **not** be tolerated:

- ❌ Harassment, discrimination, or hate speech
- ❌ Personal attacks, threats, or derogatory comments
- ❌ Sexualized or inappropriate content or language
- ❌ Spamming, trolling, or sustained disruption
- ❌ Sharing private information without explicit consent

---

## 🙋 Reporting Issues

If you witness or experience any behavior that violates this Code of Conduct:
**Report it immediately.**

All reports will be handled **discreetly and respectfully** by the project maintainers.

---

## ⚖️ Enforcement

Violations of this Code of Conduct may result in:

| Consequence | Description |
|-------------------|---------------------------------------------------------------------------|
| 🟢 Warning | A private warning and clarification of the issue |
| 🟡 Temporary Ban | Temporary removal from participation in discussions or contributions |
| 🔴 Permanent Ban | Full removal from the project and blocking of further contributions |

### 🧾 Accountability

- Repeated or severe violations may lead to stricter consequences, including permanent bans
- Maintainers reserve the right to evaluate each case on a situational basis
- Appeals may be discussed with the core maintainer team if needed

---

## 👥 Scope

This Code of Conduct applies to:

- All VisionMate GitHub repositories (issues, pull requests, discussions)
- Community communication platforms (e.g., chats, forums)
- Public or private conversations related to the project
- Any events, meetings, or collaborative spaces

---

## 📝 Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/version/2/1/code_of_conduct.html), version 2.1.
Thank you for helping make **VisionMate** a safe, accessible, and inclusive space for everyone. 💙
Let’s build a world where technology supports **everyone’s independence.**
53 changes: 53 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,53 @@
# 👥 Contribution Guide

Welcome to the Hall of Fame! 🏆

Every line of code, every bug fix, every pixel of design — it all comes from people like **you**.

------------------------------------------------------------------------------------------
**VisionMate** is more than just an AI project — it’s a growing community of learners, builders, and innovators 💡.

🧠💻🎨🚀💬
From developers to designers to curious first-timers — we see you, we appreciate you, and we welcome you.

-------------------------------------------------------------------------------------------


## 🛠️ Contribution Areas

You can contribute to:

- 🎨 Python – Core backend and computer vision
- 🧠 OpenCV – Image processing and recognition
- 🗃️ Flask / Django – Backend framework (to be finalized)
- 📊 React.js / Flutter – Frontend or app interface
- 📊 MySQL – Data storage
- 📋 Google Cloud Vision API – (future integration)
- 📋 Text-to-Speech / Speech-to-Text APIs – Accessibility tools


-------------------------------------------------------------------------------------------

## 🚀 Getting Started


Follow these steps to contribute to the VisionMate project on your local machine:

# 1. Fork the Repository
Click the Fork button on the repository page to create your own copy, to which you can commit changes and push them on GitHub.


# 2. Clone the repository
git clone https://github.com/kaushav07/VisionMate.git

# 3. Navigate into the project directory
cd VisionMate

# 4. Install all the required dependencies
pip install -r requirements.txt




-------------------------------------------------------------------------------------------

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 kaushav07

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
70 changes: 53 additions & 17 deletions README.md
@@ -4,30 +4,58 @@

## Project Overview

VisionMate aims to integrate advanced computer vision, text recognition, and speech technologies to help users:
- Identify objects and surroundings
- Read printed or handwritten text aloud
- Provide real-time feedback through a simple, user-friendly interface
VisionMate uses computer vision, text recognition, and speech tech to help users:

- Recognize objects and surroundings
- Read printed or handwritten text out loud
- Control the system easily with voice commands
- Get real-time alerts about obstacles
- Add new features easily in the future

This is the early development and ideation phase. The repository will include prototypes, research notes, and starter code as the project progresses.

## Features (Planned)

✅ Object detection using computer vision
✅ Text-to-speech functionality
✅ Speech-based user controls
✅ Environment awareness for obstacle detection
✅ Modular architecture for future feature integration
- ✅ Real-time object detection
- ✅ Text-to-speech to read out text
- ✅ Speech-based user controls
- ✅ Environment awareness for obstacle detection
- ✅ Modular architecture for future feature integration

## Technology Stack

| Part | Technology / Tools |
|---------------------|-------------------------------------------|
| Programming Language| Python |
| Computer Vision | OpenCV, Google Cloud Vision API (planned) |
| Backend Framework | Flask / Django (to be decided) |
| Frontend / App | React.js / Flutter (planned) |
| Database | MySQL |
| Accessibility APIs | Text-to-Speech / Speech-to-Text APIs |

## How It Works

Here’s how VisionMate works step-by-step:

1. **Captures Input:**
Uses a camera to take live pictures or videos of the surroundings.

2. **Detects Objects:**
Uses computer vision to find and identify things like doors, obstacles, signs, etc.

## Tech Stack
3. **Reads Text:**
Uses OCR (Optical Character Recognition) to detect printed or handwritten text.

4. **Speech Processing:**
- Converts detected text to speech so the user can hear it.
- Listens to user’s voice commands to control the system.

5. **Gives Feedback:**
Provides real-time audio alerts about obstacles and text info to help the user move safely.

6. **Modular Design:**
Built so new features and better AI can be added later easily.
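
The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the project's actual code: `Frame`, `Pipeline`, and the three stage callables are hypothetical names, and the detection, OCR, and speech stages are passed in as plain functions so that real backends could be swapped in later.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """Hypothetical container for one captured image (raw bytes here)."""
    data: bytes


@dataclass
class Pipeline:
    """Illustrative pipeline: detect objects, read text, then speak the result."""
    detect_objects: Callable[[Frame], List[str]]  # step 2: object detection
    read_text: Callable[[Frame], str]             # step 3: OCR
    speak: Callable[[str], None]                  # steps 4-5: audio feedback

    def process(self, frame: Frame) -> str:
        """Run every stage on one frame and return the spoken message."""
        objects = self.detect_objects(frame)
        text = self.read_text(frame)
        parts = []
        if objects:
            parts.append("Detected: " + ", ".join(objects))
        if text:
            parts.append("Text: " + text)
        message = ". ".join(parts) if parts else "Nothing detected"
        self.speak(message)
        return message
```

Because each stage is just a callable, a stub such as `lambda frame: ["door"]` can stand in for a detector during testing, which is one way to keep the design modular as step 6 describes.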

- **Python** – Core backend and computer vision
- **OpenCV** – Image processing and recognition
- **Flask / Django** – Backend framework (to be finalized)
- **React.js / Flutter** – Frontend or app interface
- **MySQL** – Data storage
- **Google Cloud Vision API** – (future integration)
- **Text-to-Speech / Speech-to-Text APIs** – Accessibility tools

## Getting Started

@@ -39,3 +67,11 @@ Clone the repository and install dependencies:
git clone https://github.com/kaushav07/VisionMate.git
cd VisionMate
pip install -r requirements.txt
```
## Contributing

We’d love your help! Please see [CONTRIBUTING.md](CONTRIBUTING.md) to learn how you can contribute.

## 📄 License

This project is licensed under the [MIT License](LICENSE).
Binary file added __pycache__/config.cpython-313.pyc
Binary file not shown.
Binary file added __pycache__/tts_utils.cpython-313.pyc
Binary file not shown.
Empty file added app/api/__init__.py
Empty file.
Empty file.
Empty file added app/data/__init__.py
Empty file.
Binary file added app/data/users/dummy_user/pictures/Paul.jpeg
Binary file added app/data/users/dummy_user/pictures/Peter.jpeg
Empty file added app/services/__init__.py
Empty file.
4 changes: 4 additions & 0 deletions app/services/authentication/auth_utils.py
@@ -0,0 +1,4 @@
# app/services/authentication/auth_utils.py
"""
This file contains utility functions for authentication-related tasks.
"""
4 changes: 4 additions & 0 deletions app/services/authentication/password_utils.py
@@ -0,0 +1,4 @@
# app/services/authentication/password_utils.py
"""
This file contains utility functions for password-related tasks.
"""
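
The stub above does not yet define any functions. As a rough sketch of what password utilities could look like, the following uses only the Python standard library (PBKDF2 via `hashlib.pbkdf2_hmac`). The function names, the storage format, and the iteration count are assumptions made for illustration, not something this pull request specifies.

```python
import hashlib
import hmac
import secrets

# Assumed parameters for illustration; tune the iteration count for your hardware.
_ALGORITHM = "sha256"
_ITERATIONS = 200_000
_SALT_BYTES = 16


def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$hash_hex' for the given password."""
    salt = secrets.token_bytes(_SALT_BYTES)
    digest = hashlib.pbkdf2_hmac(_ALGORITHM, password.encode("utf-8"), salt, _ITERATIONS)
    return f"{_ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    try:
        iterations_str, salt_hex, hash_hex = stored.split("$")
        iterations = int(iterations_str)
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(hash_hex)
    except ValueError:
        return False  # malformed record
    if iterations < 1:
        return False
    candidate = hashlib.pbkdf2_hmac(_ALGORITHM, password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Storing the iteration count next to the salt lets the count be raised later without invalidating existing records.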
Empty file.
4 changes: 4 additions & 0 deletions app/services/perception/audio/stt_utils.py
@@ -0,0 +1,4 @@
# app/services/perception/audio/stt_utils.py
"""
This file is for Speech-to-text (STT) logic utilities.
"""
4 changes: 4 additions & 0 deletions app/services/perception/audio/tts_utils.py
@@ -0,0 +1,4 @@
# app/services/perception/audio/tts_utils.py
"""
This file is for Text-to-speech (TTS) logic utilities.
"""
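
One of the commits in this pull request mentions configurable TTS engine support with gTTS and pyttsx3. The sketch below shows one way such a choice could be made. The selection rule, the function names, and the `online` flag are hypothetical; only the `pyttsx3.init()`, `say()`, `runAndWait()` and `gTTS(...).save()` calls come from those libraries, and they are imported lazily so the module loads even when neither is installed.

```python
import importlib.util
from typing import Callable, Optional


def _module_available(name: str) -> bool:
    """Return True if the named top-level module can be imported here."""
    return importlib.util.find_spec(name) is not None


def choose_engine(preferred: str, online: bool,
                  available: Callable[[str], bool] = _module_available) -> Optional[str]:
    """Pick 'gtts' or 'pyttsx3' (hypothetical selection rule for illustration).

    gTTS needs network access, so it is considered only when `online` is True;
    otherwise the offline pyttsx3 engine is tried.
    """
    order = ["gtts", "pyttsx3"] if preferred == "gtts" else ["pyttsx3", "gtts"]
    for name in order:
        if name == "gtts" and not online:
            continue
        if available(name):
            return name
    return None


def speak(text: str, engine: str, out_path: str = "speech.mp3") -> None:
    """Speak with pyttsx3, or save an MP3 with gTTS (playback is left out)."""
    if engine == "pyttsx3":
        import pyttsx3  # offline engine
        tts = pyttsx3.init()
        tts.say(text)
        tts.runAndWait()
    elif engine == "gtts":
        from gtts import gTTS  # requires an internet connection
        gTTS(text=text, lang="en").save(out_path)
    else:
        raise ValueError(f"unknown TTS engine: {engine!r}")
```

Passing `available` as a parameter keeps the selection logic testable without installing either library.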
4 changes: 4 additions & 0 deletions app/services/perception/audio/voice_utils.py
@@ -0,0 +1,4 @@
# app/services/perception/audio/voice_utils.py
"""
This file is for voice-related utilities.
"""
Empty file.
4 changes: 4 additions & 0 deletions app/services/perception/vision/face_utils.py
@@ -0,0 +1,4 @@
# app/services/perception/vision/face_utils.py
"""
This file contains all the utility services related to faces.
"""
4 changes: 4 additions & 0 deletions app/services/perception/vision/gesture_utils.py
@@ -0,0 +1,4 @@
# app/services/perception/vision/gesture_utils.py
"""
This file is for hand/pose gesture utilities.
"""
4 changes: 4 additions & 0 deletions app/services/perception/vision/object_utils.py
@@ -0,0 +1,4 @@
# app/services/perception/vision/object_utils.py
"""
This file contains all the utility services related to objects.
"""
4 changes: 4 additions & 0 deletions app/services/perception/vision/scene_analysis.py
@@ -0,0 +1,4 @@
# app/services/perception/vision/scene_analysis.py
"""
This file is for scene analysis and description utilities.
"""
4 changes: 4 additions & 0 deletions app/services/storage/database_storage.py
@@ -0,0 +1,4 @@
# app/services/storage/database_storage.py
"""
This file is for utilities to save and retrieve data from the database.
"""
4 changes: 4 additions & 0 deletions app/services/storage/file_storage.py
@@ -0,0 +1,4 @@
# app/services/storage/file_storage.py
"""
This file is for utilities to save and retrieve files locally.
"""
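
The docstring describes saving and retrieving local files, but the module is still empty. A minimal sketch with `pathlib` might look like the following. `save_file`, `load_file`, and the idea of a single base storage directory are assumptions for illustration; the example path mirrors the `users/dummy_user/pictures/` layout added elsewhere in this pull request.

```python
import tempfile
from pathlib import Path


def _resolve(base_dir: Path, relative_path: str) -> Path:
    """Resolve relative_path under base_dir, rejecting paths that escape it."""
    base = base_dir.resolve()
    target = (base / relative_path).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{relative_path!r} is outside the storage directory")
    return target


def save_file(base_dir: Path, relative_path: str, data: bytes) -> Path:
    """Write data under base_dir, creating parent directories as needed."""
    target = _resolve(base_dir, relative_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


def load_file(base_dir: Path, relative_path: str) -> bytes:
    """Read back the bytes previously stored at relative_path."""
    return _resolve(base_dir, relative_path).read_bytes()


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to the repository.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_file(Path(tmp), "users/dummy_user/pictures/example.bin", b"abc")
        print(saved.name, load_file(Path(tmp), "users/dummy_user/pictures/example.bin"))
```

The path check is there so that a user-supplied name such as `../config.py` cannot write outside the storage directory.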
Empty file added app/shared/__init__.py
Empty file.
4 changes: 4 additions & 0 deletions app/shared/config.py
@@ -0,0 +1,4 @@
# app/shared/config.py
"""
This file contains configuration variables and methods for the VisionMate application.
"""
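
No configuration variables are defined yet. As a sketch of one common approach, settings can be read from environment variables into a frozen dataclass. The `Settings` fields and the `VISIONMATE_*` variable names below are hypothetical; the `app/models` default simply reuses the model directory listed in this pull request's `.gitignore`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical application settings with illustrative defaults."""
    tts_engine: str = "gtts"
    tts_language: str = "en"
    models_dir: str = "app/models"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    defaults = Settings()
    return Settings(
        tts_engine=env.get("VISIONMATE_TTS_ENGINE", defaults.tts_engine).lower(),
        tts_language=env.get("VISIONMATE_TTS_LANGUAGE", defaults.tts_language),
        models_dir=env.get("VISIONMATE_MODELS_DIR", defaults.models_dir),
    )
```

Accepting any mapping rather than reading `os.environ` directly makes the loader easy to exercise in tests, for example `load_settings({"VISIONMATE_TTS_ENGINE": "pyttsx3"})`.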