AI-powered image caption generator using the BLIP model from Salesforce. Upload images or capture from webcam to generate descriptive captions with SQLite storage and history view.
- BLIP Model - State-of-the-art image captioning using Salesforce BLIP
- Image Upload - Support for JPG, JPEG, and PNG formats
- Webcam Capture - Real-time image capture and captioning
- SQLite Database - Persistent storage of captions and images
- Caption History - View all previously generated captions
- Streamlit UI - Clean, interactive web interface
- Fast Processing - Optimized with PyTorch and caching
- Auto-save - Automatically saves images and captions
- Python 3.8 or higher
- Webcam (optional, for webcam capture feature)
```bash
# Clone the repository
git clone https://github.com/widgetwalker/image-caption.git
cd image-caption

# Install dependencies
pip install -r requirements.txt

# Run the app
streamlit run app.py
```

The app will open in your browser at http://localhost:8501
```
streamlit>=1.28.0
torch>=2.0.0
transformers>=4.30.0
Pillow>=10.0.0
opencv-python>=4.8.0
numpy>=1.24.0
```
- Click on the "Upload Image" tab
- Choose an image file (JPG, JPEG, or PNG)
- Click "Generate Caption"
- View the AI-generated caption
- Click on the "Webcam Capture" tab
- Click "Open Webcam" to capture an image
- Click "Generate Caption for Webcam Image"
- View the AI-generated caption
- Scroll down to see Caption History
- Click on any entry to view the image and caption
- History is stored in `caption_database.db`
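Because history lives in a plain SQLite file, it can also be read outside the app with the standard-library `sqlite3` module. The sketch below assumes the `captions` table schema shown later in this README; the helper name `load_history` is illustrative, not a function from `app.py`:

```python
import sqlite3

def load_history(db_path="caption_database.db"):
    """Return (image_path, caption, created_at) rows, newest first --
    roughly what the Caption History view displays. Illustrative helper,
    not taken from app.py."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT image_path, caption, created_at "
            "FROM captions ORDER BY created_at DESC"
        ).fetchall()
    finally:
        conn.close()
```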
```
image-caption/
├── app.py               # Main Streamlit application
├── requirements.txt     # Python dependencies
├── caption_database.db  # SQLite database (auto-created)
├── images/              # Saved images directory (auto-created)
└── README.md            # This file
```
The application uses BLIP (Bootstrapping Language-Image Pre-training) from Salesforce:
- Model: `Salesforce/blip-image-captioning-base`
- Framework: Hugging Face Transformers
- Backend: PyTorch
- Image Input - Upload or webcam capture
- Preprocessing - Convert to PIL Image format
- Model Inference - BLIP generates caption
- Storage - Save to SQLite database
- Display - Show caption and update history
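The steps above can be sketched end to end. To keep the example runnable without downloading BLIP, the inference step is an injected callable (`generate_caption`); in the real app that step would wrap the BLIP processor and model. The function name `run_pipeline` is illustrative, not taken from `app.py`:

```python
import sqlite3
from datetime import datetime, timezone

def run_pipeline(image_path, generate_caption, db_path="caption_database.db"):
    """Caption one image and persist the result.

    generate_caption: any callable mapping an image path to a caption
    string. In app.py this would be BLIP inference (open with PIL, run
    the processor and model); here it is injected so the surrounding
    pipeline can run without the model.
    """
    # Model inference (stand-in for BLIP)
    caption = generate_caption(image_path)

    # Storage: save to the SQLite database, creating the table if needed
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS captions (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   image_path TEXT,
                   caption TEXT,
                   created_at TIMESTAMP
               )"""
        )
        conn.execute(
            "INSERT INTO captions (image_path, caption, created_at) "
            "VALUES (?, ?, ?)",
            (image_path, caption, datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
    finally:
        conn.close()

    # Display step: the app shows this in the UI; here we just return it
    return caption
```

With the real model, `generate_caption` would open the image as a PIL Image, run it through the BLIP processor and `model.generate`, and decode the output tokens into text.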
```sql
CREATE TABLE captions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT,
    caption TEXT,
    created_at TIMESTAMP
);
```

- Tabbed Interface - Separate tabs for upload and webcam
- Image Preview - View images before captioning
- Expandable History - Collapsible caption entries
- Responsive Design - Works on different screen sizes
- Loading States - Visual feedback during processing
To use a different BLIP model, modify app.py:

```python
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
```

To change the database path, edit app.py:

```python
conn = sqlite3.connect('your_custom_path.db')
```

- Ensure webcam permissions are granted
- Check if another application is using the webcam
- Try restarting the Streamlit app
```bash
# Clear Hugging Face cache
rm -rf ~/.cache/huggingface/

# Reinstall transformers
pip install --upgrade transformers
```

```bash
# Delete and recreate the database
rm caption_database.db
# Restart the app to auto-create it
```

- First Load: ~5-10 seconds (model download)
- Subsequent Loads: Instant (cached)
- Caption Generation: ~1-2 seconds per image
- Webcam Capture: Real-time
Contributions are welcome! Here's how:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - free to use and modify!
- Repository: github.com/widgetwalker/image-caption
- BLIP Model: Salesforce BLIP
- Streamlit: streamlit.io
- Author: @widgetwalker
Dheeraj Pilli
- GitHub: @widgetwalker
- Email: dheeraj5765483@gmail.com
⭐ Star this repo if you find it helpful!

Built with ❤️ using BLIP, PyTorch, and Streamlit