📸 AI Image Caption Generator

An AI-powered image caption generator built on the BLIP model from Salesforce. Upload an image or capture one from your webcam to generate a descriptive caption. Captions and images are stored in SQLite and shown in a history view.

✨ Features

🤖 BLIP Model - State-of-the-art image captioning using Salesforce BLIP
📤 Image Upload - Support for JPG, JPEG, and PNG formats
📷 Webcam Capture - Real-time image capture and captioning
💾 SQLite Database - Persistent storage of captions and images
📜 Caption History - View all previously generated captions
🎨 Streamlit UI - Clean, interactive web interface
⚡ Fast Processing - Optimized with PyTorch and caching
🔄 Auto-save - Automatically saves images and captions

🚀 Quick Start

Prerequisites

Python 3.8 or higher
Webcam (optional, for webcam capture feature)

Installation

# Clone the repository
git clone https://github.com/widgetwalker/image-caption.git
cd image-caption

# Install dependencies
pip install -r requirements.txt

Run the Application

streamlit run app.py

The app will open in your browser at http://localhost:8501

📦 Dependencies

streamlit>=1.28.0
torch>=2.0.0
transformers>=4.30.0
Pillow>=10.0.0
opencv-python>=4.8.0
numpy>=1.24.0

🎮 Usage

Upload Image Method

Click on the "Upload Image" tab
Choose an image file (JPG, JPEG, or PNG)
Click "Generate Caption"
View the AI-generated caption

Webcam Capture Method

Click on the "Webcam Capture" tab
Click "Open Webcam" to capture an image
Click "Generate Caption for Webcam Image"
View the AI-generated caption

View History

Scroll down to see Caption History
Click on any entry to view the image and caption
History is stored in caption_database.db

🏗️ Project Structure

image-caption/
├── app.py                    # Main Streamlit application
├── requirements.txt          # Python dependencies
├── caption_database.db       # SQLite database (auto-created)
├── images/                   # Saved images directory (auto-created)
└── README.md                 # This file
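
As a rough illustration of how the auto-created images/ directory could be populated, the sketch below writes raw image bytes to disk and creates the folder on first use. The function name save_image_bytes and the timestamp-plus-random-suffix file naming are assumptions for illustration, not the scheme app.py necessarily uses.

```python
import uuid
from datetime import datetime
from pathlib import Path


def save_image_bytes(data, suffix=".jpg", images_dir="images"):
    """Write raw image bytes into images_dir and return the saved path."""
    folder = Path(images_dir)
    # Create the directory on first use, mirroring the "auto-created" note.
    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: a timestamp keeps files sortable, and a short
    # random suffix avoids collisions within the same second.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = folder / f"{stamp}_{uuid.uuid4().hex[:8]}{suffix}"
    path.write_bytes(data)
    return path
```

The returned path is what would then be stored in the image_path column of the database.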

🧠 How It Works

BLIP Model

The application uses BLIP (Bootstrapping Language-Image Pre-training) from Salesforce:

Model: Salesforce/blip-image-captioning-base
Framework: Hugging Face Transformers
Backend: PyTorch

Processing Pipeline

Image Input - Upload or webcam capture
Preprocessing - Convert to PIL Image format
Model Inference - BLIP generates caption
Storage - Save to SQLite database
Display - Show caption and update history
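
The preprocessing and inference steps above can be sketched roughly as follows with the Hugging Face Transformers BLIP classes. The function names load_blip and generate_caption and the max_new_tokens value of 30 are assumptions for illustration; app.py may organize this differently.

```python
def load_blip(model_name="Salesforce/blip-image-captioning-base"):
    """Load the BLIP processor and model from the Hugging Face Hub."""
    # Imported here so the heavy dependency is only loaded when needed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    model.eval()  # inference only, so disable training-time behavior
    return processor, model


def generate_caption(image, processor, model, max_new_tokens=30):
    """Run preprocessing and inference on a PIL image; return the caption."""
    # Preprocessing: BLIP expects 3-channel input, so normalize the mode
    # (PNG uploads may carry an alpha channel or be grayscale).
    rgb = image.convert("RGB")
    inputs = processor(images=rgb, return_tensors="pt")
    # Model inference: generate the token ids of the caption.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence back into text.
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

A typical call would be load_blip() once at startup, then generate_caption(Image.open("photo.jpg"), processor, model) for each image, where photo.jpg is a placeholder file name and Image comes from Pillow. Passing the processor and model in as arguments keeps the captioning function independent of how the model is loaded or cached.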

💾 Database Schema

CREATE TABLE captions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT,
    caption TEXT,
    created_at TIMESTAMP
);
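
As a rough illustration of how this schema might be used, the sketch below creates the table, inserts a caption, and reads the history back with Python's built-in sqlite3 module. The helper names (init_db, save_caption, fetch_history), the IF NOT EXISTS clause, and the newest-first ordering are assumptions for illustration, not code taken from app.py.

```python
import sqlite3
from datetime import datetime

SCHEMA = """
CREATE TABLE IF NOT EXISTS captions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT,
    caption TEXT,
    created_at TIMESTAMP
)
"""


def init_db(db_path="caption_database.db"):
    """Open the database and create the captions table if it is missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def save_caption(conn, image_path, caption):
    """Insert one row and return its auto-assigned id."""
    created_at = datetime.now().isoformat(sep=" ", timespec="seconds")
    cur = conn.execute(
        "INSERT INTO captions (image_path, caption, created_at) VALUES (?, ?, ?)",
        (image_path, caption, created_at),
    )
    conn.commit()
    return cur.lastrowid


def fetch_history(conn):
    """Return all rows as tuples, newest first."""
    return conn.execute(
        "SELECT id, image_path, caption, created_at FROM captions ORDER BY id DESC"
    ).fetchall()
```

Using parameter placeholders (?) rather than string formatting keeps captions containing quotes from breaking the INSERT statement.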

🎨 UI Features

Tabbed Interface - Separate tabs for upload and webcam
Image Preview - View images before captioning
Expandable History - Collapsible caption entries
Responsive Design - Works on different screen sizes
Loading States - Visual feedback during processing

⚙️ Configuration

Model Selection

To use a different BLIP model, change the model name in both from_pretrained calls in app.py (the larger captioning checkpoint is shown here):

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

Database Location

Change the database path in app.py:

conn = sqlite3.connect('your_custom_path.db')

🐛 Troubleshooting

Webcam Not Working

Ensure webcam permissions are granted
Check if another application is using the webcam
Try restarting the Streamlit app

Model Loading Issues

# Clear the Hugging Face cache (deletes everything cached there, including other models, which are re-downloaded on next use)
rm -rf ~/.cache/huggingface/

# Reinstall transformers
pip install --upgrade transformers

Database Errors

# Delete the database (this erases all saved caption history)
rm caption_database.db
# Restart the app to auto-create a fresh, empty database

📊 Performance

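First Load: ~5-10 seconds to load the model (the very first run also downloads the model weights, which takes longer depending on your connection)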
Subsequent Loads: Instant (cached)
Caption Generation: ~1-2 seconds per image
Webcam Capture: Real-time

🤝 Contributing

Contributions are welcome! Here's how:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit changes (git commit -m 'Add AmazingFeature')
Push to branch (git push origin feature/AmazingFeature)
Open a Pull Request

📜 License

This project is licensed under the MIT License - free to use and modify!

🔗 Links

Repository: github.com/widgetwalker/image-caption
BLIP Model: Salesforce BLIP
Streamlit: streamlit.io
Author: @widgetwalker

📧 Contact

Dheeraj Pilli

GitHub: @widgetwalker
Email: dheeraj5765483@gmail.com

⭐ Star this repo if you find it helpful!

Built with ❤️ using BLIP, PyTorch, and Streamlit
