📸 AI Image Caption Generator

Python PyTorch Transformers Streamlit OpenCV

AI-powered image caption generator built on the Salesforce BLIP model. Upload an image or capture one from your webcam to generate a descriptive caption; each caption is stored in SQLite and shown in a history view.


✨ Features

  • 🤖 BLIP Model - State-of-the-art image captioning using Salesforce BLIP
  • 📤 Image Upload - Support for JPG, JPEG, and PNG formats
  • 📷 Webcam Capture - Real-time image capture and captioning
  • 💾 SQLite Database - Persistent storage of captions and images
  • 📜 Caption History - View all previously generated captions
  • 🎨 Streamlit UI - Clean, interactive web interface
  • ⚡ Fast Processing - Optimized with PyTorch and caching
  • 🔄 Auto-save - Automatically saves images and captions

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Webcam (optional, for webcam capture feature)

Installation

# Clone the repository
git clone https://github.com/widgetwalker/image-caption.git
cd image-caption

# Install dependencies
pip install -r requirements.txt

Run the Application

streamlit run app.py

The app will open in your browser at http://localhost:8501


📦 Dependencies

streamlit>=1.28.0
torch>=2.0.0
transformers>=4.30.0
Pillow>=10.0.0
opencv-python>=4.8.0
numpy>=1.24.0

🎮 Usage

Upload Image Method

  1. Click on the "Upload Image" tab
  2. Choose an image file (JPG, JPEG, or PNG)
  3. Click "Generate Caption"
  4. View the AI-generated caption

Webcam Capture Method

  1. Click on the "Webcam Capture" tab
  2. Click "Open Webcam" to capture an image
  3. Click "Generate Caption for Webcam Image"
  4. View the AI-generated caption

View History

  • Scroll down to see Caption History
  • Click on any entry to view the image and caption
  • History is stored in caption_database.db

🏗️ Project Structure

image-caption/
├── app.py                    # Main Streamlit application
├── requirements.txt          # Python dependencies
├── caption_database.db       # SQLite database (auto-created)
├── images/                   # Saved images directory (auto-created)
└── README.md                 # This file
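
The auto-created `images/` directory could be produced by a small startup step like the one below. This is a minimal sketch, not the code in app.py: the helper names `ensure_images_dir` and `save_image_bytes` and the random file naming are assumptions made for illustration. Only the `images/` directory name and the JPG/JPEG/PNG restriction come from this README.

```python
import uuid
from pathlib import Path

# File types the README lists as supported uploads.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ensure_images_dir(base_dir="."):
    """Create the images/ directory if it is missing and return its path."""
    images_dir = Path(base_dir) / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    return images_dir


def save_image_bytes(data, original_name, images_dir):
    """Write raw image bytes under a unique name (hypothetical naming scheme)."""
    suffix = Path(original_name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported image type: {suffix or 'none'}")
    # A random hex name avoids collisions between uploads with the same filename.
    target = Path(images_dir) / f"{uuid.uuid4().hex}{suffix}"
    target.write_bytes(data)
    return target
```

The returned path is what would be stored in the `image_path` column of the database.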

🧠 How It Works

BLIP Model

The application uses BLIP (Bootstrapping Language-Image Pre-training) from Salesforce:

  • Model: Salesforce/blip-image-captioning-base
  • Framework: Hugging Face Transformers
  • Backend: PyTorch

Processing Pipeline

  1. Image Input - Upload or webcam capture
  2. Preprocessing - Convert to PIL Image format
  3. Model Inference - BLIP generates caption
  4. Storage - Save to SQLite database
  5. Display - Show caption and update history
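
Steps 2 and 3 of the pipeline can be sketched as follows. This is an illustrative outline rather than the contents of app.py: the function names `load_blip` and `generate_caption` and the `max_new_tokens` default are assumptions. The `BlipProcessor` and `BlipForConditionalGeneration` classes and the model ID are the ones named in this README.

```python
from typing import Any

MODEL_NAME = "Salesforce/blip-image-captioning-base"


def load_blip(model_name: str = MODEL_NAME):
    """Load the BLIP processor and model (downloads weights on first use)."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return processor, model


def generate_caption(image: Any, processor: Any, model: Any,
                     max_new_tokens: int = 30) -> str:
    """Turn one RGB PIL image into a caption string."""
    # The processor resizes and normalizes the image into PyTorch tensors.
    inputs = processor(images=image, return_tensors="pt")
    # The model generates caption token IDs from the pixel values.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence back into text.
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

Typical use would be `processor, model = load_blip()` followed by `generate_caption(Image.open(path).convert("RGB"), processor, model)`, where `path` is any local image file. In a Streamlit app, decorating `load_blip` with `st.cache_resource` is one way to load the model only once per process; whether app.py does exactly this is an assumption.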

💾 Database Schema

CREATE TABLE captions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT,
    caption TEXT,
    created_at TIMESTAMP
);
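
A minimal sketch of how this schema could be used from Python with the standard-library `sqlite3` module. The helper names (`init_db`, `save_caption`, `fetch_history`), the UTC ISO timestamp format, and the `limit` parameter are hypothetical and not taken from app.py; only the table and column names come from the schema above.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS captions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT,
    caption TEXT,
    created_at TIMESTAMP
);
"""


def init_db(db_path="caption_database.db"):
    """Open the database and create the captions table if it is missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def save_caption(conn, image_path, caption):
    """Insert one caption row and return its generated id."""
    created_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    cur = conn.execute(
        "INSERT INTO captions (image_path, caption, created_at) VALUES (?, ?, ?)",
        (image_path, caption, created_at),  # parameters guard against SQL injection
    )
    conn.commit()
    return cur.lastrowid


def fetch_history(conn, limit=50):
    """Return the most recent rows first, as (id, image_path, caption, created_at)."""
    rows = conn.execute(
        "SELECT id, image_path, caption, created_at "
        "FROM captions ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

Passing `":memory:"` as `db_path` gives a throwaway database, which is convenient for trying the helpers without touching `caption_database.db`.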

🎨 UI Features

  • Tabbed Interface - Separate tabs for upload and webcam
  • Image Preview - View images before captioning
  • Expandable History - Collapsible caption entries
  • Responsive Design - Works on different screen sizes
  • Loading States - Visual feedback during processing

⚙️ Configuration

Model Selection

To use a different BLIP model, modify app.py:

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
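
Instead of editing the model ID in two places, the name could be resolved once and passed to both `from_pretrained` calls. The sketch below assumes a hypothetical `BLIP_MODEL` environment variable and a hypothetical `resolve_model_name` helper; neither exists in app.py as described here.

```python
import os

DEFAULT_MODEL = "Salesforce/blip-image-captioning-base"

# Short aliases for the two checkpoints mentioned in this README.
KNOWN_MODELS = {
    "base": "Salesforce/blip-image-captioning-base",
    "large": "Salesforce/blip-image-captioning-large",
}


def resolve_model_name(env=None):
    """Pick the model ID from BLIP_MODEL (hypothetical variable), else the default."""
    env = os.environ if env is None else env
    value = env.get("BLIP_MODEL", "").strip()
    if not value:
        return DEFAULT_MODEL
    # Accept an alias such as "large", or pass a full model ID through unchanged.
    return KNOWN_MODELS.get(value.lower(), value)
```

With this in place, running `BLIP_MODEL=large streamlit run app.py` would select the large checkpoint without a code change.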

Database Location

Change the database path in app.py:

conn = sqlite3.connect('your_custom_path.db')

🐛 Troubleshooting

Webcam Not Working

  • Ensure webcam permissions are granted
  • Check if another application is using the webcam
  • Try restarting the Streamlit app

Model Loading Issues

# Clear the Hugging Face cache (removes every cached model and any saved login token; models are re-downloaded on next run)
rm -rf ~/.cache/huggingface/

# Reinstall transformers
pip install --upgrade transformers

Database Errors

# Delete the database (this erases all stored caption history)
rm caption_database.db
# Restart the app to auto-create

📊 Performance

  • First Load: ~5-10 seconds (includes model download; longer on slow connections)
  • Subsequent Loads: Near-instant (model is cached)
  • Caption Generation: ~1-2 seconds per image (varies with hardware)
  • Webcam Capture: Real-time

🤝 Contributing

Contributions are welcome! Here's how:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - free to use and modify!


🔗 Links


📧 Contact

Dheeraj Pilli


⭐ Star this repo if you find it helpful!

Built with ❤️ using BLIP, PyTorch, and Streamlit