widgetwalker/image-caption


📸 AI Image Caption Generator

Python PyTorch Transformers Streamlit OpenCV

An AI-powered image caption generator built on Salesforce's BLIP model. Upload an image or capture one from your webcam to generate a descriptive caption; every caption is saved to a SQLite database and listed in a history view.


✨ Features

  • 🤖 BLIP Model - Image captioning with Salesforce's pretrained BLIP model
  • 📤 Image Upload - Support for JPG, JPEG, and PNG formats
  • 📷 Webcam Capture - Real-time image capture and captioning
  • 💾 SQLite Database - Persistent storage of captions and image paths (images are saved to images/)
  • 📜 Caption History - View all previously generated captions
  • 🎨 Streamlit UI - Clean, interactive web interface
  • ⚡ Fast Processing - Optimized with PyTorch and caching
  • 🔄 Auto-save - Automatically saves images and captions

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Webcam (optional, for webcam capture feature)

Installation

# Clone the repository
git clone https://github.com/widgetwalker/image-caption.git
cd image-caption

# Install dependencies
pip install -r requirements.txt

Run the Application

streamlit run app.py

The app opens in your browser at http://localhost:8501 (Streamlit's default port).


📦 Dependencies

streamlit>=1.28.0
torch>=2.0.0
transformers>=4.30.0
Pillow>=10.0.0
opencv-python>=4.8.0
numpy>=1.24.0
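To confirm these packages are installed before launching the app, a short check can be run from Python. This sketch is not part of the repository: REQUIREMENTS copies the list above, report_installed is a hypothetical helper, and it only reports the installed version of each package (via the standard-library importlib.metadata) without comparing it against the minimum.

```python
from importlib.metadata import PackageNotFoundError, version

# Copied from the dependency list above.
REQUIREMENTS = [
    "streamlit>=1.28.0",
    "torch>=2.0.0",
    "transformers>=4.30.0",
    "Pillow>=10.0.0",
    "opencv-python>=4.8.0",
    "numpy>=1.24.0",
]


def report_installed(requirements):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper: it does not check the ">=" minimums, only presence.
    """
    report = {}
    for line in requirements:
        # "streamlit>=1.28.0" -> "streamlit"
        name = line.split(">=")[0].strip()
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, installed in report_installed(REQUIREMENTS).items():
        print(f"{name}: {installed or 'NOT INSTALLED'}")
```

Anything reported as NOT INSTALLED can be fixed by re-running pip install -r requirements.txt.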

🎮 Usage

Upload Image Method

  1. Click on the "Upload Image" tab
  2. Choose an image file (JPG, JPEG, or PNG)
  3. Click "Generate Caption"
  4. View the AI-generated caption
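The upload tab accepts JPG, JPEG, and PNG files. A minimal sketch of how an uploaded file could be validated, assuming the check looks at both the file name and the first bytes of the file; is_supported_image is a hypothetical helper, not a function from app.py. It relies on the standard PNG signature and the JPEG start-of-image marker.

```python
from pathlib import Path

# Formats listed in the README: JPG, JPEG and PNG.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG header
JPEG_SIGNATURE = b"\xff\xd8\xff"      # JPEG start-of-image marker + next marker prefix


def is_supported_image(filename: str, data: bytes) -> bool:
    """Accept a file only if its extension and leading bytes agree on JPEG or PNG."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False
    if ext == ".png":
        return data.startswith(PNG_SIGNATURE)
    return data.startswith(JPEG_SIGNATURE)  # .jpg and .jpeg
```

Streamlit's st.file_uploader with a type list already filters by extension in the file picker; checking the leading bytes as well catches files that were merely renamed.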

Webcam Capture Method

  1. Click on the "Webcam Capture" tab
  2. Click "Open Webcam" to capture an image
  3. Click "Generate Caption for Webcam Image"
  4. View the AI-generated caption
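Since OpenCV is listed as a dependency, webcam capture presumably goes through cv2.VideoCapture. The sketch below assumes that; capture_webcam_frame is a hypothetical helper, and its imports are deferred inside the function so the module can be imported even where OpenCV or Pillow is missing.

```python
def capture_webcam_frame(device_index: int = 0):
    """Grab one frame from a local webcam and return it as an RGB PIL image.

    Hypothetical helper; assumes capture via OpenCV as the dependency list suggests.
    """
    import cv2
    from PIL import Image

    cap = cv2.VideoCapture(device_index)
    try:
        if not cap.isOpened():
            raise RuntimeError(f"Could not open webcam {device_index}")
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("Failed to read a frame from the webcam")
    finally:
        # Release the device so other apps (or the next capture) can use it.
        cap.release()
    # OpenCV delivers frames in BGR channel order; PIL expects RGB.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    return Image.fromarray(rgb)
```

Because cv2.VideoCapture opens the camera on the machine running streamlit run, this approach only works when the app runs locally, not for a remote visitor's webcam. Skipping the BGR to RGB conversion would swap the red and blue channels of the image passed to BLIP.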

View History

  • Scroll down to see Caption History
  • Click on any entry to view the image and caption
  • History is stored in caption_database.db
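Reading the history amounts to a query on the captions table (its schema is listed under Database Schema). A sketch, assuming the newest entries are shown first; fetch_history is a hypothetical name and the limit of 50 is an arbitrary default.

```python
import sqlite3


def fetch_history(conn: sqlite3.Connection, limit: int = 50):
    """Return (id, image_path, caption, created_at) rows, newest first.

    Hypothetical helper; id breaks ties between rows with the same timestamp.
    """
    cur = conn.execute(
        "SELECT id, image_path, caption, created_at FROM captions "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

Ordering by created_at works here because ISO-style timestamp strings sort chronologically as text.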

๐Ÿ—๏ธ Project Structure

image-caption/
├── app.py                    # Main Streamlit application
├── requirements.txt          # Python dependencies
├── caption_database.db       # SQLite database (auto-created)
├── images/                   # Saved images directory (auto-created)
└── README.md                 # This file
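The images/ folder does not need to exist in advance. One way to auto-create it while saving, sketched under the assumption that each image gets a random unique file name (app.py may name files differently); save_image is a hypothetical helper.

```python
import uuid
from pathlib import Path


def save_image(data: bytes, images_dir: str = "images", ext: str = ".png") -> Path:
    """Write image bytes under images_dir with a unique name; return the file path."""
    folder = Path(images_dir)
    folder.mkdir(parents=True, exist_ok=True)  # auto-create images/ on first use
    path = folder / f"{uuid.uuid4().hex}{ext}"  # random name avoids overwriting
    path.write_bytes(data)
    return path
```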

🧠 How It Works

BLIP Model

The application uses BLIP (Bootstrapping Language-Image Pre-training) from Salesforce:

  • Model: Salesforce/blip-image-captioning-base
  • Framework: Hugging Face Transformers
  • Backend: PyTorch
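Loading this checkpoint and generating a caption follows the usual Hugging Face Transformers pattern for BLIP. A sketch with hypothetical helper names load_blip and generate_caption and an arbitrary max_new_tokens=30; app.py may structure this differently. Imports are deferred so the functions can be defined without the heavy dependencies loaded.

```python
MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id: str = MODEL_ID):
    """Return (processor, model); downloads on first use, then loads from the local cache."""
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()  # inference mode (disables dropout)
    return processor, model


def generate_caption(image, processor, model, max_new_tokens: int = 30) -> str:
    """Generate an unconditional caption for a PIL image."""
    import torch

    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    with torch.no_grad():  # no gradients needed for inference
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the model on the first run):
# from PIL import Image
# processor, model = load_blip()
# print(generate_caption(Image.open("photo.jpg"), processor, model))
```

In a Streamlit app, decorating load_blip with @st.cache_resource is one way to keep the model in memory across script reruns, so it is loaded once per server process.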

Processing Pipeline

  1. Image Input - Upload or webcam capture
  2. Preprocessing - Convert to PIL Image format
  3. Model Inference - BLIP generates caption
  4. Storage - Save to SQLite database
  5. Display - Show caption and update history
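The five steps can be wired together as one function that receives each stage as a parameter, which keeps the flow testable without loading BLIP. This is a sketch rather than the app's code: run_pipeline, CaptionResult, and the callable parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CaptionResult:
    image_path: str
    caption: str


def run_pipeline(
    raw_input: Any,
    preprocess: Callable[[Any], Any],
    caption_fn: Callable[[Any], str],
    store_fn: Callable[[str, str], None],
    image_path: str,
) -> CaptionResult:
    """Run steps 2-4 on an uploaded or captured input (step 1) and return what step 5 displays."""
    image = preprocess(raw_input)           # 2. preprocessing, e.g. bytes -> PIL image
    caption = caption_fn(image).strip()     # 3. model inference (BLIP in the real app)
    store_fn(image_path, caption)           # 4. persist, e.g. INSERT into SQLite
    return CaptionResult(image_path, caption)  # 5. caller shows it and refreshes history
```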

💾 Database Schema

CREATE TABLE captions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT,
    caption TEXT,
    created_at TIMESTAMP
);
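A minimal sketch of creating this table and inserting a row with Python's built-in sqlite3 module. init_db and save_caption are hypothetical names, and filling created_at with SQLite's CURRENT_TIMESTAMP (a UTC "YYYY-MM-DD HH:MM:SS" string) is an assumption; the app may set the timestamp from Python instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS captions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_path TEXT,
    caption TEXT,
    created_at TIMESTAMP
)
"""


def init_db(db_path: str = "caption_database.db") -> sqlite3.Connection:
    """Open (or create) the database file and make sure the captions table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def save_caption(conn: sqlite3.Connection, image_path: str, caption: str) -> int:
    """Insert one row, letting SQLite fill created_at (UTC); return the new id."""
    cur = conn.execute(
        "INSERT INTO captions (image_path, caption, created_at) "
        "VALUES (?, ?, CURRENT_TIMESTAMP)",
        (image_path, caption),
    )
    conn.commit()
    return cur.lastrowid
```

CREATE TABLE IF NOT EXISTS is what makes it safe to run on every start, and the ? placeholders keep captions that contain quotes from breaking the SQL.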

🎨 UI Features

  • Tabbed Interface - Separate tabs for upload and webcam
  • Image Preview - View images before captioning
  • Expandable History - Collapsible caption entries
  • Responsive Design - Works on different screen sizes
  • Loading States - Visual feedback during processing
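These UI pieces map onto standard Streamlit primitives: st.tabs, st.file_uploader, st.image, st.spinner, and st.expander. The sketch below shows one possible layout; render_ui and its caption_fn, capture_fn, and history parameters are hypothetical, and the real app.py may arrange things differently.

```python
def render_ui(caption_fn, capture_fn, history):
    """Lay out the tabbed interface and the expandable caption history.

    Hypothetical inputs: caption_fn(image) -> str, capture_fn() -> PIL image,
    history is a list of (image_path, caption) pairs.
    """
    import streamlit as st
    from PIL import Image

    upload_tab, webcam_tab = st.tabs(["Upload Image", "Webcam Capture"])

    with upload_tab:
        uploaded = st.file_uploader("Choose an image", type=["jpg", "jpeg", "png"])
        if uploaded is not None:
            image = Image.open(uploaded)
            st.image(image, caption="Preview")  # preview before captioning
            if st.button("Generate Caption"):
                with st.spinner("Generating caption..."):  # loading state
                    caption = caption_fn(image)
                st.success(caption)

    with webcam_tab:
        if st.button("Open Webcam"):
            # Keep the frame across the rerun triggered by the next button click.
            st.session_state["webcam_image"] = capture_fn()
        if "webcam_image" in st.session_state:
            frame = st.session_state["webcam_image"]
            st.image(frame, caption="Captured frame")
            if st.button("Generate Caption for Webcam Image"):
                with st.spinner("Generating caption..."):
                    caption = caption_fn(frame)
                st.success(caption)

    st.subheader("Caption History")
    for image_path, caption in history:
        with st.expander(caption):  # collapsible history entry
            st.image(image_path)
```

Streamlit reruns the whole script after each button click, which is why the captured frame is kept in st.session_state; a plain local variable would be gone by the time "Generate Caption for Webcam Image" is clicked.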

โš™๏ธ Configuration

Model Selection

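To use a different BLIP model (for example, the larger checkpoint), change the model ID in app.py:

from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")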

Database Location

Change the database path in app.py:

import sqlite3

conn = sqlite3.connect('your_custom_path.db')
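Rather than editing app.py for each change, both settings could be read from environment variables. This is a hypothetical extension: the variable names CAPTION_MODEL and CAPTION_DB_PATH and the load_config helper are not part of the project; the defaults match the values documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "Salesforce/blip-image-captioning-base"
DEFAULT_DB = "caption_database.db"


@dataclass(frozen=True)
class AppConfig:
    model_id: str
    db_path: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Read settings from (hypothetical) environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return AppConfig(
        model_id=env.get("CAPTION_MODEL", DEFAULT_MODEL),
        db_path=env.get("CAPTION_DB_PATH", DEFAULT_DB),
    )
```

Once app.py calls load_config(), running CAPTION_MODEL=Salesforce/blip-image-captioning-large streamlit run app.py would switch to the larger checkpoint without touching the code.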

๐Ÿ› Troubleshooting

Webcam Not Working

  • Ensure webcam permissions are granted
  • Check if another application is using the webcam
  • Try restarting the Streamlit app

Model Loading Issues

# Clear the Hugging Face cache (removes every cached model, not only BLIP)
rm -rf ~/.cache/huggingface/

# Reinstall transformers
pip install --upgrade transformers
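A narrower alternative to deleting the whole cache is to remove only the BLIP model's folder. This sketch assumes the hub cache layout models--<org>--<name> and the HF_HUB_CACHE, HF_HOME, and XDG_CACHE_HOME variables used by recent huggingface_hub versions; legacy variables such as TRANSFORMERS_CACHE are ignored, and both helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def hf_model_cache_dir(model_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Locate one model's folder in the Hugging Face hub cache.

    Lookup order: HF_HUB_CACHE, then HF_HOME/hub, then
    $XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub.
    """
    env = os.environ if env is None else env
    if "HF_HUB_CACHE" in env:
        hub = Path(env["HF_HUB_CACHE"]).expanduser()
    else:
        if "HF_HOME" in env:
            home = Path(env["HF_HOME"]).expanduser()
        else:
            cache_root = Path(env.get("XDG_CACHE_HOME", str(Path.home() / ".cache")))
            home = cache_root / "huggingface"
        hub = home / "hub"
    # Model repos are cached as "models--<org>--<name>".
    return hub / ("models--" + model_id.replace("/", "--"))


def clear_model_cache(model_id: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Delete only this model's cached files; return True if anything was removed."""
    target = hf_model_cache_dir(model_id, env)
    if not target.exists():
        return False
    shutil.rmtree(target)
    return True
```

Calling clear_model_cache("Salesforce/blip-image-captioning-base") forces a fresh download of just the BLIP checkpoint on the next start.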

Database Errors

# Delete the database (this also erases the caption history)
rm caption_database.db
# Restart the app so it recreates an empty database
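Before deleting the file, it can help to check whether it is actually damaged. A sketch using SQLite's built-in PRAGMA integrity_check, which returns the single row "ok" for a healthy database; db_is_healthy is a hypothetical helper that also confirms the captions table exists.

```python
import sqlite3
from pathlib import Path


def db_is_healthy(db_path: str = "caption_database.db") -> bool:
    """Return True if the file is a readable SQLite database containing the captions table."""
    if not Path(db_path).exists():
        return False  # sqlite3.connect would otherwise silently create an empty file
    try:
        conn = sqlite3.connect(db_path)
        try:
            status = conn.execute("PRAGMA integrity_check").fetchone()[0]
            has_table = (
                conn.execute(
                    "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'captions'"
                ).fetchone()
                is not None
            )
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # e.g. "file is not a database" when the file is corrupted or not SQLite
        return False
    return status == "ok" and has_table
```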

📊 Performance

  • First Load: ~5-10 seconds (model download; varies with network speed)
  • Subsequent Loads: Near-instant (model is cached)
  • Caption Generation: ~1-2 seconds per image (hardware-dependent)
  • Webcam Capture: Real-time
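These timings depend on hardware (CPU vs. GPU), image size, and model size. To measure caption time on your own machine, a small timing helper works with any captioning function; time_captioning is hypothetical, and the untimed warm-up call is there because the first inference can include one-off setup costs.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def time_captioning(caption_fn: Callable, images: Sequence, warmup: int = 1) -> float:
    """Return mean seconds per caption over `images`, after `warmup` untimed calls."""
    if not images:
        raise ValueError("need at least one image")
    for image in images[:warmup]:
        caption_fn(image)  # warm-up: excluded from the measurement
    durations = []
    for image in images:
        start = time.perf_counter()
        caption_fn(image)
        durations.append(time.perf_counter() - start)
    return mean(durations)
```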

๐Ÿค Contributing

Contributions are welcome! Here's how:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - free to use and modify!



📧 Contact

Dheeraj Pilli


โญ Star this repo if you find it helpful!

Built with ❤️ using BLIP, PyTorch, and Streamlit

