Image Captioning

This project implements an image captioning system using deep learning techniques. The system generates descriptive captions for images by combining a convolutional neural network (CNN) or transformer-based encoder with a recurrent neural network (RNN) or transformer-based decoder.

Disclaimer: This project was developed as part of a final assignment. Some features may be incomplete, and the code could benefit from further refinement.

Features

Support for multiple encoder-decoder architectures.
Dataset handling, including splitting, saving, and logging.
Integration with Weights & Biases (wandb) for experiment tracking.
Support for custom tokenization and vocabulary generation.

Directory Structure

├── models/
│   ├── encoders/
│   │   ├── base.py            # Base encoder class that all encoders inherit from
│   │   ├── basic.py           # ResNet50-based encoder for LSTM decoder
│   │   ├── intermediate.py    # ResNet50-based encoder for LSTM decoder
│   │   ├── transformer.py     # ResNet50-based encoder for transformer decoder
│   │   ├── swin.py            # Swin Transformer encoder for transformer decoder
│   ├── basic.py            # Basic encoder-LSTM decoder model
│   ├── intermediate.py     # Intermediate encoder-LSTM decoder model
│   ├── transformer.py      # Transformer-based model
│   ├── image_captioner.py  # Main image captioning model class with inference methods; Basic and Intermediate inherit from it
├── runner/
│   ├── runner.py           # Main class for running the image captioning pipeline
│   ├── config.py           # Run configuration and hyperparameters
├── sweeper/
│   ├── sweeper.py          # Class for hyperparameter sweeping, inherits from Runner
│   ├── config.py           # Configuration for hyperparameter sweeping
├── datasets/
│   ├── dataset.py          # Dataset handling and preprocessing
│   ├── dataloader.py       # DataLoader for batching and shuffling
│   ├── vocabulary.py       # Vocabulary generation and tokenization
├── captioner.py   # Common interface for generating captions using different algorithms and models.
├── metrics.py     # Evaluation metrics for image captioning
├── runner_cli.py  # Command-line interface for running the image captioning pipeline
├── scheduler.py   # Wrapper class for a learning rate scheduler
├── sweep.py       # Initializes wandb and runs the sweeper
├── test.py        # Evaluation script for testing the model
└── train.py       # Training script for the model

Dataset

Any dataset can be used, as long as it is provided as a DataFrame with one column of image file paths and one column of captions.
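
As an illustration of the expected input, the following is a minimal sketch of building such a DataFrame with pandas. The column names `image_path` and `caption`, the helper `build_caption_frame`, and the example file paths are assumptions made for this example, not names taken from the project; check datasets/dataset.py for the names the code actually expects.

```python
import pandas as pd

# Hypothetical column names, chosen for illustration only.
IMAGE_COL = "image_path"
CAPTION_COL = "caption"


def build_caption_frame(pairs):
    """Build a DataFrame with one row per (image path, caption) pair."""
    df = pd.DataFrame(pairs, columns=[IMAGE_COL, CAPTION_COL])
    # Drop rows with a missing path or caption and renumber the index.
    df = df.dropna().reset_index(drop=True)
    # Strip surrounding whitespace from the captions.
    df[CAPTION_COL] = df[CAPTION_COL].str.strip()
    return df


# An image with several captions simply appears in several rows.
pairs = [
    ("images/dog.jpg", "A dog runs across a field. "),
    ("images/dog.jpg", "A brown dog playing outside."),
    ("images/cat.jpg", "A cat sleeps on a sofa."),
]
df = build_caption_frame(pairs)
print(df)
```

Keeping one row per caption, rather than one row per image, is just one possible layout; it makes images with multiple reference captions easy to represent.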

Run the Project

Modify the runner/config.py file to set up the run configuration and hyperparameters. The configuration file contains parameters for the model, dataset, training, and evaluation.

To train the model, use the CLI:

# Example command to train and test the model
python runner_cli.py --use-wandb --train --test

TODO feature: Add support for loading a config json file for the CLI.

Generate Captions and Attention Maps

You can use the CLI at plotter/caption.py to generate captions and/or attention maps for a given image.

# For generating captions without attention maps
python plotter/caption.py --img_pth <path_to_image> --checkpoint_pth <path_to_checkpoint> --no-attn --save-name <output_filename> --save-dir <output_directory>

# For generating captions with attention maps
python plotter/caption.py --img_pth <path_to_image> --checkpoint_pth <path_to_checkpoint> --save-name <output_filename> --save-dir <output_directory>

More Info

In report/ you can find a PDF with an in-depth analysis of the project (in Spanish), including the architectures, training process, and results.
