This repo contains code for training and evaluating image captioning models that pair CLIP's vision encoder with two different decoders:
- A Qwen3 (0.6B) language model augmented with visual input.
- A custom Transformer decoder trained from scratch.
Both models are trained on the Flickr30k dataset to generate natural language descriptions for images.
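Both decoders consume CLIP image features through a learned projection into the decoder's embedding space. The snippet below is a minimal sketch of that shared vision pathway, not the repo's actual code: the CLIP checkpoint name, the `VisionProjector` module, and the hidden dimensions are illustrative assumptions.

```python
# Sketch of the vision pathway: CLIP encodes the image, a learned linear
# projection maps its patch features into the decoder's embedding space.
# Checkpoint name, module name, and dimensions are assumptions.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

clip_name = "openai/clip-vit-base-patch32"  # assumed checkpoint
vision_encoder = CLIPVisionModel.from_pretrained(clip_name)
processor = CLIPImageProcessor.from_pretrained(clip_name)

class VisionProjector(nn.Module):
    """Projects CLIP patch embeddings to the decoder's hidden size."""
    def __init__(self, clip_dim: int, decoder_dim: int):
        super().__init__()
        self.proj = nn.Linear(clip_dim, decoder_dim)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # CLIP is typically kept frozen
            feats = vision_encoder(pixel_values=pixel_values).last_hidden_state
        return self.proj(feats)  # (batch, num_patches + 1, decoder_dim)

# Encode one image into decoder-ready "visual tokens".
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
projector = VisionProjector(clip_dim=768, decoder_dim=1024)  # assumed dims
visual_tokens = projector(pixel_values)
```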
```
.
├── inference_single_img_qwen.py    # Inference with Qwen3 decoder on a single image
├── inference_single_img_custom.py  # Inference with custom decoder on a single image
├── inference_flickrtest_qwen.py    # Evaluate Qwen model on Flickr30k test split
├── image_caption_qwen.py           # Training script for Qwen decoder
├── image_caption_custom.py         # Training script for custom transformer decoder
├── utils.py                        # Helper functions (e.g., Flickr30k loader)
├── flickr_dataset/                 # Locally saved train/val/test splits (via `load_from_disk`)
└── qwen3model/                     # Directory containing the base Qwen model
```
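For reference, the splits under `flickr_dataset/` are saved in Hugging Face `datasets` format and read back with `load_from_disk`, as noted above. A minimal sketch follows; the split and column names are assumptions and may differ from what `utils.py` actually expects.

```python
# Load the locally saved Flickr30k splits (split/column names are assumptions).
from datasets import load_from_disk

dataset = load_from_disk("flickr_dataset")
train_split = dataset["train"]   # assumed split name
print(train_split[0].keys())     # inspect the image/caption columns
```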