This project implements a Visual Question Answering (VQA) pipeline using two model architectures:
- 🔹 LSTM + CNN (baseline; see [1])
- 🔸 Attention (see [2])
Both models are trained and evaluated on the COCO-QA dataset.
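For orientation, here is a minimal sketch of what an LSTM + CNN baseline of this kind typically looks like in PyTorch. Everything in it (module names, the ResNet-18 backbone, layer sizes) is an illustrative assumption, not the actual code in `train_lstm.py`:

```python
# Illustrative sketch only -- module names and sizes are assumptions,
# not the modules defined in this repo's train_lstm.py.
import torch
import torch.nn as nn
import torchvision.models as models

class VQABaseline(nn.Module):
    """CNN image encoder + LSTM question encoder, fused for answer classification."""
    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=512):
        super().__init__()
        # ResNet-18 backbone as the image encoder (any CNN works here).
        cnn = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # drop the final fc layer
        self.img_fc = nn.Linear(512, hidden_dim)
        # LSTM over word embeddings as the question encoder (pad id = 0).
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Element-wise fusion followed by an answer classifier.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, images, questions):
        v = self.cnn(images).flatten(1)           # (B, 512) pooled image feature
        v = torch.tanh(self.img_fc(v))            # (B, H)
        _, (h, _) = self.lstm(self.embed(questions))
        q = h[-1]                                 # (B, H) final LSTM hidden state
        return self.classifier(v * q)             # (B, num_answers)
```

The attention model replaces the single pooled image vector with per-region features weighted by the question; a sketch of that step follows the training commands below.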
To set up the environment and install dependencies:

```bash
python -m venv vqa_env
source vqa_env/bin/activate
pip install -r requirements.txt
```

Then preprocess the COCO-QA data:

```bash
python data/cocoqa_preprocess.py
python data/preprocessing.py
```

Alternatively, run the notebook `data/processing.ipynb` on Kaggle/Colab to use a GPU for faster processing.
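The exact tensors these scripts write are repo-specific, but the public COCO-QA release is distributed as parallel, line-aligned text files (`questions.txt`, `answers.txt`, `img_ids.txt`, `types.txt`), so a typical first preprocessing step looks roughly like the sketch below (paths and the frequency threshold are assumptions):

```python
# Sketch of reading the raw COCO-QA text files and building a vocabulary.
# The file layout matches the public COCO-QA release; everything else
# (paths, min_count) is an illustrative assumption.
from pathlib import Path
from collections import Counter

def load_cocoqa_split(root, split="train"):
    """Return parallel lists: questions, answers, COCO image ids."""
    base = Path(root) / split
    questions = (base / "questions.txt").read_text().splitlines()
    answers = (base / "answers.txt").read_text().splitlines()
    img_ids = [int(i) for i in (base / "img_ids.txt").read_text().splitlines()]
    assert len(questions) == len(answers) == len(img_ids)
    return questions, answers, img_ids

def build_vocab(questions, min_count=2):
    """Map each frequent word to an integer id; 0 is reserved for padding."""
    counts = Counter(w for q in questions for w in q.split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab
```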
To train each model:

```bash
python train_lstm.py --batch_size 16 --max_epochs 1000
python train_attention.py --batch_size 16 --max_epochs 1000
```

You can also use the notebook `vqa_main.ipynb` for end-to-end training and evaluation.
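For the attention model, the core mechanism (as in co-attention-style approaches [2]) is to weight image regions by their relevance to the question rather than pooling the image into one vector. Below is a self-contained sketch of single-hop question-guided attention; all names and dimensions are illustrative, not taken from `train_attention.py`:

```python
# Minimal single-hop question-guided attention over image grid features.
# Illustrative sketch of the underlying idea, not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512):
        super().__init__()
        self.proj_v = nn.Linear(feat_dim, hidden_dim)    # per-region projection
        self.proj_q = nn.Linear(hidden_dim, hidden_dim)  # question projection
        self.score = nn.Linear(hidden_dim, 1)            # scalar relevance score

    def forward(self, regions, question):
        # regions: (B, N, feat_dim) grid features; question: (B, hidden_dim)
        joint = torch.tanh(self.proj_v(regions) + self.proj_q(question).unsqueeze(1))
        weights = F.softmax(self.score(joint).squeeze(-1), dim=1)   # (B, N)
        attended = (weights.unsqueeze(-1) * regions).sum(dim=1)     # (B, feat_dim)
        return attended, weights

# Example: 14x14 = 196 regions from a CNN feature map.
attn = QuestionGuidedAttention()
v = torch.randn(2, 196, 512)
q = torch.randn(2, 512)
ctx, w = attn(v, q)
print(ctx.shape, w.shape)  # torch.Size([2, 512]) torch.Size([2, 196])
```

The attended vector `ctx` then replaces the pooled image feature in the fusion and classification steps.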
- [1] VQA: Visual Question Answering (Agrawal et al., 2016): https://arxiv.org/pdf/1505.00468v6.pdf
- [2] Hierarchical Question-Image Co-Attention for Visual Question Answering (Lu et al., 2017): https://arxiv.org/pdf/1606.00061.pdf

