This is the default final project for the Stanford CS 224N class. Please refer to the project handout on the course website for detailed instructions and an overview of the codebase.
This project comprises two parts. In the first part, you will implement some important components of the GPT-2 model to better understand its architecture. In the second part, you will use the token embeddings produced by your GPT-2 model on two downstream tasks: paraphrase detection and sonnet generation. You will implement extensions to improve your model's performance on these tasks.
In broad strokes, Part 1 of this project targets:
- `modules/attention.py`: Missing code blocks.
- `modules/gpt2_layer.py`: Missing code blocks.
- `models/gpt2.py`: Missing code blocks.
- `classifier.py`: Missing code blocks.
- `optimizer.py`: Missing code blocks.
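`modules/attention.py` is where you implement GPT-2's multi-head causal self-attention. As a reference for the math only, here is a minimal numpy sketch of that computation; the function name, single-batch layout, and shapes are illustrative, not the codebase's actual API:

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v, n_heads):
    """Single-batch multi-head causal self-attention (sketch).

    x: (T, d) token representations; w_q / w_k / w_v: (d, d)
    projection matrices. Illustrative names, not the repo's API.
    """
    T, d = x.shape
    hd = d // n_heads                                 # per-head dimension
    # Project, then split into heads: (n_heads, T, hd)
    q = (x @ w_q).reshape(T, n_heads, hd).transpose(1, 0, 2)
    k = (x @ w_k).reshape(T, n_heads, hd).transpose(1, 0, 2)
    v = (x @ w_v).reshape(T, n_heads, hd).transpose(1, 0, 2)
    # Scaled dot-product scores with a causal (upper-triangular) mask
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)   # (n_heads, T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), 1)    # future positions
    scores = np.where(mask, -1e9, scores)
    # Softmax over the key dimension
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    out = attn @ v                                    # (n_heads, T, hd)
    return out.transpose(1, 0, 2).reshape(T, d)       # re-merge heads
```

The causal mask guarantees that position t attends only to positions ≤ t, which is what makes autoregressive generation possible.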
To test Part 1, you will run:
- `optimizer_test.py`: To test your implementation of `optimizer.py`.
- `sanity_check.py`: To test your implementation of the GPT models.
- `classifier.py`: To perform sentiment classification using your models.
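`optimizer.py` asks you to fill in a from-scratch optimizer; assuming (as in earlier versions of this project) it is AdamW (Loshchilov & Hutter, 2019), here is a minimal numpy sketch of a single update step with bias correction and decoupled weight decay. The hyperparameter defaults are standard, not prescribed by the handout:

```python
import numpy as np

def adamw_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on parameter array p (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)   # Adam step
    p = p - lr * weight_decay * p                 # decoupled weight decay
    return p, m, v
```

Note the weight decay is applied directly to the parameters, not mixed into the gradient as in L2-regularized Adam; that decoupling is the defining feature of AdamW.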
In Part 2 of this project, you will use GPT-2 (via cloze-style classification) to detect whether one sentence is a paraphrase of another, as well as to generate sonnets via autoregressive language modeling.
To test Part 2, you will run:
- `paraphrase_detection.py`: To perform paraphrase detection.
- `sonnet_generation.py`: To perform sonnet generation.
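To make the cloze-style formulation concrete: the sentence pair is rendered as a prompt, and the model's next-token logits for "yes" and "no" at the end of the prompt are compared. The template below is a hypothetical illustration, not necessarily the handout's exact wording:

```python
def cloze_prompt(s1: str, s2: str) -> str:
    """Build a cloze-style prompt for a sentence pair.
    The template is illustrative, not the handout's exact wording."""
    return (f'Question: Is "{s1}" a paraphrase of "{s2}"? '
            f'Answer (yes/no): ')

def cloze_decision(yes_logit: float, no_logit: float) -> bool:
    """Classify by comparing the model's next-token logits for
    "yes" vs. "no" at the final position of the prompt."""
    return yes_logit > no_logit
```

Framing classification this way lets a pure language model act as a classifier without adding a separate classification head.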
Important: Adjust training hyperparameters, particularly batch size, according to your GPU's specifications to optimize performance and prevent out-of-memory errors.
While there are missing code blocks that you need to implement in both of these files, the main focus of this second part is the extensions: how you modify your GPT-2 model to improve its ability to determine whether one sentence is a paraphrase of another, as well as its ability to generate sonnets.
1. Create and activate the conda environment:

   ```
   conda env create -f env.yml
   conda activate cs224n_dfp
   ```

2. (Optional) Install Modal for cloud GPU training:

   ```
   pip install modal
   modal setup  # authenticates your account
   ```

To run paraphrase detection:

```
python paraphrase_detection.py --use_gpu
```

Key arguments:
| Argument | Default | Description |
|---|---|---|
| `--epochs` | 10 | Number of training epochs |
| `--lr` | 1e-5 | Learning rate |
| `--batch_size` | 16 | Batch size |
| `--model_size` | gpt2 | One of gpt2, gpt2-medium, gpt2-large |
| `--use_loreft` | off | Use LoREFT parameter-efficient fine-tuning |
| `--loreft_rank` | 4 | Rank of the LoREFT intervention subspace |
| `--loreft_window_size` | 1 | Number of last tokens to apply LoREFT to |
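LoReFT (Wu et al., 2024) fine-tunes by editing hidden states in a low-rank subspace instead of updating model weights: Φ(h) = h + Rᵀ(Wh + b − Rh), where R has orthonormal rows and its rank corresponds to `--loreft_rank`. A numpy sketch following the paper's formulation; the function names and the window helper are illustrative, not this repo's module:

```python
import numpy as np

def loreft(h, R, W, b):
    """LoReFT intervention on a single hidden state h (d,).
    R: (r, d) with orthonormal rows; W: (r, d); b: (r,)."""
    return h + R.T @ (W @ h + b - R @ h)

def apply_window(H, R, W, b, window=1):
    """Apply the intervention only to the last `window` positions,
    mirroring the --loreft_window_size argument (sketch)."""
    H = H.copy()
    for t in range(max(0, H.shape[0] - window), H.shape[0]):
        H[t] = loreft(H[t], R, W, b)
    return H
```

Only R, W, and b are trained, so the number of tunable parameters scales with the rank r rather than with the model size.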
Example with LoREFT on gpt2-medium:
```
python paraphrase_detection.py --use_gpu \
  --model_size gpt2-medium \
  --epochs 15 \
  --lr 2e-4 \
  --batch_size 128 \
  --use_loreft \
  --loreft_rank 32 \
  --loreft_window_size 4
```

To train on Modal instead:

```
modal run modal_run.py
```

Override defaults from the command line:

```
modal run modal_run.py --epochs 20 --lr 2e-4 --use_loreft --loreft_rank 32
```

Outputs (predictions and logs) are saved to the paraphrase-checkpoints Modal volume. To retrieve them:
```
modal volume ls paraphrase-checkpoints
modal volume get paraphrase-checkpoints <remote-path> <local-path>
```

To run sonnet generation:

```
python sonnet_generation.py --use_gpu
```

Key arguments:
| Argument | Default | Description |
|---|---|---|
| `--epochs` | 10 | Number of training epochs |
| `--lr` | 1e-5 | Learning rate |
| `--batch_size` | 8 | Batch size |
| `--model_size` | gpt2 | One of gpt2, gpt2-medium, gpt2-large, gpt2-xl |
| `--temperature` | 1.2 | Sampling temperature for generation |
| `--top_p` | 0.9 | Nucleus sampling cumulative probability |
| `--tuning_mode` | full | `full` for full fine-tuning or `loreft` for LoREFT |
| `--loreft_rank` | 4 | Rank of the LoREFT intervention subspace |
| `--loreft_dropout` | 0.1 | Dropout on LoREFT intervention output |
| `--loreft_window_size` | 1 | Number of last tokens to apply LoREFT to |
Example with LoREFT on gpt2-medium:
```
python sonnet_generation.py --use_gpu \
  --model_size gpt2-medium \
  --epochs 20 \
  --lr 1e-4 \
  --tuning_mode loreft \
  --loreft_rank 16 \
  --temperature 1.2 \
  --top_p 0.9 \
  --log_curve
```

Generated sonnets are written to `predictions/generated_sonnets.txt`. Dev evaluation uses the chrF score, logged per epoch.
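To clarify what `--temperature` and `--top_p` control during generation, here is a numpy sketch of temperature-scaled nucleus (top-p) sampling. This is the standard algorithm, not necessarily the repo's exact decoding code:

```python
import numpy as np

def sample_top_p(logits, temperature=1.2, top_p=0.9, rng=None):
    """Sample a token id from a logits vector using temperature
    scaling followed by nucleus (top-p) truncation (sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    z = logits / temperature                 # temperature > 1 flattens
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # tokens, most probable first
    csum = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative probability >= top_p
    cutoff = int(np.searchsorted(csum, top_p)) + 1
    kept = order[:cutoff]
    p = probs[kept] / probs[kept].sum()      # renormalize within nucleus
    return int(rng.choice(kept, p=p))
```

Higher temperature spreads probability mass over more tokens (more diverse sonnets), while a lower `top_p` shrinks the nucleus and trims low-probability tokens that tend to produce incoherent text.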
This project is adapted from a prior year's CS 224N project, *Implement BERT*. Parts of the code are from the Hugging Face `transformers` library (Apache License 2.0).