Repository for our CLMS LING 573 group project. Evaluated on the BillSum corpus, our system:
- takes in a plaintext legislative bill document
- segments it into semantically coherent chunks
- applies neural syntactic simplification to each chunk
- generates a summary of the document that scores better on readability metrics than human-written summaries and SOTA baseline models.
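The stages above can be pictured as a simple chunk-then-recombine flow. The following is a minimal sketch only: `segment`, `simplify`, and `summarize` are hypothetical stand-ins (blank-line splitting, whitespace normalization, and first-sentence extraction) for the Se3 segmenter, the neural simplifier, and the Seq2Seq summarizer, none of which are reproduced here.

```python
from typing import List


def segment(document: str) -> List[str]:
    """Stand-in segmenter: split on blank lines (the real system uses Se3)."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def simplify(chunk: str) -> str:
    """Stand-in simplifier: only normalizes whitespace."""
    return " ".join(chunk.split())


def summarize(chunk: str) -> str:
    """Stand-in summarizer: keeps the first sentence of the chunk."""
    first, _, _ = chunk.partition(". ")
    return first.rstrip(".") + "."


def run_pipeline(document: str) -> str:
    """Segment, simplify, summarize each chunk, then join the chunk summaries."""
    chunks = segment(document)
    simplified = [simplify(chunk) for chunk in chunks]
    summaries = [summarize(chunk) for chunk in simplified]
    return " ".join(summaries)


if __name__ == "__main__":
    bill = (
        "This Act funds   rural clinics. It applies in 2025.\n\n"
        "The Secretary shall report annually. Reports are public."
    )
    print(run_pipeline(bill))
```

The point of the sketch is the shape of the data flow: each chunk is processed independently, and the document-level summary is only assembled at the end.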
- Clone the repository if it does not already exist
git clone git@github.com:AnanthaR20/ling573.git
- Download miniconda for your OS
- Create a new environment:
conda create -n 573-env
- Activate the environment to start developing! Yay!
conda activate 573-env
- Install all the required packages:
pip install -r requirements.txt
- Download the optimized spaCy English language model for evaluation
python -m spacy download en_core_web_sm
Note that Conda environments have a slightly different setup if you run the wugwATSS system on Hyak. Refer to the official Hyak documentation for details.
Virtual environment invocation is managed by our controller shell
script, generate_run.sh, which reads the config file you specify:
patas.config or hyak.config. Each parameter in the config is
explained below:
# Specify the platform that is running the augmented system; the only valid options are "patas" or "hyak"
PLATFORM="patas"
# any Huggingface Seq2Seq model for summarization can be used here
CHECKPOINT="google/pegasus-billsum"
# For Deliverable 3, we only need to generate model predictions on the preprocessed data
MODE="predict"
# Filepath relative to the root folder of this repository
TESTFILE="preprocess/data/billsum_clean_test_se3-t5-512-512.csv"
# For Deliverable 3, we only need to reconstruct full summaries after summarizing chunks
CONCAT="post"
# Patas and Hyak can handle different batch sizes due to memory constraints
BATCH_SIZE=4
Note that the test data is preloaded in the preprocess directory of this repository, so no extra data downloads are necessary to run the system. Additionally, all ROUGE and readability formulas are evaluated at test time and handled by the controller script at the last stage.
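To make the config format concrete, here is a minimal sketch of reading and sanity-checking such a file from Python before launching a run. It is not how generate_run.sh consumes the file (that script is not shown here). `parse_config` and `validate_config` are hypothetical helper names, treating all six parameters as required is an assumption, and the only value rule taken from the config comments is that PLATFORM must be "patas" or "hyak".

```python
import shlex
from typing import Dict

# Assumption: every parameter shown in the sample config is required.
REQUIRED_KEYS = ("PLATFORM", "CHECKPOINT", "MODE", "TESTFILE", "CONCAT", "BATCH_SIZE")

SAMPLE = '''
# Specify the platform that is running the augmented system
PLATFORM="patas"
CHECKPOINT="google/pegasus-billsum"
MODE="predict"
TESTFILE="preprocess/data/billsum_clean_test_se3-t5-512-512.csv"
CONCAT="post"
BATCH_SIZE=4
'''


def parse_config(text: str) -> Dict[str, str]:
    """Parse shell-style KEY="value" lines, skipping blanks and # comments."""
    config: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value line: {raw_line!r}")
        # shlex.split removes surrounding quotes the way a shell would.
        parts = shlex.split(value)
        config[key.strip()] = parts[0] if parts else ""
    return config


def validate_config(config: Dict[str, str]) -> None:
    """Raise ValueError if a key is missing or a value is clearly invalid."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    if config["PLATFORM"] not in ("patas", "hyak"):
        raise ValueError('PLATFORM must be "patas" or "hyak"')
    if not config["BATCH_SIZE"].isdigit() or int(config["BATCH_SIZE"]) < 1:
        raise ValueError("BATCH_SIZE must be a positive integer")


if __name__ == "__main__":
    cfg = parse_config(SAMPLE)
    validate_config(cfg)
    print(cfg["PLATFORM"], cfg["BATCH_SIZE"])
```

Checking the file up front is cheap compared with discovering a typo after a cluster job has been queued.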
- SSH into Patas
ssh <UW NetID>@patas.ling.washington.edu
cd ling573
conda activate 573-env
python preprocess/clean.py
This script takes approximately 1-2 minutes and can be run directly on the Patas head node.
To create the semantic self-segmented chunks from BillSum documents, we directly call the Se3 submodule:
cd preprocess/se3/
git submodule init
git submodule update
git pull
condor_submit augmented_segment.cmd
This stage takes approximately 2-3 hours to run as a Condor job on the entire test split. It also requires running Se3's metric learning script, which we describe briefly in the ATS section.
To simplify the document segments, we navigate back to the preprocess module:
cd ..
condor_submit augmented_simplify.cmd
Alternatively, this can be run on Hyak:
cd ..
sbatch
On Hyak, this step takes approximately 2 hours to run.
Run the controller script, which will generate and submit a Condor job on your behalf:
cd ling573
./generate_run.sh patas
# Wait for the message confirming successful Condor job submission
If running on Hyak, the controller script will generate a SLURM batch job, build an Apptainer container with the Conda environment, and submit the job on your behalf:
cd /gscratch/scrubbed/jcmw614/ling573
./generate_run.sh hyak
# Wait for the message confirming successful SLURM job submission
Note: this takes approximately 40 minutes for the first 15 documents of the test set, which were segmented into 130 text chunks.
Run the fine-tuning controller script, which will generate and submit a Condor job on your behalf:
cd ling573
./generate_finetune.sh wugwatts-led_unsimp
This will generate the requisite Condor log, error, and output files, named with the template finetune_model.<cluster number>. The .err file is the best place to monitor runtime and issues.
Note that test data is directly loaded with the datasets library so no extra arguments are needed on the command line.
- SSH into Patas
ssh <UW NetID>@patas.ling.washington.edu
- Submit Condor job
condor_submit run_baseline/run_baseline.cmd
- Wait...
- Find your output in run_baseline.out
- Activate virtual environment (see above)
- Run system from terminal
python backup_run.py
- Wait...but hopefully not as long!
- In our ad-hoc backup run, output was printed directly to the console and manually copied into a text file, baseline_console.txt. This console output can be aligned with the provided title column from BillSum using the align() function defined in backup_run.py and written to a CSV, baseline_test.csv. This CSV can be used for baseline evaluation.
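The alignment step can be pictured as follows. This is a minimal sketch, not the align() function from backup_run.py, whose implementation is not shown here. It assumes one generated summary per non-empty line of console output, in the same order as the BillSum titles. The function names `align_summaries` and `write_csv` and the CSV column names `title` and `summary` are illustrative assumptions.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple


def align_summaries(console_lines: Iterable[str], titles: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each non-empty console line with the title at the same position."""
    summaries = [line.strip() for line in console_lines if line.strip()]
    title_list = list(titles)
    if len(summaries) != len(title_list):
        raise ValueError(
            f"{len(summaries)} summaries cannot be aligned with {len(title_list)} titles"
        )
    return list(zip(title_list, summaries))


def write_csv(rows: List[Tuple[str, str]], handle: TextIO) -> None:
    """Write aligned (title, summary) rows with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["title", "summary"])
    writer.writerows(rows)


if __name__ == "__main__":
    console = ["Funds rural clinics.\n", "\n", "Requires annual reports.\n"]
    titles = ["Rural Health Act", "Transparency Act"]
    buffer = io.StringIO()
    write_csv(align_summaries(console, titles), buffer)
    print(buffer.getvalue())
```

Raising on a length mismatch is a deliberate choice in this sketch: a silent misalignment would pair summaries with the wrong bills and corrupt the evaluation.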
- Extract confidence intervals on ROUGE scores with eval_metrics.py
cd eval
python eval_metrics.py
- Extract readability scores in eval_readability.ipynb, as we are still testing out different eval resources
jupyter notebook
- Use the provided readability scores to run t-tests on each readability score
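For readers unfamiliar with these two analyses, here is a minimal standard-library sketch of each. It does not reproduce eval_metrics.py or the notebook: the percentile bootstrap, the paired form of the t-test, the function names, and the toy scores are all assumptions chosen for illustration.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of per-document scores."""
    rng = random.Random(seed)
    data = list(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return the t statistic and degrees of freedom for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("paired differences have zero variance")
    t_stat = statistics.fmean(diffs) / (spread / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not results from our system.
    rouge = [0.31, 0.42, 0.38, 0.45, 0.29, 0.40]
    print("95% CI for mean ROUGE:", bootstrap_ci(rouge))
    system_scores = [62.0, 58.5, 70.1, 65.3]
    reference_scores = [55.2, 57.0, 61.4, 60.8]
    print("paired t, df:", paired_t_statistic(system_scores, reference_scores))
```

Turning the t statistic into a p-value requires the t distribution's CDF, which the standard library does not provide; `scipy.stats.ttest_rel` computes both in one call if SciPy is available in the environment.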
Once a pre-existing ATS code repository has been verified to behave as expected on its own, we add it under the preprocess child directory.
Running the wugwATSs system requires access to the adjusted Legal-BERT sentence embeddings produced by the Se3 repository. A Condor job can generate them in approximately 7 hours:
cd preprocess/se3
condor_submit learning.cmd