by Stephen Yin
GitHub repository for replicating the results of the FLARE paper.
This project used Python 3.10.13.
To set up the dependencies, it is recommended to use a virtual environment (conda was used for this project).
- Run
bash setup/setup.sh
- Retrieve your OpenAI API key. Add the following to your ~/.bashrc (replace the curly braces as well):
export OPENAI_API_KEY="{YOUR_API_KEY_HERE}"
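Before starting a long run, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of this repository; the function name read_openai_api_key is hypothetical, and the only thing taken from the setup steps above is the OPENAI_API_KEY variable name.

```python
import os
from typing import Mapping


def read_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the repository may read the key differently.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add the export line to ~/.bashrc "
            "and open a new shell (or run `source ~/.bashrc`)."
        )
    if "{" in key or "}" in key:
        # The placeholder braces from the setup instructions were left in place.
        raise RuntimeError("OPENAI_API_KEY still contains the placeholder braces.")
    return key
```

Calling read_openai_api_key() with no arguments checks the current process environment.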
Note: The version of sentencepiece was changed from 0.1.83 to 0.1.98 to be compatible with the use of Python 3.10.13.
Follow the instructions from the ASQA repository. Rename the ASQA dataset to ASQA_full.json and place it at dataset/ASQA_full.json.
Then, create the test set by subsampling from the dev split of ASQA:
python setup/select_questions.py
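The subsampling step can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of setup/select_questions.py: it assumes the ASQA file has a top-level "dev" key mapping question ids to examples, and the function name, the seed, and the sampling strategy are all hypothetical.

```python
import random


def subsample_dev(asqa: dict, k: int, seed: int = 0) -> dict:
    """Pick k examples from the dev split, reproducibly.

    Assumes asqa["dev"] maps question ids to examples (an assumption about
    the file layout); the seed value is arbitrary.
    """
    dev = asqa["dev"]
    # Sort the ids so the sample does not depend on dict ordering.
    ids = sorted(dev)
    chosen = random.Random(seed).sample(ids, k)
    return {qid: dev[qid] for qid in chosen}


# Hypothetical usage:
#   import json
#   with open("dataset/ASQA_full.json") as f:
#       subset = subsample_dev(json.load(f), k=500)
```

Using a dedicated random.Random instance with a fixed seed keeps the selected questions stable across runs without touching the global random state.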
Download the Wikipedia dump from the DPR repository using the following command:
mkdir dataset/dpr
wget -O dataset/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
pushd dataset/dpr
gzip -d psgs_w100.tsv.gz
popd
(Instructions taken from the beir example)
To run Elasticsearch, you should have it installed locally (on your desktop), along with beir (pip install beir). How to download Elasticsearch depends on your OS. I like this guide for Ubuntu 18.04 - https://linuxize.com/post/how-to-install-elasticsearch-on-ubuntu-18-04/
For more details, please refer here - https://www.elastic.co/downloads/elasticsearch.
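Once Elasticsearch is installed, a quick reachability check can save a failed indexing run. The sketch below uses only the standard library and assumes an unsecured local instance on the default HTTP port 9200; if your install enables TLS or uses another port, adjust the URL. The function name is hypothetical.

```python
import urllib.error
import urllib.request


def elasticsearch_is_up(url: str = "http://localhost:9200", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL.

    Assumes a local Elasticsearch on the default port; this only checks
    that a server responds, not that it is healthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, but with an error status (e.g. auth required).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or reset: nothing usable is listening.
        return False
```

If this returns False, start the Elasticsearch service before building the index.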
This code doesn't require a GPU to run.
Run the following command to build the Elasticsearch index:
python setup/build_index.py --datapath dataset/dpr/psgs_w100.tsv
There are 21,015,325 documents in the Wikipedia dump to load.
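To sanity-check the download against that document count, the passages can be counted directly. This is a minimal sketch that assumes the TSV starts with a single header row (id, text, title) followed by one passage per row; count_passages is a hypothetical helper, not part of this repository.

```python
import csv
from typing import Iterable


def count_passages(lines: Iterable[str]) -> int:
    """Count data rows in a tab-separated passage file, skipping the header.

    Assumes the first row is a header; csv.reader is used so that quoted
    text fields are parsed as single fields.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row if present
    return sum(1 for _ in reader)


# Hypothetical usage (takes a while on the full dump):
#   with open("dataset/dpr/psgs_w100.tsv", newline="", encoding="utf-8") as f:
#       print(count_passages(f))
```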
Run the following command to generate results on the selected test set:
python model/flare.py -d {DATASET (ASQA 500 examples or ASQA_mini 50 examples)} -n {NAME_OF_EXPERIMENT}
The results should be saved in outputs/{NAME_OF_EXPERIMENT}.json
The data analysis should be saved in outputs/{NAME_OF_EXPERIMENT-analytics}.json
The outputs should be formatted so that they can be evaluated by following the instructions from the ASQA repo.
Example results for the reimplementation of FLARE_direct with implicit queries:
{
  "rougeLsum": 27.630332229755194,
  "length": 136.802,
  "str_em": 40.75,
  "QA-EM": 18.246666666666663,
  "QA-F1": 24.25903202141283,
  "QA-Hit": 2.6,
  "ovscore": 25.88986508894757
}
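As a reading aid for these numbers: the reported ovscore is numerically consistent with the geometric mean of rougeLsum and QA-F1, which matches the overall (DR) score described for ASQA. The check below is a sketch that only reuses the values listed above; it does not call the ASQA evaluation code.

```python
import math

# Values copied from the example results above.
results = {
    "rougeLsum": 27.630332229755194,
    "QA-F1": 24.25903202141283,
    "ovscore": 25.88986508894757,
}

# Geometric mean of the ROUGE-L and QA-F1 scores.
geometric_mean = math.sqrt(results["rougeLsum"] * results["QA-F1"])

print(f"recomputed: {geometric_mean:.4f}, reported: {results['ovscore']:.4f}")
```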