Benchmark QA Dataset Generation from Reactome and Evaluation with 9 LLMs
The original Reactome data can be found on Zenodo (PathwayQA): https://zenodo.org/records/16704967
Graphical overview of prompt generation:

We recommend installing all dependencies in a conda environment:
conda env create --file environment.yml
- Fully parsed reaction and disease data from Reactome can be found on Zenodo [link]
- Prompts and answers for the reaction and disease tasks can be found in the /data folder.
The LLMs must first be downloaded from Hugging Face. The vllm Python package is required to run the script in /run_models.
The evaluation scripts are run on the LLM output files and judge whether each generated answer matches the true answer.
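As a rough illustration of what judging a match can mean at the entity level, the sketch below implements a naive string-match check. It assumes the true answer is a list of entity names and the generated answer is free text; the functions `normalize` and `string_match` are hypothetical and are not taken from the repository's scripts.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial formatting differences don't block a match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def string_match(true_entities: list[str], generated: str) -> dict[str, bool]:
    """Hypothetical helper: for each true-answer entity, report whether it
    occurs as a substring of the generated text (after normalization)."""
    haystack = normalize(generated)
    return {entity: normalize(entity) in haystack for entity in true_entities}


if __name__ == "__main__":
    truth = ["ATP", "ADP", "Phosphate"]
    answer = "The reaction consumes ATP and releases  ADP."
    print(string_match(truth, answer))  # {'ATP': True, 'ADP': True, 'Phosphate': False}
```

Plain substring matching is cheap but brittle: it misses synonyms and alternative spellings, and a short entity name can match inside a longer word.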
- compare_answers_gpt.py queries GPT-4.1 to test whether the generated and true answers match. For every entity in the true answer, the model determines whether it appears in the generated output.
- postprocess_validate.py converts the output into a score per reaction.
- disease_agreement.py performs the LLM-based validation for the disease queries.
- string_match.py performs a simple string match.
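One way to turn per-entity verdicts (from an LLM judge or a string match) into a single score per reaction is sketched below. It assumes the score is the fraction of true-answer entities judged present in the generated output, i.e. entity-level recall; the metric actually used in postprocess_validate.py may differ, and the `EntityVerdict` record and `score_per_reaction` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EntityVerdict:
    """Hypothetical record: one judge decision on whether a true-answer
    entity is present in the generated answer for a given reaction."""
    reaction_id: str
    entity: str
    present: bool


def score_per_reaction(verdicts: list[EntityVerdict]) -> dict[str, float]:
    """Assumed metric: fraction of true-answer entities found, per reaction."""
    found: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for v in verdicts:
        total[v.reaction_id] += 1
        found[v.reaction_id] += int(v.present)
    # Reactions appear in the order they were first seen in `verdicts`.
    return {rid: found[rid] / total[rid] for rid in total}


if __name__ == "__main__":
    verdicts = [
        EntityVerdict("reaction_1", "ATP", True),
        EntityVerdict("reaction_1", "ADP", False),
        EntityVerdict("reaction_2", "NADH", True),
    ]
    print(score_per_reaction(verdicts))  # {'reaction_1': 0.5, 'reaction_2': 1.0}
```

Scoring each reaction as a fraction rather than all-or-nothing gives partial credit when a model recovers only some of the participating entities.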