This repo contains the code for the paper "AEON: A Method for Automatic Evaluation of NLP Test Cases".
This repo also includes the raw results as well as the questionnaires of the human evaluation mentioned in the paper.
We suggest installing all the required packages with pip:
$ pip install -r requirements.txt
To use AEON:
$ python scorer.py --ori-data PATH_TO_ORI --adv-data PATH_TO_ADV
PATH_TO_ORI and PATH_TO_ADV should be text files with one text per line, paired line by line: line i of PATH_TO_ADV corresponds to line i of PATH_TO_ORI. For example, see data/ori.txt and data/adv.txt.
Check these files to see the options.
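Because the two input files must be paired line by line, it can help to check the alignment before running the scorer. The snippet below is a minimal sketch of such a check, not part of this repo: the helper name load_pairs is hypothetical, and it assumes both files are UTF-8 encoded with exactly one text per line.

```python
from pathlib import Path


def load_pairs(ori_path, adv_path):
    """Read two line-aligned text files and return (original, adversarial) pairs.

    Hypothetical helper for illustration; scorer.py may read its inputs differently.
    """
    ori = Path(ori_path).read_text(encoding="utf-8").splitlines()
    adv = Path(adv_path).read_text(encoding="utf-8").splitlines()
    # The files are only usable together if every original has a counterpart.
    if len(ori) != len(adv):
        raise ValueError(f"line count mismatch: {len(ori)} vs {len(adv)}")
    return list(zip(ori, adv))
```

For instance, load_pairs("data/ori.txt", "data/adv.txt") would raise a ValueError if one of the files had lost or gained a line.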
- Test case generation: please refer to the files under script/, which use seed data in data/textattack/datasets/. Seed data need pre-processing (cleaning) using utils/clean.py.
- Generate questionnaires for human evaluation: use utils/user_study.py.
- Perform robust re-training: use utils/train_model.py.
- Raw human evaluation results: see the files under annotation/raw_annotation. The statistics can be computed using annotation/statistics.py.
- Baselines: both NLP-based and NC-based metrics are implemented in baselines/ (may need extra dependencies to run).
- Raw experiment results are recorded in annotation/result/. The AP, AUC, and PCC are calculated using annotation/AP-AUC-PCC.py.
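For readers unfamiliar with the three metrics, the following is a rough, dependency-free sketch of how they can be computed. It is not the code in annotation/AP-AUC-PCC.py. It assumes that AP means average precision, AUC means area under the ROC curve, and PCC means the Pearson correlation coefficient, and that the inputs are binary labels (1 or 0) alongside real-valued scores. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def roc_auc(labels, scores):
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive item appears."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive labels")
    return total / hits
```

The pairwise AUC above is quadratic in the number of samples, which is fine for small annotation sets; a library implementation is preferable for larger data.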
For more details, please refer to the paper. Please remember to cite us if you find our work helpful!
@inproceedings{huang2022aeon,
title={AEON: a method for automatic evaluation of NLP test cases},
author={Huang, Jen-tse and Zhang, Jianping and Wang, Wenxuan and He, Pinjia and Su, Yuxin and Lyu, Michael R},
booktitle={Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis},
pages={202--214},
year={2022}
}