CUHK-ARISE/AEON

AEON

This repository contains the code for the paper "AEON: A Method for Automatic Evaluation of NLP Test Cases".

It also includes the raw results and the questionnaires from the human evaluation described in the paper.

Install

We suggest installing all required packages with pip:

$ pip install -r requirements.txt 

Get started

To use AEON:

$ python scorer.py --ori-data PATH_TO_ORI --adv-data PATH_TO_ADV

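PATH_TO_ORI and PATH_TO_ADV should be plain-text files with one text per line, and the two files must be paired: each line of PATH_TO_ADV corresponds to the same line of PATH_TO_ORI. For example, see data/ori.txt and data/adv.txt.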

Check these files to see the options.
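Because the two input files must be paired line by line, it can help to check the pairing before running the scorer. The following is a minimal sketch, not part of this repo: the helper names read_lines and load_pairs are hypothetical, and it assumes both files are UTF-8 encoded.

```python
from typing import List, Tuple


def read_lines(path: str) -> List[str]:
    """Read a text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_pairs(ori_path: str, adv_path: str) -> List[Tuple[str, str]]:
    """Return (original, adversarial) pairs from two line-aligned files.

    Hypothetical helper: raises ValueError if the files differ in length,
    since they could not be paired line by line in that case.
    """
    ori_lines = read_lines(ori_path)
    adv_lines = read_lines(adv_path)
    if len(ori_lines) != len(adv_lines):
        raise ValueError(
            f"Line count mismatch: {len(ori_lines)} original vs "
            f"{len(adv_lines)} adversarial"
        )
    return list(zip(ori_lines, adv_lines))
```

For instance, load_pairs("data/ori.txt", "data/adv.txt") would return one tuple per line if the two example files are aligned.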

Reproduce our Experiments

  • Test case generation: see the scripts under script/, which use the seed data in data/textattack/datasets/. The seed data must first be pre-processed (cleaned) using utils/clean.py.
  • Generate questionnaires for human evaluation: use utils/user_study.py.
  • Perform robust re-training: use utils/train_model.py.
  • Raw human evaluation results: see the files under annotation/raw_annotation. The statistics can be computed using annotation/statistics.py.
  • Baselines: both NLP-based and NC-based metrics are implemented in baselines/ (running them may require extra dependencies).
  • Raw experiment results: recorded in annotation/result/. The AP, AUC, and PCC are calculated using annotation/AP-AUC-PCC.py.
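To make the three reported metrics concrete, here is a small pure-Python sketch. It is not the code in annotation/AP-AUC-PCC.py. It assumes that AP, AUC, and PCC stand for average precision, area under the ROC curve, and the Pearson correlation coefficient, that labels are binary (1 for positive, 0 for negative), and that a higher score means a more likely positive. All function names are illustrative.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs where the positive scores higher.
    Tied pairs count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive item.
    Simplification: tied scores keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

Each function takes plain lists, for example roc_auc(human_labels, metric_scores), where the two list names are placeholders for whatever labels and scores are being compared.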

References

For more details, please refer to the paper. If you find our work helpful, please cite it:

@inproceedings{huang2022aeon,
  title={AEON: a method for automatic evaluation of NLP test cases},
  author={Huang, Jen-tse and Zhang, Jianping and Wang, Wenxuan and He, Pinjia and Su, Yuxin and Lyu, Michael R},
  booktitle={Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis},
  pages={202--214},
  year={2022}
}
