This work was done as a part of CS685.
Author: Daivik Swarup
Download data from here
Split data into train, test, val splits:
python preprocess.py

For classification, create thresholded text files:
python preprocess_threshold <PATH-TO-TRAIN-DIR> train_80_20.txt
python preprocess_threshold <PATH-TO-VAL-DIR> val_80_20.txt
python preprocess_threshold <PATH-TO-TEST-DIR> test_80_20.txt

python binary_classification.py <VECTORIZER> output.pkl
<VECTORIZER> can be one of {'tfidf', 'count', 'tfidf_length', 'count_length', 'bert'}
For LSTM:
python train_lstm.py

python train_ranknet.py <VECTORIZER> model.pt
<VECTORIZER> can be one of {'tfidf', 'count', 'tfidf_length', 'count_length', 'bert'}
For LSTM:
python train_ranknet_lstm.py

Scripts in the misc directory are self-explanatory.
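
This README does not describe what the vectorizer options compute. As a rough illustration only, the sketch below shows one plausible reading of 'count_length': bag-of-words counts with the document's token count appended as an extra feature. This interpretation is an assumption, and the helper names (build_vocab, count_length_vector) are hypothetical and do not come from the repository's scripts.

```python
from collections import Counter


def build_vocab(docs):
    """Map each distinct lowercase token to a column index (hypothetical helper)."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def count_length_vector(doc, vocab):
    """Return token counts followed by one length feature.

    Assumption: 'count_length' means counts plus document length.
    The actual scripts may define it differently.
    """
    tokens = doc.lower().split()
    counts = Counter(tokens)
    vec = [0] * (len(vocab) + 1)
    for tok, n in counts.items():
        idx = vocab.get(tok)
        if idx is not None:  # ignore tokens outside the vocabulary
            vec[idx] = n
    vec[-1] = len(tokens)  # last column holds the token count
    return vec


docs = ["the cat sat", "the cat saw the dog"]
vocab = build_vocab(docs)
features = [count_length_vector(d, vocab) for d in docs]
```

Under the same assumption, 'tfidf_length' would swap the raw counts for TF-IDF weights and keep the appended length column.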