CS 5340 Question Answering System

Team Members

Jackson Dean (solo team)

Running the Program

Navigate to the root directory of the project. On the CADE machines, this is:
```
/home/u1100004/cs5340-qa/
```
The project was tested on machine lab1-14.eng.utah.edu.

Activate the virtual environment:
```
source ./venv/bin/activate.csh
```
Run the program:
```
python3 qa.py <path to input file>
```

Options

Optionally, you can pass one or two extra file paths to the program:

```
python3 qa.py <path to input file> <path to output file> <path to answer file>
```

If you provide the output file, the program will write the output to that file rather than printing to stdout.

If you also provide the answer file, the program writes the output to the output file, compares the answers against those in the answer file, and prints the accuracy of the system as determined by score-answers.pl.
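The positional-argument behaviour described above can be sketched as follows. This is a minimal illustration, not the actual code in qa.py; the helper name `parse_args` and the usage message are assumptions.

```python
import sys


def parse_args(argv):
    """Split positional arguments into (input_path, output_path, answer_path).

    Hypothetical helper: the real qa.py may handle its arguments differently.
    Missing optional paths are returned as None.
    """
    if not 1 <= len(argv) <= 3:
        raise SystemExit(
            "usage: python3 qa.py <input file> [<output file> [<answer file>]]"
        )
    input_path = argv[0]
    # Without an output path, results would go to stdout.
    output_path = argv[1] if len(argv) >= 2 else None
    # Scoring is only possible when an answer file is given as well.
    answer_path = argv[2] if len(argv) == 3 else None
    return input_path, output_path, answer_path
```

A caller would typically invoke it as `parse_args(sys.argv[1:])` and then decide whether to print to stdout, write to a file, or also run the scorer.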

External Libraries

NLTK - Natural Language Toolkit
- Tokenization
- POS tagging

spaCy - Industrial-Strength Natural Language Processing
- NER
- Used model en_core_web_sm

scikit-learn - Machine Learning in Python
- TF-IDF vectorization
- SVM classifier

rake-nltk - Rapid Automatic Keyword Extraction
- Keyword extraction
- Unused in final version but still present in codebase

numpy - Numerical Python
- Required by other libraries

scipy - Scientific Python
- Parameter optimization

NLTK Data

The following NLTK datasets must be downloaded before running the program.

- punkt
- stopwords
- wordnet
- omw-1.4
- averaged_perceptron_tagger
- words
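
The datasets listed above can be fetched with NLTK's downloader. The following is a minimal sketch; the helper name `download_nltk_data` is illustrative and not part of the project, and it assumes NLTK is installed in the active environment and that the machine has network access.

```python
# Dataset identifiers taken from the list above.
NLTK_PACKAGES = [
    "punkt",
    "stopwords",
    "wordnet",
    "omw-1.4",
    "averaged_perceptron_tagger",
    "words",
]


def download_nltk_data(packages=NLTK_PACKAGES):
    """Download each dataset; return the names that failed to download."""
    import nltk  # imported here so the list can be reused without NLTK installed

    failed = []
    for name in packages:
        # nltk.download returns True on success and False otherwise.
        if not nltk.download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling `download_nltk_data()` once after activating the virtual environment should be enough; an empty return value means every dataset was fetched or was already present.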

spaCy Data

The following spaCy model must be downloaded before running the program.

en_core_web_sm
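
The model is normally installed with spaCy's own command, `python -m spacy download en_core_web_sm`. The same step can be sketched in Python as below; the helper name `ensure_spacy_model` is an assumption for illustration and does not exist in the project.

```python
MODEL_NAME = "en_core_web_sm"


def ensure_spacy_model(name=MODEL_NAME):
    """Download the model if it is not installed; return True if a download ran."""
    import spacy  # imported here so the constant can be reused without spaCy
    from spacy.cli import download

    # is_package checks whether the name maps to a pip-installed package.
    if spacy.util.is_package(name):
        return False
    download(name)
    return True
```

After a fresh download, it may be necessary to start a new Python process before `spacy.load("en_core_web_sm")` can find the model.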

Time Estimate

The QA tool takes approximately 6.25 s to process a single story. However, part of this time is spent on initial setup, which is required only once per run rather than once per story.

This startup time includes training a question classifier using the training data in test-files/question_training.txt.
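
To see how much of the measured time is one-time setup and how much is per-story work, the two phases can be timed separately. This is a rough sketch only: `setup` and `answer_story` are hypothetical stand-ins for whatever functions in qa.py perform the classifier training and the question answering.

```python
import time


def time_phases(setup, answer_story, stories):
    """Return (setup_seconds, mean_seconds_per_story).

    setup() is called once; answer_story(state, story) is called per story.
    """
    start = time.perf_counter()
    state = setup()  # one-time work, e.g. training the question classifier
    setup_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for story in stories:
        answer_story(state, story)
    total = time.perf_counter() - start

    # Avoid dividing by zero when no stories are supplied.
    per_story = total / len(stories) if stories else 0.0
    return setup_seconds, per_story
```

With many stories in one run, the setup cost is amortized, so the average time per story should fall below the single-story figure.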

Known Problems

None
