MiniWatson

Description

MiniWatson is a simplified version of IBM Watson's Question Answering (QA) system. The documents used for QA are a sampled set of 280,000 Wikipedia documents. The indexing program indexes all 280,000 documents according to the chosen index type (options are positional, custom, lemma, porter, and standard). Once indexed, the documents are placed in a directory that the program accesses when queries for the questions are run. When running the questions, the program is given the index type that was used, along with the scoring function to apply (options are bm25, tfidf, and default).
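To make the bm25 scoring option more concrete, here is a minimal sketch of the standard Okapi BM25 formula. It is not taken from MiniWatson: the source does not say how scoring is implemented (it may simply delegate to a search library), so the class name `Bm25Sketch`, the method names, and the parameter values `K1 = 1.2` and `B = 0.75` are all assumptions chosen for illustration.

```java
import java.util.List;
import java.util.Map;

class Bm25Sketch {
    // Commonly used defaults; assumed here, not taken from MiniWatson.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Inverse document frequency, smoothed so it never goes negative.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 contribution of a single query term to a single document.
    static double termScore(int tf, int docLen, double avgDocLen,
                            long docCount, long docFreq) {
        double lengthNorm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * (tf * (K1 + 1.0)) / (tf + lengthNorm);
    }

    // Sum the per-term contributions over all query terms.
    static double score(List<String> queryTerms, Map<String, Integer> termFreqs,
                        int docLen, double avgDocLen,
                        long docCount, Map<String, Long> docFreqs) {
        double total = 0.0;
        for (String term : queryTerms) {
            int tf = termFreqs.getOrDefault(term, 0);
            long df = docFreqs.getOrDefault(term, 0L);
            if (tf > 0 && df > 0) {
                total += termScore(tf, docLen, avgDocLen, docCount, df);
            }
        }
        return total;
    }
}
```

In this sketch a term that occurs more often raises the score with diminishing returns, and documents longer than average are penalized through the `B` term. A tfidf scorer would differ mainly in dropping the saturation and length normalization.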
When the queries are run, output is produced in the following format:
------------------------------------------------------------------------
Currently searching for: The Washington Post
Document hit for The Washington Post at position: 5
Currently searching for: Taiwan
Document hit for Taiwan at position: 1
...
...
...
Currently searching for: 3M
Currently searching for: Robert Downey, Jr.
Document hit for Robert Downey, Jr. at position: 1
Total hits in top 10 docs: 64
P@1: 0.40
Docs in position 1: 40
Docs in position 2: 10
...
...
...
Docs in position 9: 0
Docs in position 10: 2
------------------------------------------------------------------------
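The summary lines in the output above can be derived from the per-question hit positions. The following is a minimal sketch of that bookkeeping, not MiniWatson's actual code: the class `RankSummary`, its methods, and the convention of using `0` for "no hit in the top 10" are assumptions made for illustration. In the sample output, 40 documents in position 1 with a P@1 of 0.40 is consistent with 100 questions, although the source does not state the question count.

```java
import java.util.Locale;

class RankSummary {
    // ranks[i] is the 1-based position of the correct document for question i,
    // or 0 if it was not found within the cutoff (assumed convention).
    static int[] positionCounts(int[] ranks, int cutoff) {
        int[] counts = new int[cutoff + 1]; // index 0 is unused
        for (int rank : ranks) {
            if (rank >= 1 && rank <= cutoff) {
                counts[rank]++;
            }
        }
        return counts;
    }

    // Number of questions whose answer appeared anywhere within the cutoff.
    static int totalHits(int[] counts) {
        int total = 0;
        for (int pos = 1; pos < counts.length; pos++) {
            total += counts[pos];
        }
        return total;
    }

    // Precision at 1: fraction of questions answered by the top-ranked document.
    static double precisionAt1(int[] counts, int numQuestions) {
        return numQuestions == 0 ? 0.0 : (double) counts[1] / numQuestions;
    }

    public static void main(String[] args) {
        int[] ranks = {5, 1, 0, 1}; // made-up example data
        int[] counts = positionCounts(ranks, 10);
        System.out.println("Total hits in top 10 docs: " + totalHits(counts));
        System.out.println(String.format(Locale.ROOT, "P@1: %.2f",
                precisionAt1(counts, ranks.length)));
        for (int pos = 1; pos <= 10; pos++) {
            System.out.println("Docs in position " + pos + ": " + counts[pos]);
        }
    }
}
```

Questions with no hit, such as 3M in the sample output, still count toward the denominator of P@1 in this sketch.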

Usage

Before you can query, you need the indexed documents. Either run ./getIndexedDocs, which downloads all of the indexed documents and places them where they belong in the directory, or build the index yourself: run ./getDocs to retrieve the subset of Wikipedia documents, then, once that finishes, run ./indexDocs {optional_index_type}.
To run the query engine, you need a bash shell and Maven installed. Run ./runQueries {optional_index_type} {optional_scoring_method}. If no parameters are passed, the program defaults to the positional index type with bm25 scoring.
