Document retrieval from a corpus using the BM25 algorithm.
- The corpus is in the project folder "/Dataset_Algorithms".
- First, we create the inverted index for the given corpus and save it in the index.out file.
- Tokenization, stop-word removal, and stemming are applied while building the inverted index.
- PorterStemmer is used for stemming.
- The index.out file contains all the root words (stems) and serves as input for the BM25 algorithm.
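
The steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the class name `Bm25Sketch`, the stop-word list, and the parameter values `K1 = 1.2` and `B = 0.75` are all assumptions, the index is kept in memory instead of being written to index.out (whose format is not described here), and stemming is left as a comment because the exact PorterStemmer class used by the project is not shown. The IDF term uses the common "+1 inside the logarithm" variant so that scores stay non-negative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: in-memory inverted index plus BM25 scoring.
class Bm25Sketch {
    // Assumed stop-word list; the project's real list may differ.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "is", "of", "and", "in", "on", "to");
    static final double K1 = 1.2;  // assumed term-frequency saturation parameter
    static final double B = 0.75;  // assumed length-normalization parameter

    // term -> (document id -> term frequency in that document)
    final Map<String, Map<Integer, Integer>> index = new TreeMap<>();
    // document id -> number of indexed tokens in that document
    final Map<Integer, Integer> docLengths = new HashMap<>();

    // Lowercase, split on non-alphanumeric characters, drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw); // a Porter stemmer would be applied to raw here
            }
        }
        return tokens;
    }

    // Add one document's tokens to the inverted index.
    void addDocument(int docId, String text) {
        List<String> tokens = tokenize(text);
        docLengths.put(docId, tokens.size());
        for (String term : tokens) {
            index.computeIfAbsent(term, k -> new HashMap<>())
                 .merge(docId, 1, Integer::sum);
        }
    }

    // Return a BM25 score for every document that contains a query term.
    Map<Integer, Double> score(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        int n = docLengths.size();
        double avgdl = docLengths.values().stream()
                .mapToInt(Integer::intValue).average().orElse(0.0);
        for (String term : tokenize(query)) {
            Map<Integer, Integer> postings = index.get(term);
            if (postings == null) {
                continue; // term does not occur in the corpus
            }
            int df = postings.size();
            double idf = Math.log(1.0 + (n - df + 0.5) / (df + 0.5));
            for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
                double tf = e.getValue();
                double norm = K1 * (1 - B + B * docLengths.get(e.getKey()) / avgdl);
                scores.merge(e.getKey(), idf * tf * (K1 + 1) / (tf + norm), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Bm25Sketch sketch = new Bm25Sketch();
        sketch.addDocument(1, "The quick brown fox");
        sketch.addDocument(2, "The lazy dog sleeps in the sun");
        System.out.println(sketch.score("quick dog"));
    }
}
```

Sorting the returned map by descending score gives the ranked document list for a query.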
This project is created as a Spring project and should be run on an application server.