This project works with the movie plot summaries from the CMU Movie Summary Corpus (http://www.cs.cmu.edu/~ark/personas/). It builds a search engine over the summaries in the file “plot_summaries.txt”, which is available under the Dataset link on that page.
The search engine ranks plot summaries using the tf-idf technique. It is written in Scala and can run on a Databricks cluster.
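
To make the tf-idf idea concrete, here is a minimal sketch in plain Scala collections. It is not the project's actual Spark code: the object name `TfIdfSketch`, the helper names, the tokenizer, and the scoring rule (sum of the tf-idf weights of the query terms) are all assumptions chosen for illustration. Each document is weighted by raw term count times log(N / document frequency), and a query is answered by ranking documents on the summed weights of its terms.

```scala
// Illustrative tf-idf search sketch on in-memory data (hypothetical names, not the project's code).
object TfIdfSketch {

  // Assumption: lowercase the text and split on anything that is not a letter or digit.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSeq

  // Input: (document id, plot summary). Output: document id -> (term -> tf-idf weight).
  def buildIndex(docs: Seq[(String, String)]): Map[String, Map[String, Double]] = {
    val n = docs.size.toDouble

    // Term frequency: raw count of each term within each document.
    val termCounts: Seq[(String, Map[String, Int])] =
      docs.map { case (id, text) =>
        id -> tokenize(text).groupBy(identity).map { case (t, occ) => t -> occ.size }
      }

    // Document frequency: number of documents that contain each term.
    val docFreq: Map[String, Int] =
      termCounts.flatMap(_._2.keys).groupBy(identity).map { case (t, occ) => t -> occ.size }

    // tf-idf weight = tf * ln(N / df); a term present in every document gets weight 0.
    termCounts.map { case (id, counts) =>
      id -> counts.map { case (t, c) => t -> c * math.log(n / docFreq(t)) }
    }.toMap
  }

  // Score each document by summing the weights of the query terms, then keep the top K matches.
  def search(index: Map[String, Map[String, Double]],
             query: String,
             topK: Int): Seq[(String, Double)] = {
    val terms = tokenize(query)
    index.toSeq
      .map { case (id, weights) => id -> terms.map(t => weights.getOrElse(t, 0.0)).sum }
      .filter(_._2 > 0.0)
      .sortBy { case (_, score) => -score }
      .take(topK)
  }
}
```

For example, after `val index = TfIdfSketch.buildIndex(docs)`, a call such as `TfIdfSketch.search(index, "detective killer", 10)` returns up to ten (id, score) pairs in descending score order. On a Databricks cluster the same steps would presumably be expressed as Spark transformations over the lines of the summaries file rather than over a local `Seq`.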