fake-astro-ph

We’d like to:

Save thousands of arXiv titles via the ADS API, probably into MongoDB so we don’t have to re-query when we want to search or filter later: https://github.com/andycasey/ads
Save the titles into a text file
Generate a word embedding with Gensim: https://radimrehurek.com/gensim/
Given an embedding, generate a grammar with spaCy: https://spacy.io/
Given the word embedding and grammar, make random titles with markovify: https://github.com/jsvine/markovify
If time allows, make a website to share!
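
To give a feel for the title-generation step, here is a minimal sketch of a word-level Markov chain in plain Python. It stands in for what markovify would do and is not markovify’s API; the sample titles and the names `build_chain` and `make_title` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

# Sentinel tokens marking the start and end of a title.
START, END = "__START__", "__END__"


def build_chain(titles):
    """Map each word to the list of words seen directly after it."""
    chain = defaultdict(list)
    for title in titles:
        words = [START] + title.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def make_title(chain, rng, max_words=20):
    """Walk the chain from START until END or max_words is reached."""
    word, output = START, []
    while len(output) < max_words:
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)


# Hypothetical sample titles; the real input would be the saved ADS titles.
sample_titles = [
    "The mass of dark matter halos",
    "The evolution of dark energy models",
    "A survey of galaxy clusters",
]

chain = build_chain(sample_titles)
rng = random.Random(0)  # seeded so repeated runs give the same title
fake_title = make_title(chain, rng)
print(fake_title)
```

With only three input titles the output will mostly recombine fragments at shared words such as "of" and "dark"; thousands of real titles should give far more varied results.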

It’s a bit like http://davidsd.org/2010/03/the-snarxiv/ but with an NLP approach rather than a large set of rules

Notes:

We should be able to stream our abstracts straight from the database, one document at a time, instead of loading them all into memory first: see https://radimrehurek.com/gensim/tut1.html#corpus-streaming-one-document-at-a-time
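
A rough sketch of that streaming idea, using only the standard library. The class name `TitleCorpus`, the `fetch_records` callable, the `"title"` field and the stand-in `fake_fetch` are all assumptions made for illustration; with MongoDB, `fetch_records` might wrap something like a pymongo `collection.find(...)` call, but no database is touched here.

```python
class TitleCorpus:
    """Yield one tokenised record per iteration, without loading them all at once."""

    def __init__(self, fetch_records, field="title"):
        # fetch_records is a zero-argument callable returning a fresh
        # iterable of dict-like records each time it is called.
        self.fetch_records = fetch_records
        self.field = field

    def __iter__(self):
        for record in self.fetch_records():
            text = record.get(self.field)
            if text:  # skip records with a missing or empty field
                yield text.lower().split()


def fake_fetch():
    """Hypothetical stand-in for a database query."""
    yield {"title": "Dark Matter Halos"}
    yield {"title": ""}
    yield {"title": "Galaxy Clusters at High Redshift"}


corpus = TitleCorpus(fake_fetch)
tokenised = list(corpus)
print(tokenised)
```

Calling `fetch_records()` inside `__iter__` means the corpus can be iterated more than once, which matters because embedding training typically makes several passes over the data.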

