Inspired by this post: Some good English word datasets. In particular:
- WordNet: A while back, researchers at Princeton put together WordNet, a database of relations between words, including definitions, synonyms, and similar information. Princeton no longer maintains it, but it was forked, and the fork is maintained, with a release 1 month ago. There are libraries to load it.
- Wortschatz Leipzig frequency lists: A project under Leipzig University, this is a collection of word frequency lists from various sources. The English downloads page has datasets drawn from news, the web, or Wikipedia.
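As a rough illustration of working with a frequency list like the Leipzig ones, here is a minimal sketch that reads a tab-separated word list into a counter. The column layout (an id, the word, then its count) is an assumption about the download format, and `SAMPLE` and `load_frequencies` are hypothetical names made up for this example, so check the actual file before relying on it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the ASSUMED layout: id <TAB> word <TAB> count.
SAMPLE = "1\tthe\t1000\n2\tof\t600\n3\tThe\t150\n4\tword\t50\n"


def load_frequencies(lines, lowercase=True):
    """Return a Counter mapping each word to its summed count."""
    counts = Counter()
    # QUOTE_NONE so that quote characters inside words are kept literally.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed rows
        word, freq = row[1], row[2]
        if not freq.isdigit():
            continue  # skip rows whose count column is not a plain integer
        if lowercase:
            word = word.lower()  # merge "The" and "the"
        counts[word] += int(freq)
    return counts


freqs = load_frequencies(io.StringIO(SAMPLE))
top = freqs.most_common(2)
print(top)
```

For a real download, the same function should accept an open file handle, e.g. `open(path, encoding="utf-8", newline="")`, in place of the `io.StringIO` wrapper. Folding case is a design choice: it is convenient for word games and spell-check style lookups, but it loses the distinction between proper nouns and common words.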