Twitter Semantic Search

Basic semantic search for a tweet archive, part of the Community Archive ecosystem. It generates an OpenAI embedding for each tweet thread (replies and retweets are ignored) and inserts the embeddings into Cloudflare's Vectorize DB. The frontend embeds the search query with OpenAI and queries the vector DB for the closest matches.
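To illustrate what "searching the vector DB" means, here is a minimal in-memory sketch in Python. It is not the project's code: the toy 3-dimensional vectors stand in for real OpenAI embeddings, the `top_k` helper is hypothetical, and cosine similarity is an assumption (the metric actually used depends on how the Vectorize index was configured).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=3):
    """Return the k (item_id, score) pairs most similar to the query.

    `index` is a plain dict of item_id -> embedding, standing in for
    the vector DB (an assumption for this sketch).
    """
    scored = [
        (item_id, cosine_similarity(query_vector, vector))
        for item_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones would come from the embedding model.
index = {
    "thread-a": [1.0, 0.0, 0.0],
    "thread-b": [0.0, 1.0, 0.0],
    "thread-c": [0.7, 0.7, 0.0],
}
results = top_k([1.0, 0.1, 0.0], index, k=2)
print([item_id for item_id, _ in results])
```

A hosted vector DB does the same ranking over many more vectors, typically with an approximate index rather than a full scan.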

Live demo: https://defenderofbasic.github.io/twitter-semantic-search/

Self host this for your own tweet archive

The general steps are as follows. First, create and deploy the Cloudflare worker and vector DB (see the instructions in the cloudflare-worker/ directory). Then generate the embeddings by running the script in generate-embeddings/ with your archive JSON in archives/. Finally, run the frontend/ and replace the Cloudflare URL with your own, along with a URL where your archive JSON is hosted.
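As a rough picture of what the embedding step has to prepare, the sketch below groups an author's tweets into threads and drops replies to other users and retweets. It is an illustration only, not the script in generate-embeddings/. The field names (`id_str`, `full_text`, `in_reply_to_status_id_str`, `in_reply_to_user_id_str`) are modeled on Twitter's export format and are assumptions here, as are the "RT @" retweet check and the reading of "thread" as a chain of self-replies.

```python
def build_threads(tweets, author_id):
    """Group an author's tweets into threads keyed by the root tweet id.

    Assumptions for this sketch: a thread is a chain of self-replies,
    retweets start with "RT @", and replies to other users are dropped.
    """
    by_id = {t["id_str"]: t for t in tweets}

    def root_of(tweet):
        # Walk up the reply chain; give up if it leaves the author's tweets.
        seen = set()
        while tweet.get("in_reply_to_status_id_str"):
            if tweet.get("in_reply_to_user_id_str") != author_id:
                return None
            parent = by_id.get(tweet["in_reply_to_status_id_str"])
            if parent is None or parent["id_str"] in seen:
                return None
            seen.add(tweet["id_str"])
            tweet = parent
        return tweet["id_str"]

    threads = {}
    # Tweet ids grow over time, so sorting by id keeps each thread in order.
    for tweet in sorted(tweets, key=lambda t: int(t["id_str"])):
        if tweet["full_text"].startswith("RT @"):
            continue
        root = root_of(tweet)
        if root is None:
            continue
        threads.setdefault(root, []).append(tweet["full_text"])
    # One text per thread: this is what would be sent to the embedding model.
    return {root: "\n".join(parts) for root, parts in threads.items()}


sample = [
    {"id_str": "1", "full_text": "First thought",
     "in_reply_to_status_id_str": None, "in_reply_to_user_id_str": None},
    {"id_str": "2", "full_text": "Second thought",
     "in_reply_to_status_id_str": "1", "in_reply_to_user_id_str": "me"},
    {"id_str": "3", "full_text": "@bob agreed",
     "in_reply_to_status_id_str": "900", "in_reply_to_user_id_str": "bob"},
    {"id_str": "4", "full_text": "RT @alice: hello",
     "in_reply_to_status_id_str": None, "in_reply_to_user_id_str": None},
]
print(build_threads(sample, author_id="me"))
```

Each resulting thread text would then be embedded and inserted into the vector DB under its root tweet id, or whatever key the worker expects.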

TODO later:

Support offline mode. A local server that queries the vectra DB would be enough, with no need for Cloudflare.
Add a script that turns a Twitter zip into a single gzipped JSON, so you can do this even if your data isn't on Community Archive.
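The second TODO item could start from something like the following Python sketch. It assumes the export zip contains a `data/tweets.js` member whose content is a JavaScript assignment (`window.YTD.tweets.part0 = [...]`); older exports may name the file differently, and large archives may be split into several part files, which this sketch does not handle. The function name is hypothetical, and the output is just the raw tweet list, which may not match the JSON shape the rest of this project expects.

```python
import gzip
import json
import zipfile


def twitter_zip_to_gzipped_json(zip_path, out_path, member="data/tweets.js"):
    """Extract the tweet list from a Twitter export zip into gzipped JSON.

    `member` is the assumed location of the tweets file inside the zip.
    Returns the number of tweets written.
    """
    with zipfile.ZipFile(zip_path) as archive:
        raw = archive.read(member).decode("utf-8")

    # The file is JavaScript, not JSON: drop everything up to the first "="
    # (the "window.YTD.tweets.part0 = " prefix) and parse the rest.
    payload = raw[raw.index("=") + 1:].strip()
    tweets = json.loads(payload)

    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        json.dump(tweets, out)
    return len(tweets)
```

Called as `twitter_zip_to_gzipped_json("twitter-archive.zip", "archive.json.gz")`, with both paths being placeholders, it writes one compressed file that can be hosted alongside the frontend.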


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Twitter Semantic Search

Self host this for your own tweet archive

TODO later:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Twitter Semantic Search

Self host this for your own tweet archive

TODO later:

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages