jerrytmcl/Body-X

Twitter Semantic Search

Basic semantic search for a tweet archive, part of the Community Archive ecosystem. A script generates a semantic embedding with OpenAI for each tweet thread (replies and retweets are ignored) and inserts the embeddings into Cloudflare's Vectorize DB. The frontend embeds the search query with OpenAI and searches the vector DB with it.

Live demo: https://defenderofbasic.github.io/twitter-semantic-search/
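As a rough sketch of the query path described above (embed the query, then search the vector DB), something like the following could work. The OpenAI embeddings endpoint and response shape are real, but the model name, the `workerUrl` parameter, and the `{ vector, topK }` request body are assumptions for illustration; the actual worker in cloudflare-worker/ may expect something different.

```javascript
// Sketch only: the worker request shape and the model name are assumptions,
// not taken from this repo.
const OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings";

// Turn the query text into an embedding vector via the OpenAI embeddings API.
// fetchFn is injectable so the function can be exercised without network access.
async function embedQuery(query, apiKey, fetchFn = fetch) {
  const res = await fetchFn(OPENAI_EMBEDDINGS_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // Assumed model; it must match the model used when indexing the threads.
    body: JSON.stringify({ model: "text-embedding-3-small", input: query }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const json = await res.json();
  return json.data[0].embedding;
}

// Embed the query, then POST the vector to the (hypothetical) worker endpoint.
async function searchThreads(query, { apiKey, workerUrl, topK = 10 }, fetchFn = fetch) {
  const vector = await embedQuery(query, apiKey, fetchFn);
  const res = await fetchFn(workerUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ vector, topK }),
  });
  if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
  return res.json();
}
```

Whatever model is chosen, the query and the indexed threads need to be embedded with the same one, otherwise the similarity scores are meaningless.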

Self-host this for your own tweet archive

The general steps are: create and deploy the Cloudflare Worker and vector DB (see the instructions in the cloudflare-worker/ directory), then generate embeddings by running the script in generate-embeddings/ with your archive JSON in archives/. Finally, run the frontend/ after replacing the Cloudflare URL with your own and pointing it at a URL where the archive JSON is hosted.
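For the embedding step, the archive first has to be reduced to threads, with replies and retweets left out. The sketch below shows one way that grouping might be done. It assumes each tweet object carries `id_str`, `full_text`, and `in_reply_to_status_id_str` as in the standard Twitter export, and it treats text starting with "RT @" as a retweet; the script in generate-embeddings/ may use different fields or rules.

```javascript
// Sketch: group an account's own tweets into threads, skipping retweets and
// replies to other accounts. Field names are assumed, not taken from this repo.
function buildThreads(tweets) {
  const byId = new Map(tweets.map((t) => [t.id_str, t]));
  const isRetweet = (t) => t.full_text.startsWith("RT @"); // heuristic

  // Walk up the reply chain to the first tweet that is not a reply.
  // Returns null when the chain leaves the archive (a reply to someone else).
  function findRoot(tweet) {
    let current = tweet;
    const seen = new Set();
    while (current.in_reply_to_status_id_str) {
      if (seen.has(current.id_str)) return null; // guard against cycles
      seen.add(current.id_str);
      const parent = byId.get(current.in_reply_to_status_id_str);
      if (!parent) return null;
      current = parent;
    }
    return current;
  }

  const threads = new Map();
  for (const tweet of tweets) {
    if (isRetweet(tweet)) continue;
    const root = findRoot(tweet);
    if (!root || isRetweet(root)) continue;
    if (!threads.has(root.id_str)) threads.set(root.id_str, []);
    threads.get(root.id_str).push(tweet);
  }

  // Tweet IDs increase over time, so sorting by ID approximates thread order.
  return [...threads.entries()].map(([rootId, parts]) => {
    parts.sort((a, b) => (BigInt(a.id_str) < BigInt(b.id_str) ? -1 : 1));
    return { id: rootId, text: parts.map((p) => p.full_text).join("\n") };
  });
}
```

Each returned `text` would then be sent to the embedding model as a single input, keyed by the root tweet's ID.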

TODO later:

  • Support offline mode. A local server could query the Vectra DB directly, with no need for Cloudflare.
  • Add a script that turns a Twitter archive zip into a single gzipped JSON file, so you can do this even if your data isn't on Community Archive.
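To give a sense of what the offline mode involves, here is a minimal brute-force search over embeddings held in memory. It is a stand-in for a real local vector DB and does not use Vectra's API; the `items` shape (`{ id, vector }`) is hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query and keep the topK best.
// items: [{ id, vector }] (hypothetical shape for this sketch).
function searchLocal(queryVector, items, topK = 5) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan like this is fine for a single account's archive; an indexed store only starts to matter at much larger scale.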

Languages

  • JavaScript 53.3%
  • HTML 37.7%
  • TypeScript 8.3%
  • CSS 0.7%