Turn your Twitter/X archive into a searchable, visual knowledge base.
Drop in your archive zip. Tweetscope embeds every tweet, maps them onto a 2D scatter plot, clusters them by topic, and labels each cluster with an LLM — so you can browse years of tweets by theme instead of scrolling chronologically.
Navigate a hierarchical topic tree. Click any topic to filter the scatter plot and see matching tweets in the feed. Drill down from broad themes to fine subtopics.
Expand the sidebar into a full-width carousel — one column per subtopic, with engagement metrics, quote embeds, and thread chains side by side.
Search your tweets by meaning (vector nearest-neighbor via VoyageAI) or by keyword (full-text search), with matching points highlighted live on the scatter plot.
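As a rough illustration of the "search by meaning" mode, the sketch below ranks tweets by cosine similarity between a query vector and stored tweet vectors. It is not Tweetscope's actual code: the function names and the hand-written 3-dimensional vectors are hypothetical stand-ins for real VoyageAI embeddings, and a real deployment would likely use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_tweets(query_vec, tweet_vecs, k=2):
    """Return the ids of the k tweets most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), tweet_id)
        for tweet_id, vec in tweet_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [tweet_id for _, tweet_id in scored[:k]]


# Toy vectors (hypothetical); real embeddings have many more dimensions.
TWEET_VECS = {
    "t1": [0.9, 0.1, 0.0],
    "t2": [0.0, 1.0, 0.0],
    "t3": [0.8, 0.2, 0.1],
}
```

Calling `nearest_tweets([1.0, 0.0, 0.0], TWEET_VECS)` returns the two tweets whose vectors point most nearly in the query's direction. Keyword search works differently: it matches on the text itself rather than on vectors.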
Your tweets flow through a six-step ML pipeline: ingest → embed → UMAP → cluster → label → explore. Each step writes reproducible artifacts (Parquet, HDF5, JSON). You can re-run any step with different parameters and compare results via scopes.
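One way to think about re-running a step: changing parameters at one stage makes the artifacts of every later stage stale. The sketch below encodes the step order from the paragraph above and lists which steps would need to run again. The name `steps_to_rerun` is hypothetical, and treating the pipeline as strictly linear is an assumption, not a description of how Tweetscope tracks dependencies.

```python
# Step order as listed in this README.
PIPELINE_STEPS = ["ingest", "embed", "umap", "cluster", "label", "explore"]


def steps_to_rerun(changed_step):
    """Return the changed step plus every step downstream of it."""
    if changed_step not in PIPELINE_STEPS:
        raise ValueError(f"unknown pipeline step: {changed_step!r}")
    start = PIPELINE_STEPS.index(changed_step)
    return PIPELINE_STEPS[start:]
```

For example, `steps_to_rerun("cluster")` leaves the ingest, embed, and UMAP outputs untouched, so trying new clustering parameters does not require re-embedding.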
- Python 3.11+ and uv
- Node.js 22+ and npm
- API keys: VoyageAI (embeddings) and OpenAI (cluster labeling)
- A Twitter/X archive zip (request yours at https://x.com/settings/download_your_data)
git clone --recurse-submodules <repo-url>
cd latent-scope
# Python pipeline
uv pip install -e .
# API + frontend
cd api && npm install && cd ..
cd web && npm install && cd ..
cp .env.example .env
cp api/.env.example api/.env
Edit .env and set your data directory and API keys:
LATENT_SCOPE_DATA=~/latent-scope-data
VOYAGE_API_KEY=your-key
OPENAI_API_KEY=your-key
LATENT_SCOPE_APP_MODE=studio
Edit api/.env — set the same data directory and keys:
LATENT_SCOPE_DATA=~/latent-scope-data
LATENT_SCOPE_APP_MODE=studio
VOYAGE_API_KEY=your-key
PORT=3000
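Because both files need a matching data directory and keys, a quick check can catch a missing value before you start the API. The sketch below parses simple `KEY=value` lines and reports which required keys are absent or empty. The helper names and the required-key lists are assumptions based on the examples above, and the parser does not handle quoting or `export` prefixes.

```python
def parse_env(text):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env, required):
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not env.get(key)]


# Keys taken from the example files above (assumed to be the full set).
ROOT_REQUIRED = [
    "LATENT_SCOPE_DATA",
    "VOYAGE_API_KEY",
    "OPENAI_API_KEY",
    "LATENT_SCOPE_APP_MODE",
]
API_REQUIRED = [
    "LATENT_SCOPE_DATA",
    "LATENT_SCOPE_APP_MODE",
    "VOYAGE_API_KEY",
    "PORT",
]
```

To check a real file, read it first, for example `missing_keys(parse_env(open(".env").read()), ROOT_REQUIRED)`.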
# Place your archive in archives/ (gitignored)
cp ~/Downloads/twitter-archive.zip archives/
# Import and run the full pipeline (embed → UMAP → cluster → label → scope)
uv run python3 -m latentscope.scripts.twitter_import my-tweets \
  --source zip --zip_path archives/twitter-archive.zip --run_pipeline
# Terminal 1: start the API
cd api && npm run dev
# Terminal 2: start the frontend
cd web && npm run dev
Open http://localhost:5174, then select your dataset and scope to explore.
For archives with 100k+ tweets, you can import year by year to keep memory manageable, then run the pipeline once on the full dataset. See Progressive import in the development guide.
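To illustrate the year-by-year idea, the sketch below buckets tweets by the year in their `created_at` field. It assumes the timestamp format found in Twitter archive exports (for example `Wed Oct 10 20:19:24 +0000 2018`) and a hypothetical list of dicts with `id` and `created_at` keys. It only shows the grouping; the actual import commands are described in the development guide.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def group_by_year(tweets):
    """Map each year to the ids of the tweets created in that year."""
    by_year = defaultdict(list)
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        by_year[created.year].append(tweet["id"])
    return dict(by_year)
```

Each resulting bucket could then be imported separately, which keeps only one year's tweets in memory at a time.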
See DEVELOPMENT.md for:
- Architecture overview and runtime modes
- Repository structure
- Python pipeline reference (CLI, scripts, data contracts)
- Hono API routes and adding new endpoints
- Frontend contexts, hooks, and styling rules
- Dataset directory structure and data contracts
- Deployment guide (Vercel + Cloudflare R2)
# Clone with submodules
git clone --recurse-submodules <repo-url>
# or after clone:
git submodule update --init --recursive
See DEVELOPMENT.md for architecture details and dev setup.