Semantic GitHub repository discovery for people who need the right project, not the most famous one.
Search by intent. Rank by fit, evidence, health, risk, and underrated potential.
GitHub search is excellent when you already know the right keywords. It is weaker when you know the job you need done but not the exact repository name, ecosystem wording, or hidden-gem project.
RepoRadar turns a plain-English need into an inspectable shortlist. It searches broadly, narrows with local semantic signals, enriches survivors with repository evidence, and ranks them by usefulness rather than popularity alone.
| Area | RepoRadar does |
|---|---|
| Input | Natural-language repository search with filters |
| Output | Ranked repositories with evidence, risks, and comparisons |
| Ranking | Fit, Future, Underrated, and risk signals |
| Evidence | README, manifests, releases, issues, PRs, contributors, stars, forks, topics |
| Stack | Next.js, TypeScript, Prisma, PostgreSQL, pgvector, Octokit |
| Runtime | Local embeddings first, optional LLM explanation second |
| Search time | Fresh searches usually take about 60-70 seconds |
| Deploy | Railway-friendly, self-hostable, no paid services required in NO_LLM_MODE=true |
- Search by capability instead of keyword luck.
- Find smaller projects that are healthy but not yet famous.
- Compare repositories with the evidence visible.
- See why a repo ranked highly, what it is missing, and what risks to inspect.
- Keep LLMs out of raw fact collection: GitHub data is fetched directly, then explained.
| Stage | What happens | Why it matters |
|---|---|---|
| Intent | Parses the user's plain-English need into search constraints | Short prompts become structured searches |
| GitHub search | Runs multiple query variants against GitHub | Improves recall beyond one keyword phrase |
| Candidate cache | Reuses fresh candidate pools when possible | Keeps repeated searches faster and cheaper |
| Vector funnel | Uses local embeddings (conjunctive per-aspect) plus a credibility floor to narrow the pool | Drops weak matches and keyword-stuffed 0-signal repos before expensive enrichment |
| Enrichment | Fetches README, manifests, releases, issues, PRs, contributors, and metadata | Scores are grounded in observable evidence |
| Scoring | One listwise pass ranks survivors and flags off-topic repos; produces Fit, Future, Underrated, and risk signals | Results are ranked for actual usefulness, and irrelevant repos are demoted instead of padding the shortlist |
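The vector funnel stage can be pictured with a small sketch. This is not RepoRadar's actual code: the `Candidate` shape, the `funnel` function, the 0.3 similarity threshold, and the "has a README or at least one star" credibility floor are all assumptions made for illustration. It shows one plausible reading of "conjunctive per-aspect": a repository is scored by its weakest aspect match, so it must match every aspect of the query to survive.

```typescript
// Hypothetical shapes and thresholds; none of these names come from the RepoRadar codebase.
interface Candidate {
  fullName: string;
  embedding: number[]; // local embedding of the repo's description/README text
  stars: number;
  hasReadme: boolean;
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Keep the topN candidates whose weakest per-aspect similarity clears the threshold.
function funnel(
  candidates: Candidate[],
  aspectEmbeddings: number[][],
  topN: number,
  minAspectSim = 0.3, // assumed threshold
): Candidate[] {
  return candidates
    // Assumed credibility floor: drop repos with no README and no stars.
    .filter((c) => c.hasReadme || c.stars > 0)
    .map((c) => ({
      candidate: c,
      // Conjunctive: the score is the minimum similarity across all aspects.
      score:
        aspectEmbeddings.length === 0
          ? 1
          : Math.min(...aspectEmbeddings.map((a) => cosine(c.embedding, a))),
    }))
    .filter((scored) => scored.score >= minAspectSim)
    .sort((x, y) => y.score - x.score)
    .slice(0, topN)
    .map((scored) => scored.candidate);
}
```

Taking the minimum rather than the average means a repository that matches one aspect strongly and another not at all is dropped instead of passing on a diluted score.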
| View | What it shows |
|---|---|
| Search | Compact prompt box, filters, example queries, and progress state |
| Results | Ranked cards, hidden gems, score badges, risks, evidence summaries, and compare controls |
| Repo detail | README evidence, trend signals, health indicators, and risk explanations |
| About | Pipeline explanation and score composition |
| Status | Database, pgvector, embedding, LLM, and typical search-time health |
| Score | Meaning |
|---|---|
| Fit | How well the repo matches the user's actual need |
| Future | Whether the repo looks maintained and likely to remain useful |
| Underrated | Whether the repo has high signal without being saturated by popularity |
| Risk | Missing license, stale activity, weak release history, or other inspection flags |
The model is deterministic first and model-assisted second. The LLM can help with intent and explanations, but raw counts and repository facts come from GitHub and the database.
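As an illustration of the deterministic side, the risk flags listed in the score table could be derived from plain repository facts without any model call. The sketch below is an assumption about how such a check might look: the `RepoFacts` shape, the flag names, and the one-year staleness threshold are hypothetical and not taken from RepoRadar.

```typescript
// Hypothetical shape; field names and thresholds are illustrative assumptions.
interface RepoFacts {
  license: string | null; // license identifier as reported by GitHub, or null if absent
  lastPushAt: Date;
  releaseCount: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Derive inspection flags from facts alone, so the same input always yields the same flags.
function riskFlags(repo: RepoFacts, now: Date, staleAfterDays = 365): string[] {
  const flags: string[] = [];
  if (!repo.license) {
    flags.push("missing-license");
  }
  const daysSincePush = (now.getTime() - repo.lastPushAt.getTime()) / MS_PER_DAY;
  if (daysSincePush > staleAfterDays) {
    flags.push("stale-activity");
  }
  if (repo.releaseCount === 0) {
    flags.push("weak-release-history");
  }
  return flags;
}
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the check reproducible, which matters if scores are meant to be inspectable.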
- Embeddings run locally and free via Transformers.js.
- NO_LLM_MODE=true runs the full pipeline without paid model calls.
- The optional LLM layer uses an OpenAI-compatible endpoint, so you can choose the provider.
# 1. configure
cp .env.example .env
# 2. database (Postgres + pgvector)
docker compose up -d
# 3. install + migrate + run
pnpm install
pnpm db:deploy
pnpm dev
Open http://localhost:2000 and search.
Good first searches:
- browser testing
- notion editor
- simple react state
- local-first collaborative markdown editor
- open-source alternative to Firebase Auth for Next.js
Fresh searches usually take about 60-70 seconds. Warm repeat searches can be faster because candidate pools and enrichment evidence are cached.
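The warm-search behavior can be sketched as a cache with a freshness window. This is a simplified in-memory stand-in, not the project's implementation (which may well persist candidate pools in PostgreSQL); the `TtlCache` class and the `cacheKey` normalization are hypothetical names introduced here.

```typescript
// Hypothetical in-memory cache; RepoRadar's real cache may live in the database.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Return the cached value if it is still fresh, otherwise undefined.
  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired: evict so the next search refetches
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Assumed normalization so trivially different spellings share one cache entry.
function cacheKey(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}
```

A caller would check `get(cacheKey(query))` before running the GitHub query variants and call `set` after a fresh search completes.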
Important environment variables:
| Variable | Purpose |
|---|---|
| DATABASE_URL | PostgreSQL connection string with pgvector |
| GITHUB_TOKEN | Raises GitHub API limits for search and enrichment |
| OPENROUTER_API_KEY | Enables optional model-assisted scoring and explanation |
| NO_LLM_MODE | Runs deterministic-only mode when set to true |
| MAX_SEARCH_QUERIES | Controls how many GitHub query variants are executed |
| FUNNEL_TOP_N | Controls how many candidates survive the local funnel |
| LIGHT_ENRICH_TOP_N | Controls cheap pre-funnel enrichment |
| INTENT_TIMEOUT_MS | Timeout for intent extraction |
| LISTWISE_TIMEOUT_MS | Timeout for listwise ranking |
| SEARCH_ETA_SECONDS | Progress-bar estimate, currently calibrated around 67 |
| SEARCH_DEBUG | When true, writes per-candidate funnel/ranking traces to logs/search-debug.jsonl (local tuning only) |
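A minimal sketch of how a few of these variables might be read and validated at startup follows. The `SearchConfig` shape, the `loadSearchConfig` helper, and the fallback values for MAX_SEARCH_QUERIES and FUNNEL_TOP_N are assumptions for illustration; only the variable names and the 67-second estimate come from the table above.

```typescript
// Hypothetical config shape; only the environment variable names come from the README.
interface SearchConfig {
  noLlmMode: boolean;
  maxSearchQueries: number;
  funnelTopN: number;
  searchEtaSeconds: number;
  searchDebug: boolean;
}

type Env = Record<string, string | undefined>;

// Read a positive integer, falling back when the variable is unset or blank.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadSearchConfig(env: Env): SearchConfig {
  return {
    noLlmMode: env["NO_LLM_MODE"] === "true",
    maxSearchQueries: intFromEnv(env, "MAX_SEARCH_QUERIES", 4), // assumed fallback
    funnelTopN: intFromEnv(env, "FUNNEL_TOP_N", 30), // assumed fallback
    searchEtaSeconds: intFromEnv(env, "SEARCH_ETA_SECONDS", 67), // README: "around 67"
    searchDebug: env["SEARCH_DEBUG"] === "true",
  };
}
```

In a Node.js process this would typically be called once as `loadSearchConfig(process.env)`, so a malformed value fails at startup rather than partway through a search.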
RepoRadar includes a short manual benchmark for search quality:
node scripts/search-benchmark.mjs --limit 6
It uses general-user prompts such as browser testing, notion editor, and simple react state.
It reports the top repositories, whether expected well-known repositories appear, latency, and diagnostic evidence in logs/search-diagnostics.jsonl.
For a single ad-hoc query you can run the local runner and print the ranked shortlist with per-repo fit/future/similarity/source:
node scripts/run-search.mjs "kubernetes monitoring and observability"
Set SEARCH_DEBUG=true first to also capture per-candidate funnel scores (similarity, per-aspect similarities, prefilter score, survivor flags) and final rank scores in logs/search-debug.jsonl. This is the fastest way to see why a repo was kept, dropped, or ranked where it was. The logs/ directory is gitignored.
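Because the debug trace is a JSONL file (one JSON object per line), it can be filtered with a few lines of code. The sketch below deliberately makes no assumption about the trace's field names, which are not documented here: `parseJsonl` and `tracesWhere` are hypothetical helpers, and the key you filter on is whatever the real log actually contains.

```typescript
// A trace is treated as an opaque JSON object; its keys are not assumed.
type Trace = Record<string, unknown>;

// Parse JSONL text, skipping blank lines and any non-object values.
function parseJsonl(text: string): Trace[] {
  const traces: Trace[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      traces.push(parsed as Trace);
    }
  }
  return traces;
}

// Select traces whose value at `key` is strictly equal to `value`.
function tracesWhere(traces: Trace[], key: string, value: unknown): Trace[] {
  return traces.filter((trace) => trace[key] === value);
}
```

After reading the file contents into a string, `tracesWhere(parseJsonl(contents), key, value)` narrows the log to the candidates of interest, for example all entries for one repository name.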
README visual assets are generated with Python and Pillow:
python scripts/generate_readme_workflow.py
python scripts/generate_readme_concept_diagrams.py
RepoRadar gets stronger when search failures are visible and fixable.
- Open an issue with a query that ranked poorly.
- Improve the scoring rubric or evidence extraction.
- Add manifest parsers, chart types, or accessibility improvements.
- Keep the README and docs in sync with product behavior.
Star RepoRadar if you want open-source discovery to reward usefulness, evidence, and maintenance instead of only marketing reach or historical popularity.
A star helps the project reach more builders looking for the right dependency and more maintainers whose good projects deserve to be found.
MIT - free to use, self-host, fork, and build on.


