ScholarScout reads 250M+ academic papers from 8 databases and generates actionable ideas
tailored to your goal: thesis, hackathon, SaaS product, literature review, or your next feature.
Quick Start · Four Modes · Features · Documentation · Live Demo · Changelog
git clone https://github.com/neej4/ScholarScout.git
cd ScholarScout
pip install -r requirements.txt
python preview_server.py
Open http://localhost:5050. The setup wizard walks you through configuration in about 30 seconds.
Need an LLM? Pick one:
| Provider | Cost | Speed | Setup |
|---|---|---|---|
| Gemini | Free (15 req/min) | Fast | Get key |
| Groq | Free tier | Very fast | Get key |
| Ollama | Free (local) | GPU-dependent | Download |
| Custom | Any | Any | Your local proxy (LM Studio, 9router) |
| OpenRouter | Pay-per-token | Varies | Get key |
| OpenAI | Pay-per-token | Fast | Get key |
Same papers, four different lenses:
| Mode | You ask | You get |
|---|---|---|
| Academic | "What can I research?" | Thesis topics, methodology, key papers, novelty check |
| Product | "What can I build?" | MVP features, tech stack, revenue model, competitors |
| Develop | "What can I add to my project?" | Features, integrations, optimizations grounded in your codebase |
| Review | "What's the state of the field?" | Thematic clusters, synthesis per cluster, gaps, open questions, reading list |
Develop mode treats your project description as a hard constraint — every idea must be directly applicable.
Review mode doesn't generate ideas. It organizes and synthesizes existing papers into a literature review skeleton.
- Owl Chase pixel art game while pipeline runs (papers spawn as dots you catch)
- Live graph showing papers grouped by category or cluster
- LLM Chat tab narrating what the AI is doing
- Adaptive phase list (5 phases default, 6 phases review)
- Trend analysis with confidence scoring
- Evidence Pack per generated idea: source papers, evidence claims, grounding score, and audit flags
- Anti-hallucination: P-number grounding (LLM citations are validated against fetched papers)
- Novelty check via semantic similarity (Gemini embeddings) or Jaccard fallback
- Quality scoring 1-10, low-quality filtered
- Deep dive: outline, methodology, datasets, timeline, tools, references
- Paper freshness: least-used papers prioritized, auto-widens date range when exhausted
- 8 sources: arXiv, OpenAlex, Semantic Scholar, PubMed, Crossref, DOAJ, Scopus, DBLP
- Smart source routing per category (medicine → PubMed+Scopus, CS → arXiv+DBLP)
- 80+ categories across 10 disciplines
- Cache-aware with expiry (7 days by default, configurable)
- Citation-based sorting
- 18+ skill profiles (Academic, Product, Develop, Review)
- File upload (.pdf/.txt/.md/.json) as extra context
- Approach filter: Computational, Experimental, Clinical, Theoretical
- Onboarding wizard in 3 steps
- Real-time SSE streaming
- Search, filter, bookmark, export all ideas to Markdown
- Session recovery restores ideas plus cached deep dives/implementation scouting
- Evidence badges: Grounded, Partial, or Needs Review
- Session history (last 20 runs, review + default)
- Toast notifications (no browser alerts)
- Keyboard shortcuts
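The P-number grounding listed above can be pictured with a short sketch. It assumes the LLM cites fetched papers with markers such as `[P3]`, and that a citation is valid when its number falls within the count of fetched papers. The marker format, the `check_grounding` helper, and the score formula are all illustrative assumptions, not ScholarScout's actual implementation.

```python
import re

# Assumed citation format: the LLM refers to fetched papers as [P1], [P2], ...
P_REF = re.compile(r"\[P(\d+)\]")


def check_grounding(idea_text: str, fetched_count: int) -> dict:
    """Split cited P-numbers into valid and invalid and compute a simple score."""
    # Deduplicate citations so a paper cited twice counts once.
    cited = sorted({int(n) for n in P_REF.findall(idea_text)})
    valid = [n for n in cited if 1 <= n <= fetched_count]
    invalid = [n for n in cited if n not in valid]
    # Score is the fraction of distinct citations that point at a fetched paper.
    score = len(valid) / len(cited) if cited else 0.0
    return {"valid": valid, "invalid": invalid, "score": score}


result = check_grounding("Builds on [P2] and [P7]; contrasts with [P2].", fetched_count=5)
print(result)  # {'valid': [2], 'invalid': [7], 'score': 0.5}
```

A score like this could feed an evidence badge, but any thresholds for Grounded, Partial, or Needs Review would be project-specific and are not stated here.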
ScholarScout/
├── preview_server.py # Entry point
├── run_pipeline.py # CLI pipeline runner
├── config.example.yaml # Config template (copy to config.yaml)
├── src/
│ ├── core/
│ │ ├── orchestrator.py # Pipeline controller (default + review)
│ │ ├── analyzer.py # Trend analysis
│ │ ├── generator.py # 4-mode idea generation
│ │ ├── clusterer.py # Paper clustering (review mode)
│ │ ├── synthesizer.py # Literature synthesis (review mode)
│ │ ├── deep_dive.py # Deep dive analysis
│ │ ├── novelty_checker.py # Novelty scoring
│ │ ├── llm.py # Multi-provider LLM client (6 providers)
│ │ ├── config.py # Configuration + thresholds
│ │ ├── models.py # Dataclasses
│ │ └── fetchers/ # 8 source fetchers
│ └── web/
│ ├── routes/ # Flask blueprints
│ ├── templates/ # Dashboard HTML
│ └── static/ # JS, sprites, owl game
├── skills/ # ACADEMIC/ PRODUCT/ DEVELOP/ REVIEW/
├── tests/ # 90+ automated tests
└── data/ # Cache, snapshots, history (gitignored)
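The feature list describes a Jaccard fallback for the novelty check in novelty_checker.py. As a rough illustration of that idea only, the sketch below compares word sets and treats novelty as one minus the highest similarity to any existing title. The tokenizer, the `novelty` scoring rule, and all function names are assumptions made for this example; the real module may work differently.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set; a deliberately simple tokenizer for illustration."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the token intersection over the token union (0.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def novelty(idea: str, existing_titles: list[str]) -> float:
    """Assumed rule: 1 minus the highest similarity to any existing title."""
    return 1.0 - max((jaccard(idea, t) for t in existing_titles), default=0.0)


sim = jaccard("federated learning for healthcare", "federated learning in hospitals")
print(round(sim, 3))  # 0.333 (2 shared words out of 6 distinct words)
```

Word-set overlap ignores synonyms and word order, which is why it suits a fallback role when embeddings are unavailable.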
# Academic mode
SCOUT_GOAL="THESIS" SCOUT_CATEGORIES="cs.AI,cs.CL" python run_pipeline.py
# Product mode
SCOUT_GOAL="HACKATHON" python run_pipeline.py
# Develop mode
SCOUT_GOAL="FEATURE" SCOUT_CONTEXT="Flask app with LLM integration" python run_pipeline.py
# Review mode
SCOUT_GOAL="SYNTHESIS" SCOUT_CONTEXT="federated learning for healthcare" python run_pipeline.py
pip install -e ".[dev]"
pytest tests/ --ignore=tests/integration # Unit tests
npm test # JavaScript tests
See CONTRIBUTING.md. High-impact areas:
- New fetchers: implement BaseFetcher (1 file, ~150 LOC)
- New skill profiles: add markdown to skills/
- Prompt improvements: generator.py, analyzer.py, synthesizer.py
- New categories: update KEYWORD_SEEDS + fetcher mappings
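To give a feel for what a new fetcher might involve, here is a minimal sketch. The real BaseFetcher interface is not shown in this README, so the `BaseFetcher` and `Paper` classes below are hypothetical stand-ins defined only for this example, and `InMemoryFetcher` searches a fixed list instead of calling a real API. Check the existing fetchers in src/core/fetchers/ for the actual method names and return types.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Paper:  # hypothetical stand-in for the project's dataclasses in models.py
    title: str
    year: int
    source: str


class BaseFetcher(ABC):  # hypothetical stand-in; the real interface may differ
    name: str = "base"

    @abstractmethod
    def fetch(self, query: str, limit: int = 10) -> list[Paper]:
        """Return up to `limit` papers matching `query`."""


class InMemoryFetcher(BaseFetcher):
    """Toy fetcher over a fixed list, used here in place of a real HTTP API."""

    name = "memory"

    def __init__(self, records: list[dict]):
        self.records = records

    def fetch(self, query: str, limit: int = 10) -> list[Paper]:
        # Case-insensitive substring match on the title, truncated to `limit`.
        q = query.lower()
        hits = [r for r in self.records if q in r["title"].lower()]
        return [Paper(r["title"], r["year"], self.name) for r in hits[:limit]]


fetcher = InMemoryFetcher([
    {"title": "Federated Learning for Healthcare", "year": 2023},
    {"title": "Graph Neural Networks", "year": 2022},
])
print(fetcher.fetch("federated"))
```

A real fetcher would replace the in-memory filter with a request to the source's API and map each response record onto the project's own paper model.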
MIT — see LICENSE.

