scholar-tools

A Claude Code plugin for academic research with domain-aware skill routing. /research discover "topic" searches multiple databases in parallel, ranks papers by venue quality and citations, quick-reads the top results, and provides a landscape summary. /research discuss drives deep research ideation with adversarial novelty checks and reviewer simulation. /research cite generates verified BibTeX. Every citation traces to an API call, never to model memory.

What it does

Command	What happens
`/research discover "topic"`	S2 semantic search + AlphaXiv MCP retrieval (3 tools in parallel: agentic / full-text / embedding), deduplicate, quality-rank, quick-read top papers via AlphaXiv markdown, generate landscape summary
`/research discuss`	Deep discussion: assumption surfacing, adversarial novelty check, reviewer simulation, significance test, experiment design
`/research discuss <paper>`	Start discussion from a specific paper
`/research read <paper>`	Deep structured analysis with domain expert perspective
`/research cite <paper>`	Verified BibTeX via DBLP > CrossRef > S2 chain
`/research write <section>`	Write a paper section with Triple Review Gate (abstract/intro) + Consistency Check (method/experiments/conclusion)

All commands support --domain <categories> (additive) or --domain-only <categories> (exclusive) to override auto-detected domain skills.
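The difference between the two flags is easiest to see as set logic. The sketch below uses a hypothetical helper (`resolve_domains` is not part of the plugin) and assumes categories are plain name strings. If both flags were given, it lets `--domain-only` win, which is also an assumption.

```python
from typing import Iterable, Optional


def resolve_domains(
    detected: Iterable[str],
    domain: Optional[Iterable[str]] = None,
    domain_only: Optional[Iterable[str]] = None,
) -> list[str]:
    """Return the categories a command should load (hypothetical helper).

    --domain-only replaces the auto-detected set entirely;
    --domain adds to it. Order is kept and duplicates are dropped.
    """
    if domain_only is not None:
        base = list(domain_only)
    else:
        base = list(detected) + list(domain or [])
    # dict.fromkeys keeps first-seen order while removing duplicates
    return list(dict.fromkeys(base))


detected = ["Fine-Tuning", "Post-Training"]
print(resolve_domains(detected, domain=["Evaluation"]))
print(resolve_domains(detected, domain_only=["RAG"]))
```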

Skill Router

The central innovation: a Skill Router maps paper content to 21 domain skill categories from Orchestra-Research/AI-Research-SKILLs, injecting expert knowledge into each phase.

#	Category	Example Skills
1	Model-Architecture	litgpt, mamba, nanogpt, rwkv, torchtitan
2	Tokenization	huggingface-tokenizers, sentencepiece
3	Fine-Tuning	axolotl, llama-factory, peft, unsloth
4	Mechanistic-Interpretability	transformer-lens, saelens, pyvene, nnsight
5	Data-Processing	ray-data, nemo-curator
6	Post-Training	trl, grpo-rl-training, openrlhf, simpo, verl
7	Safety-Alignment	constitutional-ai, llamaguard, nemo-guardrails
8	Distributed-Training	megatron-core, deepspeed, pytorch-fsdp2, accelerate
9	Infrastructure	modal, skypilot, lambda-labs
10	Optimization	flash-attention, bitsandbytes, gptq, awq, hqq, gguf
11	Evaluation	lm-evaluation-harness, bigcode-evaluation-harness
12	Inference-Serving	vllm, tensorrt-llm, llama.cpp, sglang
13	MLOps	weights-and-biases, mlflow, tensorboard
14	Agents	langchain, llamaindex, crewai, autogpt
15	RAG	chroma, faiss, sentence-transformers, pinecone, qdrant
16	Prompt-Engineering	dspy, instructor, guidance, outlines
17	Observability	langsmith, phoenix
18	Multimodal	clip, whisper, llava, stable-diffusion, segment-anything
19	Emerging-Techniques	moe-training, model-merging, long-context, speculative-decoding
20	ML-Paper-Writing	ml-paper-writing (auto-invoked in write phase)
21	Research-Ideation	brainstorming-research-ideas, creative-thinking-for-research

The router auto-detects relevant categories from paper keywords and classifies them as primary (core contribution) or secondary (peripheral tool).
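A rough illustration of keyword-based routing. The keyword table is a made-up three-category excerpt, and the primary/secondary rule (a hit in the title or abstract counts as primary, a body-only hit as secondary) is an assumed heuristic standing in for whatever skill-router.md actually specifies.

```python
import re

# Hypothetical excerpt of a keyword table; the real router derives its
# keywords from the vendored SKILL.md tags plus manual overrides.
KEYWORDS: dict[str, list[str]] = {
    "Post-Training": ["rlhf", "dpo", "grpo", "preference optimization"],
    "Optimization": ["quantization", "flash attention", "gptq"],
    "Evaluation": ["benchmark", "lm-evaluation-harness"],
}


def _hits(text: str, keywords: list[str]) -> int:
    """Count keywords that occur as whole words or phrases in text."""
    lowered = text.lower()
    return sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    )


def route(title: str, abstract: str, body: str = "") -> dict[str, str]:
    """Map each matched category to 'primary' or 'secondary'.

    Assumed heuristic: keywords in the title or abstract signal the core
    contribution (primary); keywords only in the body signal a peripheral
    tool (secondary).
    """
    front = f"{title}\n{abstract}"
    result: dict[str, str] = {}
    for category, kws in KEYWORDS.items():
        if _hits(front, kws):
            result[category] = "primary"
        elif _hits(body, kws):
            result[category] = "secondary"
    return result


print(route(
    title="Faster GRPO with Less Memory",
    abstract="We cut the memory cost of GRPO-style preference optimization.",
    body="Models are served with 4-bit quantization and scored on a benchmark.",
))
```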

How search works

Two agents run in parallel, each with a 60-second timeout:

Semantic Scholar — semantic search across 200M+ papers (s2_search.py) + boolean bulk search (s2_bulk_search.py) with year filtering. Primary search source.
AlphaXiv MCP (claude.ai-bound) — three retrieval tools fired in parallel inside the agent: agentic_paper_retrieval (NL/conceptual), full_text_papers_search (body keyword match), embedding_similarity_search (semantic neighbors). Papers surfaced by multiple tools get a stronger relevance signal for downstream ranking.
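The same fan-out pattern applies at both levels: the two search agents, and the three AlphaXiv tools inside the second agent. In the plugin these are Claude subagents and MCP tool calls, not Python threads, so the sketch below only illustrates the timeout-and-overlap logic. The source names and the `list[str]` return shape are placeholders.

```python
import concurrent.futures as cf
from collections import Counter
from typing import Callable

# Placeholder shape: a source takes a query and returns paper IDs.
Searcher = Callable[[str], list[str]]


def fan_out(query: str, sources: dict[str, Searcher],
            timeout: float = 60.0) -> dict[str, list[str]]:
    """Run all sources in parallel and keep whatever finishes in time."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, not_done = cf.wait(futures, timeout=timeout)
    # Do not block on stragglers; cancel anything that has not started yet.
    pool.shutdown(wait=False, cancel_futures=True)

    results: dict[str, list[str]] = {}
    for fut in done:
        name = futures[fut]
        if fut.exception() is None:
            results[name] = fut.result()
        else:
            print(f"warning: {name} failed: {fut.exception()}")
    for fut in not_done:
        print(f"warning: {futures[fut]} timed out after {timeout}s")
    return results


def overlap_signal(per_source: dict[str, list[str]]) -> Counter:
    """Count how many sources surfaced each paper ID."""
    return Counter(pid for ids in per_source.values() for pid in set(ids))


if __name__ == "__main__":
    tools = {
        "agentic": lambda q: ["2401.00001", "2402.00002"],
        "full_text": lambda q: ["2402.00002"],
        "embedding": lambda q: ["2402.00002", "2403.00003"],
    }
    print(overlap_signal(fan_out("test-time scaling", tools)).most_common(1))
```

A source that times out or raises simply drops out, which mirrors the S2-only fallback when AlphaXiv is unavailable. `shutdown(wait=False, cancel_futures=True)` needs Python 3.9 or later, matching the stated minimum.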

Results are deduplicated by arXiv ID / DOI / title similarity, then scored:

Signal	Weight
CCF ranking (A/B/C)	base score
JCR/CAS quartile	base score
Impact factor	30%
Citation count (log-scaled)	20%
Recency	10%
First-author h-index	10%

arXiv-only papers with < 100 citations get a -20 penalty. Published versions are preferred.
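One way the deduplication and scoring could fit together. Only the weights (30/20/10/10), the -20 penalty and the preference for published versions come from the text above. The 0.9 title-similarity threshold, the CCF base values and every normalization scale are assumptions, and the JCR/CAS quartile base is left out for brevity.

```python
import math
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Paper:
    title: str
    year: int
    arxiv_id: Optional[str] = None
    doi: Optional[str] = None
    ccf: Optional[str] = None        # "A" / "B" / "C", if ranked
    impact_factor: float = 0.0
    citations: int = 0
    first_author_h: int = 0
    arxiv_only: bool = False


def _norm(title: str) -> str:
    """Lowercase and strip punctuation so near-identical titles compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def is_duplicate(a: Paper, b: Paper, threshold: float = 0.9) -> bool:
    if a.arxiv_id and a.arxiv_id == b.arxiv_id:
        return True
    if a.doi and b.doi and a.doi.lower() == b.doi.lower():
        return True
    ratio = SequenceMatcher(None, _norm(a.title), _norm(b.title)).ratio()
    return ratio >= threshold  # threshold is an assumption


def dedupe(papers: list[Paper]) -> list[Paper]:
    kept: list[Paper] = []
    for p in papers:
        for i, k in enumerate(kept):
            if is_duplicate(p, k):
                if k.arxiv_only and not p.arxiv_only:
                    kept[i] = p  # prefer the published version
                break
        else:
            kept.append(p)
    return kept


CCF_BASE = {"A": 40.0, "B": 25.0, "C": 10.0}  # hypothetical base scores


def score(p: Paper, current_year: int) -> float:
    base = CCF_BASE.get(p.ccf or "", 0.0)
    # Each signal is squashed into 0..100 first (scales are assumptions).
    if_s = min(p.impact_factor / 20.0, 1.0) * 100
    cite_s = min(math.log10(1 + p.citations) / 4.0, 1.0) * 100
    recency_s = max(0.0, 1 - (current_year - p.year) / 10) * 100
    h_s = min(p.first_author_h / 50.0, 1.0) * 100
    total = base + 0.30 * if_s + 0.20 * cite_s + 0.10 * recency_s + 0.10 * h_s
    if p.arxiv_only and p.citations < 100:
        total -= 20
    return total
```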

After scoring, each top paper is quick-read via curl -sL https://alphaxiv.org/overview/{id}.md (or S2 abstract if not on arXiv) and receives a verdict: Must read / Worth reading / Skim / Skip. A landscape summary synthesizes key themes and trends.
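A stdlib equivalent of the `curl -sL` quick-read fetch, assuming the arXiv ID is already known. `urlopen` follows redirects like `-L`; falling back to the S2 abstract on a network error (and not only when the paper is off arXiv) is an added assumption. The verdict step is not sketched.

```python
import urllib.request
from typing import Optional

OVERVIEW_URL = "https://alphaxiv.org/overview/{id}.md"  # URL pattern from the text


def overview_url(arxiv_id: str) -> str:
    return OVERVIEW_URL.format(id=arxiv_id)


def fetch_overview(arxiv_id: Optional[str], s2_abstract: str,
                   timeout: float = 30.0) -> str:
    """Return the AlphaXiv markdown overview, or the S2 abstract as fallback."""
    if not arxiv_id:
        return s2_abstract  # not on arXiv: nothing to fetch
    try:
        with urllib.request.urlopen(overview_url(arxiv_id), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and timeouts all subclass OSError
        return s2_abstract
```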

How discussion works

The discuss phase is a 9-sub-phase ideation engine:

Setup — load discover/read results, invoke skill router + research-ideation skills
Assumption Surfacing — challenge inherited conventions in the field
Discussion Loop — iterative analysis with auto knowledge gap filling and out-of-domain search
Adversarial Novelty Check — verify proposed directions against existing literature
Reviewer Simulation — generate specific reviewer objections with severity ratings
Significance Test — 3-tier assessment (real-world impact, community impact, improvement magnitude)
Simplicity Test — can the idea be explained in 2 sentences without jargon?
Experiment Design — baselines, datasets, ablations, expected results, compute requirements
Convergence Decision — direction comparison matrix backed by evidence

Output: a structured research brief (brief.json) that feeds into the write phase.

How citations work

Zero hallucination policy. Every BibTeX entry must trace to an API response. The chain:

```
1. DBLP search → title match (>90% token overlap)
   → If published venue: fetch condensed .bib                          → "via DBLP"
   → If arXiv-only (CoRR): fetch from arxiv.org instead               → "via arXiv"
2. CrossRef search → extract DOI → content negotiation                 → "via CrossRef"
3. S2 exact match → construct from metadata (or arXiv if available)    → "via S2 — verify manually"
4. All fail → "Citation source not verified. Not safe to cite."
```

Never generates BibTeX from model memory. Never fills in year/venue/authors from model knowledge.
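The chain reduces to "the first source that returns an entry wins; otherwise refuse". In the sketch below the fetchers are stand-ins for the DBLP, CrossRef and S2 scripts, and `token_overlap` uses Jaccard similarity as an assumed reading of ">90% token overlap". The DOI request uses standard DOI content negotiation with `Accept: application/x-bibtex`; the exact headers doi2bibtex.py sends are not confirmed here.

```python
import re
import urllib.request
from typing import Callable, Optional

# A fetcher takes a title and returns a BibTeX string, or None if it has no match.
Fetcher = Callable[[str], Optional[str]]

NOT_VERIFIED = "Citation source not verified. Not safe to cite."


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens (assumed definition)."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def doi_bibtex_request(doi: str) -> urllib.request.Request:
    """Build a DOI content-negotiation request that asks for BibTeX."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )


def cite(title: str, chain: list[tuple[str, Fetcher]]) -> tuple[Optional[str], str]:
    """Walk the sources in priority order; never synthesize an entry."""
    for label, fetch in chain:
        bib = fetch(title)
        if bib:
            return bib, label
    return None, NOT_VERIFIED
```

A DBLP stand-in would accept a hit only when `token_overlap(query, hit_title) > 0.9`, and pass `None` down the chain otherwise, so a miss at every source ends in the refusal message rather than a guessed entry.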

How writing works

The write phase adds two quality mechanisms:

Triple Review Gate (abstract + introduction): Three perspectives (Reviewer, AC/SAC, Senior Researcher) each provide 2-3 specific revision suggestions
Consistency Check (method, experiments, conclusion): Cross-reference scan ensures contributions match experiments, claims match results, assumptions match setup

Installation

Claude Code

```
# Option 1: Install via marketplace
claude plugin marketplace add arsity/scholar-tools
claude plugin install scholar-tools@research

# Option 2: Manual install
git clone https://github.com/arsity/scholar-tools.git ~/.claude/plugins/scholar-tools
```

Prerequisites

Required:

Semantic Scholar API key — get one at semanticscholar.org/product/api, then add to your project's .claude/settings.json:
```
{
  "env": {
    "S2_API_KEY": "your-key-here"
  }
}
```
Claude Code automatically exports env entries as environment variables — scripts read $S2_API_KEY directly.
Domain skills — 85 skills from Orchestra-Research/AI-Research-SKILLs are bundled in vendor/ai-research-skills/ (no separate installation needed). They are not registered as standalone skills — only /research is exposed, and the skill router loads them on demand via Read. A GitHub Action tracks upstream releases weekly and opens PRs to update.
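Because the S2 key arrives as an ordinary environment variable, a stdlib script can pick it up with `os.environ`. This is a minimal sketch, not the plugin's s2_search.py: the requested `fields` are an arbitrary choice, and the request is only built, not sent. Semantic Scholar expects the key in the `x-api-key` header.

```python
import os
import urllib.parse
import urllib.request

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def s2_search_request(query: str, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) an authenticated S2 relevance-search request."""
    api_key = os.environ.get("S2_API_KEY")
    if not api_key:
        raise RuntimeError("S2_API_KEY is not set; add it under env in .claude/settings.json")
    params = urllib.parse.urlencode({
        "query": query,
        "limit": limit,
        "fields": "title,year,venue,citationCount,externalIds",
    })
    return urllib.request.Request(f"{S2_SEARCH}?{params}", headers={"x-api-key": api_key})
```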

Optional:

Python 3 ≥ 3.9 — all runtime scripts are stdlib-only (urllib, json, sqlite3, re). No pip / uv install needed; any system Python works. macOS and most Linux distros ship this by default.
AlphaXiv MCP (via the claude.ai connection) — powers the second parallel search source in /research discover. Three retrieval tools (mcp__claude_ai_alphaXiv__agentic_paper_retrieval, full_text_papers_search, embedding_similarity_search) are invoked in parallel inside the AlphaXiv subagent. If not connected, discover falls back to S2-only (log warning, not blocking). Enable from your claude.ai account's MCP connectors.
OpenReview credentials — for fetching peer reviews, rebuttals, and meta-reviews of papers at venues using OpenReview (ICLR, NeurIPS, ICML, etc.). Register at openreview.net/profile, then add to .claude/settings.json:
```
{
  "env": {
    "OPENREVIEW_USER": "your-email",
    "OPENREVIEW_PASS": "your-password"
  }
}
```
Without these, all other features work normally — OpenReview integration is best-effort.
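"Best-effort" can be read as: skip quietly when credentials are missing, and degrade to no reviews on any failure. The hypothetical sketch below shows only that gate; `fetch` stands in for whatever actually talks to OpenReview, whose API is not shown.

```python
import os
from typing import Callable, Optional


def openreview_credentials() -> Optional[tuple[str, str]]:
    """Return (user, password) only if both variables are set."""
    user = os.environ.get("OPENREVIEW_USER")
    password = os.environ.get("OPENREVIEW_PASS")
    return (user, password) if user and password else None


def reviews_best_effort(fetch: Callable[[str, str], list[dict]]) -> list[dict]:
    """Call a (hypothetical) OpenReview fetcher; any problem yields no reviews."""
    creds = openreview_credentials()
    if creds is None:
        return []  # credentials are optional: skip quietly
    try:
        return fetch(*creds)
    except Exception as exc:  # never let review fetching break the caller
        print(f"warning: OpenReview fetch failed: {exc}")
        return []
```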

Project structure

```
vendor/ai-research-skills/    # Vendored from Orchestra-Research/AI-Research-SKILLs (v1.4.0)
  01-model-architecture/      # 21 domain categories, 85 skills total
  02-tokenization/
  ...
  21-research-ideation/
  .tracked-version            # Current vendored version tag
  LICENSE                     # MIT (upstream)
.github/
  workflows/
    update-ai-research-skills.yml  # Weekly upstream release tracker
  scripts/
    sync-skill-router.py           # Syncs router keywords from SKILL.md tags
    router-keyword-overrides.yml   # Manual keyword supplements
skills/research/
  SKILL.md                  # Orchestrator — intent detection + routing + unified input parsing
  phases/
    skill-router.md         # Central domain detection + skill routing (21 categories)
    discover.md             # Multi-source search + quick-read + landscape summary
    discuss.md              # 9-phase ideation engine with adversarial checks
    read.md                 # Deep structured analysis with domain expert perspective
    cite.md                 # DBLP > CrossRef > S2 BibTeX chain
    write.md                # Paper writing with Triple Review Gate + Consistency Check
  scripts/                  # 20 self-contained Python 3 scripts (stdlib only)
    _common.py              # Shared HTTP / rate-limit / DBLP-fallback / S2 formatter
    s2_search.py            # S2 relevance-ranked search
    s2_bulk_search.py       # S2 boolean bulk search with year filtering
    s2_batch.py             # S2 batch metadata (up to 500 IDs)
    s2_citations.py         # Citation graph traversal
    s2_references.py        # Reference graph traversal
    s2_recommend.py         # Paper recommendations
    s2_snippet.py           # Search within paper bodies
    s2_match.py             # Exact title match
    dblp_search.py          # DBLP publication search
    dblp_bibtex.py          # Title+author+year → condensed .bib via DBLP API
    arxiv_bibtex.py         # arXiv ID → @misc .bib from arxiv.org
    crossref_search.py      # CrossRef search
    doi2bibtex.py           # DOI → BibTeX via content negotiation
    cvf_bibtex.py           # CVPR/ICCV/WACV → .bib via CVF Open Access
    iclr_bibtex.py          # ICLR → .bib via OpenReview (v1 + v2 APIs)
    neurips_bibtex.py       # NeurIPS → .bib via papers.nips.cc + OpenReview fallback
    venue_info.py           # Venue quality summary (CCF + IF + quartile)
    ccf_lookup.py           # CCF ranking lookup
    if_lookup.py            # Impact factor lookup
    author_info.py          # Author h-index and stats
  data/
    ccf_2026.sqlite         # CCF rankings database (682 entries)
    ccf_2026.jsonl          # CCF rankings source
    impact_factor.sqlite3   # Impact factor database (19,727 journals)
tests/                        # Test suites (separate from skill)
  _common.py
  _bibtex.py                  # Shared positive/negative harness for BibTeX fetchers
  run_all_tests.py
  test_structure.py           # Structural validation (phase files, categories, migrations)
  test_s2_search.py
  test_s2_network.py
  test_s2_batch.py
  test_s2_snippet.py
  test_dblp.py
  test_crossref.py
  test_quality_eval.py
  test_cite_chain.py
  test_alphaxiv.py
  test_cvf_bibtex.py
  test_iclr_bibtex.py
  test_neurips_bibtex.py
```
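s2_batch.py is listed with a 500-ID cap, so longer ID lists have to be split. Below is a sketch of that chunking against Semantic Scholar's `POST /graph/v1/paper/batch` endpoint; the `fields` value is an arbitrary choice, and the requests are only built, not sent (sending them one after another respects the 1 req/sec limit).

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator

BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"
MAX_IDS = 500  # per-request cap stated for s2_batch.py


def chunks(ids: list[str], size: int = MAX_IDS) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def batch_requests(ids: list[str], api_key: str,
                   fields: str = "title,year,venue,citationCount") -> list[urllib.request.Request]:
    """One POST request per chunk, each with a JSON body {"ids": [...]}."""
    url = f"{BATCH_URL}?{urllib.parse.urlencode({'fields': fields})}"
    return [
        urllib.request.Request(
            url,
            data=json.dumps({"ids": chunk}).encode("utf-8"),
            headers={"x-api-key": api_key, "Content-Type": "application/json"},
            method="POST",
        )
        for chunk in chunks(ids)
    ]
```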

Running tests

```
python3 tests/run_all_tests.py
```

Requires S2_API_KEY in .claude/settings.json env (see Prerequisites). Tests hit live APIs (S2, DBLP, CrossRef, AlphaXiv).

Workspace

On first invocation, /research creates .research-workspace/ in the current directory. Each session persists discover results, discussion briefs, read analyses, and verified BibTeX — all as JSON for reuse across phases.

```
.research-workspace/
  state.json
  sessions/
    {topic-slug}-{date}/
      discover.json           # Search results + verdicts + landscape summary
      discuss/
        brief.json            # Research brief from discuss phase
      read/{paper_id}.json    # Structured paper analyses
      cite/{paper_id}.bib     # Verified BibTeX entries
      cite/cite-log.json      # Citation metadata and sources
```
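The session directory name combines a topic slug and a date. The slug rule and the ISO date format below are assumptions about what `{topic-slug}-{date}` looks like, and flattening `/` in paper IDs (DOIs contain slashes) is a hypothetical choice for the `cite/{paper_id}.bib` filenames.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def topic_slug(topic: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', trim the ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return slug or "untitled"


def session_dir(topic: str, root: Path = Path(".research-workspace"),
                day: Optional[date] = None) -> Path:
    day = day or date.today()
    return root / "sessions" / f"{topic_slug(topic)}-{day.isoformat()}"


def cite_path(session: Path, paper_id: str) -> Path:
    # Hypothetical: replace "/" so a DOI-style ID stays a single filename.
    return session / "cite" / f"{paper_id.replace('/', '_')}.bib"
```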

Iron Rules

Zero hallucination citations — every citation from an API call, never from model memory
BibTeX priority — DBLP > CrossRef > S2
Exhaustive search escalation — when a retrieval task has no directly relevant results after the applicable primary searches, follow the Search Escalation Protocol before accepting "not found"
Quality gate — no paper presented without quality evaluation
Source tracing — every citation tagged with data source
Own model for analysis — never rely on AlphaXiv's AI-generated answers
Domain skill grounding — factual claims must trace to paper content, not skill-generated assertions
Adversarial before commitment — no direction finalized without novelty check
Multi-perspective review for framing — abstract/intro must pass reviewer, AC/SAC, and senior researcher perspectives + cross-model gate; related-work must pass coverage & fairness check
Simplicity preference — between two approaches of similar merit, prefer the simpler one
Verify before completion — run verification before claiming output is done
Root cause before retry — diagnose failures before retrying

Rate limits

Service	Limit	Strategy
Semantic Scholar	1 req/sec (with key)	Sequential + batch/bulk endpoints
DBLP	~1 req/sec	Sequential, 1s delay
CrossRef	50 req/sec	Polite pool
AlphaXiv MCP (3 tools) / AlphaXiv markdown	No strict limit	Respectful usage
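The "sequential, 1s delay" strategy amounts to enforcing a minimum interval between calls to the same service. A minimal sketch with one throttle per service; the CrossRef interval simply divides one second by the 50 req/sec figure. Joining CrossRef's polite pool additionally requires identifying yourself (for example with a `mailto` contact), which this sketch does not cover.

```python
import time


class MinInterval:
    """Sequential throttle: at most one call per `interval` seconds."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last = float("-inf")  # first call never waits

    def wait(self) -> None:
        """Sleep just long enough to keep calls `interval` apart."""
        sleep_for = self._last + self.interval - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()


# Intervals derived from the table above.
LIMITS = {
    "s2": MinInterval(1.0),
    "dblp": MinInterval(1.0),
    "crossref": MinInterval(1 / 50),
}
```

Call `LIMITS["s2"].wait()` immediately before each S2 request; batch and bulk endpoints then keep the total number of throttled calls low.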

License

MIT — Luke (Haopeng Chen)
