A Claude Code plugin for academic research with domain-aware skill routing. /research discover "topic" searches multiple databases in parallel, ranks papers by venue quality and citations, quick-reads the top results, and provides a landscape summary. /research discuss drives deep research ideation with adversarial novelty checks and reviewer simulation. /research cite generates verified BibTeX. Every citation traces to an API call, never to model memory.
| Command | What happens |
|---|---|
| `/research discover "topic"` | S2 semantic search + AlphaXiv MCP retrieval (3 tools in parallel: agentic / full-text / embedding), deduplicate, quality-rank, quick-read top papers via AlphaXiv markdown, generate landscape summary |
| `/research discuss` | Deep discussion: assumption surfacing, adversarial novelty check, reviewer simulation, significance test, experiment design |
| `/research discuss <paper>` | Start discussion from a specific paper |
| `/research read <paper>` | Deep structured analysis with domain expert perspective |
| `/research cite <paper>` | Verified BibTeX via DBLP > CrossRef > S2 chain |
| `/research write <section>` | Write a paper section with Triple Review Gate (abstract/intro) + Consistency Check (method/experiments/conclusion) |
All commands support --domain <categories> (additive) or --domain-only <categories> (exclusive) to override auto-detected domain skills.
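The difference between the additive and exclusive flags can be sketched as follows. This is a minimal illustration, not the plugin's actual code: `resolve_domains` is a hypothetical helper, and the rule that `--domain-only` wins when both flags are given is an assumption.

```python
def resolve_domains(auto_detected, domain=None, domain_only=None):
    """Combine auto-detected categories with the two override flags (hypothetical helper)."""
    if domain_only:
        # Exclusive: ignore auto-detection entirely (assumed to win over --domain).
        return list(dict.fromkeys(domain_only))
    # Additive: keep auto-detected categories and append extras without duplicates.
    return list(dict.fromkeys(list(auto_detected) + list(domain or [])))


# --domain Evaluation: added on top of what the router detected
print(resolve_domains(["RAG", "Agents"], domain=["Evaluation"]))
# --domain-only Evaluation: replaces what the router detected
print(resolve_domains(["RAG", "Agents"], domain_only=["Evaluation"]))
```

`dict.fromkeys` is used only to drop duplicates while preserving order.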
The central innovation: a Skill Router maps paper content to 21 domain skill categories from Orchestra-Research/AI-Research-SKILLs, injecting expert knowledge into each phase.
| # | Category | Example Skills |
|---|---|---|
| 1 | Model-Architecture | litgpt, mamba, nanogpt, rwkv, torchtitan |
| 2 | Tokenization | huggingface-tokenizers, sentencepiece |
| 3 | Fine-Tuning | axolotl, llama-factory, peft, unsloth |
| 4 | Mechanistic-Interpretability | transformer-lens, saelens, pyvene, nnsight |
| 5 | Data-Processing | ray-data, nemo-curator |
| 6 | Post-Training | trl, grpo-rl-training, openrlhf, simpo, verl |
| 7 | Safety-Alignment | constitutional-ai, llamaguard, nemo-guardrails |
| 8 | Distributed-Training | megatron-core, deepspeed, pytorch-fsdp2, accelerate |
| 9 | Infrastructure | modal, skypilot, lambda-labs |
| 10 | Optimization | flash-attention, bitsandbytes, gptq, awq, hqq, gguf |
| 11 | Evaluation | lm-evaluation-harness, bigcode-evaluation-harness |
| 12 | Inference-Serving | vllm, tensorrt-llm, llama.cpp, sglang |
| 13 | MLOps | weights-and-biases, mlflow, tensorboard |
| 14 | Agents | langchain, llamaindex, crewai, autogpt |
| 15 | RAG | chroma, faiss, sentence-transformers, pinecone, qdrant |
| 16 | Prompt-Engineering | dspy, instructor, guidance, outlines |
| 17 | Observability | langsmith, phoenix |
| 18 | Multimodal | clip, whisper, llava, stable-diffusion, segment-anything |
| 19 | Emerging-Techniques | moe-training, model-merging, long-context, speculative-decoding |
| 20 | ML-Paper-Writing | ml-paper-writing (auto-invoked in write phase) |
| 21 | Research-Ideation | brainstorming-research-ideas, creative-thinking-for-research |
The router auto-detects relevant categories from paper keywords and classifies them as primary (core contribution) or secondary (peripheral tool).
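A rough sketch of keyword-based routing, under stated assumptions: the keyword table below is a hypothetical three-category excerpt, and the hit-count threshold that separates primary from secondary is an illustrative stand-in for whatever rule the router actually applies.

```python
import re

# Hypothetical keyword table; in the repo, router keywords are synced from SKILL.md tags.
CATEGORY_KEYWORDS = {
    "Fine-Tuning": ["lora", "peft", "fine-tuning"],
    "RAG": ["retrieval", "faiss", "vector store"],
    "Evaluation": ["benchmark", "lm-evaluation-harness"],
}


def route(text, primary_min_hits=2):
    """Return {category: "primary" | "secondary"} for categories whose keywords appear."""
    lowered = text.lower()
    result = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(len(re.findall(re.escape(k), lowered)) for k in keywords)
        if hits >= primary_min_hits:   # assumed threshold for "core contribution"
            result[category] = "primary"
        elif hits == 1:                # a single mention is treated as peripheral
            result[category] = "secondary"
    return result


abstract = "We propose a LoRA variant for fine-tuning and report results on one benchmark."
print(route(abstract))
```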
Two agents run in parallel, each with a 60-second timeout:
- Semantic Scholar: semantic search across 200M+ papers (`s2_search.py`) + boolean bulk search (`s2_bulk_search.py`) with year filtering. Primary search source.
- AlphaXiv MCP (claude.ai-bound): three retrieval tools fired in parallel inside the agent: `agentic_paper_retrieval` (NL/conceptual), `full_text_papers_search` (body keyword match), `embedding_similarity_search` (semantic neighbors). Papers surfaced by multiple tools get a stronger relevance signal for downstream ranking.
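The parallel-with-timeout pattern can be sketched with the standard library. This is an illustration only: `run_agents` and the two stub agents are hypothetical names, and the real agents are Claude Code subagents rather than Python callables.

```python
import concurrent.futures as cf
import time


def run_agents(agents, timeout=60.0):
    """Run each agent callable in its own thread; skip any that fail or exceed the timeout."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(agents)))
    futures = {name: pool.submit(fn) for name, fn in agents.items()}
    deadline = time.monotonic() + timeout  # all agents start together, so one deadline suffices
    results, warnings = {}, []
    for name, fut in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = fut.result(timeout=remaining)
        except cf.TimeoutError:
            warnings.append(f"{name}: timed out after {timeout}s")
        except Exception as exc:  # a failing source should not block the other one
            warnings.append(f"{name}: {exc}")
    pool.shutdown(wait=False)  # do not wait here for an agent that is still running
    return results, warnings


def s2_agent():  # stub standing in for the Semantic Scholar agent
    return [{"title": "Paper A", "source": "s2"}]


def alphaxiv_agent():  # stub simulating a missing MCP connection
    raise RuntimeError("MCP not connected")


results, warnings = run_agents({"s2": s2_agent, "alphaxiv": alphaxiv_agent})
print(results)
print(warnings)
```

Collecting failures as warnings instead of raising mirrors the S2-only fallback: one source going missing degrades the result set but does not abort the search.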
Results are deduplicated by arXiv ID / DOI / title similarity, then scored:
| Signal | Weight |
|---|---|
| CCF ranking (A/B/C) | base score |
| JCR/CAS quartile | base score |
| Impact factor | 30% |
| Citation count (log-scaled) | 20% |
| Recency | 10% |
| First-author h-index | 10% |
arXiv-only papers with < 100 citations get a -20 penalty. Published versions are preferred.
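One way to read the weights above is as a base score plus a weighted sum of normalized signals. The sketch below is an assumption-heavy illustration: the 0..1 normalization, the 100-point scale, the citation cap, and all field names are hypothetical. Only the percentages and the -20 penalty come from the table and text.

```python
import math

WEIGHTS = {"impact_factor": 0.30, "citations": 0.20, "recency": 0.10, "h_index": 0.10}


def log_scaled(count, cap=10_000):
    """Map a citation count to 0..1 on a log scale (cap is an assumed saturation point)."""
    return min(1.0, math.log1p(max(count, 0)) / math.log1p(cap))


def quality_score(paper, base_score):
    """base_score comes from the CCF / JCR / CAS lookup; *_norm fields are assumed to be 0..1."""
    signals = {
        "impact_factor": paper.get("impact_factor_norm", 0.0),
        "citations": log_scaled(paper.get("citations", 0)),
        "recency": paper.get("recency_norm", 0.0),
        "h_index": paper.get("h_index_norm", 0.0),
    }
    score = base_score + 100 * sum(WEIGHTS[k] * v for k, v in signals.items())
    if paper.get("arxiv_only") and paper.get("citations", 0) < 100:
        score -= 20  # arXiv-only penalty for papers under 100 citations
    return score


preprint = {"citations": 40, "recency_norm": 0.9, "arxiv_only": True}
published = {"citations": 40, "recency_norm": 0.9, "arxiv_only": False}
print(quality_score(preprint, base_score=0), quality_score(published, base_score=50))
```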
After scoring, each top paper is quick-read via `curl -sL https://alphaxiv.org/overview/{id}.md` (or S2 abstract if not on arXiv) and receives a verdict: Must read / Worth reading / Skim / Skip. A landscape summary synthesizes key themes and trends.
The discuss phase is a 9-sub-phase ideation engine:
- Setup — load discover/read results, invoke skill router + research-ideation skills
- Assumption Surfacing — challenge inherited conventions in the field
- Discussion Loop — iterative analysis with auto knowledge gap filling and out-of-domain search
- Adversarial Novelty Check — verify proposed directions against existing literature
- Reviewer Simulation — generate specific reviewer objections with severity ratings
- Significance Test — 3-tier assessment (real-world impact, community impact, improvement magnitude)
- Simplicity Test — can the idea be explained in 2 sentences without jargon?
- Experiment Design — baselines, datasets, ablations, expected results, compute requirements
- Convergence Decision — direction comparison matrix backed by evidence
Output: a structured research brief (brief.json) that feeds into the write phase.
Zero hallucination policy. Every BibTeX entry must trace to an API response. The chain:
1. DBLP search → title match (>90% token overlap)
   → If published venue: fetch condensed .bib → "via DBLP"
   → If arXiv-only (CoRR): fetch from arxiv.org instead → "via arXiv"
2. CrossRef search → extract DOI → content negotiation → "via CrossRef"
3. S2 exact match → construct from metadata (or arXiv if available) → "via S2 — verify manually"
4. All fail → "Citation source not verified. Not safe to cite."
Never generates BibTeX from model memory. Never fills in year/venue/authors from model knowledge.
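The fallback chain can be sketched as an ordered list of resolvers where the first verified hit wins. Everything here is illustrative: `token_overlap` and `resolve_bibtex` are hypothetical names, the overlap formula (shared tokens divided by the larger token set) is one possible reading of ">90% token overlap", and applying the same title check to every source is an assumption. The stubs stand in for the real API calls.

```python
import re


def tokens(title):
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def token_overlap(a, b):
    """Shared tokens divided by the larger token set (assumed definition of overlap)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / max(len(ta), len(tb))


def resolve_bibtex(title, resolvers, threshold=0.9):
    """Try each (label, lookup) in order; lookup returns (matched_title, bibtex) or None."""
    for label, lookup in resolvers:
        hit = lookup(title)
        if hit and token_overlap(title, hit[0]) > threshold:
            return {"bibtex": hit[1], "source": label}
    # Nothing verified: refuse rather than fall back to model memory.
    return {"bibtex": None, "source": "Citation source not verified. Not safe to cite."}


def dblp_stub(title):      # pretend DBLP has no match
    return None


def crossref_stub(title):  # pretend CrossRef returns a matching record
    return ("An Example Paper Title", "@article{stub2024, title={An Example Paper Title}}")


chain = [("via DBLP", dblp_stub), ("via CrossRef", crossref_stub)]
print(resolve_bibtex("An example paper title", chain)["source"])
```

Tagging each result with its source label is what lets a later step distinguish a DBLP-verified entry from one that still needs manual checking.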
The write phase adds two quality mechanisms:
- Triple Review Gate (abstract + introduction): Three perspectives (Reviewer, AC/SAC, Senior Researcher) each provide 2-3 specific revision suggestions
- Consistency Check (method, experiments, conclusion): Cross-reference scan ensures contributions match experiments, claims match results, assumptions match setup
# Option 1: Install via marketplace
claude plugin marketplace add arsity/scholar-tools
claude plugin install scholar-tools@research
# Option 2: Manual install
git clone https://github.com/arsity/scholar-tools.git ~/.claude/plugins/scholar-tools

Required:

- Semantic Scholar API key: get one at semanticscholar.org/product/api, then add to your project's `.claude/settings.json`: `{ "env": { "S2_API_KEY": "your-key-here" } }`. Claude Code automatically exports `env` entries as environment variables, so scripts read `$S2_API_KEY` directly.
- Domain skills: 85 skills from Orchestra-Research/AI-Research-SKILLs are bundled in `vendor/ai-research-skills/` (no separate installation needed). They are not registered as standalone skills; only `/research` is exposed, and the skill router loads them on demand via Read. A GitHub Action tracks upstream releases weekly and opens PRs to update.
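For illustration, a stdlib-only script could pick up the exported key as shown below. `build_s2_request` is a hypothetical helper, not one of the bundled scripts. The endpoint and the `x-api-key` header follow the public Semantic Scholar Graph API, and the request is built but not sent.

```python
import os
import urllib.parse
import urllib.request


def build_s2_request(query, limit=10):
    """Build (but do not send) a Semantic Scholar search request using $S2_API_KEY."""
    api_key = os.environ.get("S2_API_KEY")
    if not api_key:
        raise RuntimeError("S2_API_KEY is not set; add it to .claude/settings.json")
    params = urllib.parse.urlencode({"query": query, "limit": limit, "fields": "title,year"})
    url = f"https://api.semanticscholar.org/graph/v1/paper/search?{params}"
    return urllib.request.Request(url, headers={"x-api-key": api_key})


# Only runs when the key is present in the environment.
if os.environ.get("S2_API_KEY"):
    print(build_s2_request("graph neural networks").full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call.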
Optional:
- Python 3 ≥ 3.9: all runtime scripts are stdlib-only (`urllib`, `json`, `sqlite3`, `re`). No `pip`/`uv` install needed; any system Python works. macOS and most Linux distros ship this by default.
- AlphaXiv MCP (via the claude.ai connection): powers the second parallel search source in `/research discover`. Three retrieval tools (`mcp__claude_ai_alphaXiv__agentic_paper_retrieval`, `full_text_papers_search`, `embedding_similarity_search`) are invoked in parallel inside the AlphaXiv subagent. If not connected, discover falls back to S2-only (log warning, not blocking). Enable from your claude.ai account's MCP connectors.
- OpenReview credentials: for fetching peer reviews, rebuttals, and meta-reviews of papers at venues using OpenReview (ICLR, NeurIPS, ICML, etc.). Register at openreview.net/profile, then add to `.claude/settings.json`: `{ "env": { "OPENREVIEW_USER": "your-email", "OPENREVIEW_PASS": "your-password" } }`. Without these, all other features work normally; OpenReview integration is best-effort.
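"Best-effort" here can be read as: skip quietly when credentials are absent. A minimal sketch under that reading follows. Both function names are hypothetical, and `fetch` stands in for whatever OpenReview client call a script would make.

```python
import os


def openreview_credentials(env=os.environ):
    """Return (user, password) if both variables are set, else None so callers can skip."""
    user, password = env.get("OPENREVIEW_USER"), env.get("OPENREVIEW_PASS")
    return (user, password) if user and password else None


def fetch_reviews_if_possible(paper_id, fetch, env=os.environ):
    """Call fetch(paper_id, user, password) only when credentials exist."""
    creds = openreview_credentials(env)
    if creds is None:
        return None  # best-effort: no credentials, no reviews, no error
    return fetch(paper_id, *creds)


# With an empty environment the fetcher is never invoked.
print(fetch_reviews_if_possible("paper-123", fetch=lambda *args: args, env={}))
```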
vendor/ai-research-skills/ # Vendored from Orchestra-Research/AI-Research-SKILLs (v1.4.0)
01-model-architecture/ # 21 domain categories, 85 skills total
02-tokenization/
...
21-research-ideation/
.tracked-version # Current vendored version tag
LICENSE # MIT (upstream)
.github/
workflows/
update-ai-research-skills.yml # Weekly upstream release tracker
scripts/
sync-skill-router.py # Syncs router keywords from SKILL.md tags
router-keyword-overrides.yml # Manual keyword supplements
skills/research/
SKILL.md # Orchestrator — intent detection + routing + unified input parsing
phases/
skill-router.md # Central domain detection + skill routing (21 categories)
discover.md # Multi-source search + quick-read + landscape summary
discuss.md # 9-phase ideation engine with adversarial checks
read.md # Deep structured analysis with domain expert perspective
cite.md # DBLP > CrossRef > S2 BibTeX chain
write.md # Paper writing with Triple Review Gate + Consistency Check
scripts/ # 20 self-contained Python 3 scripts (stdlib only)
_common.py # Shared HTTP / rate-limit / DBLP-fallback / S2 formatter
s2_search.py # S2 relevance-ranked search
s2_bulk_search.py # S2 boolean bulk search with year filtering
s2_batch.py # S2 batch metadata (up to 500 IDs)
s2_citations.py # Citation graph traversal
s2_references.py # Reference graph traversal
s2_recommend.py # Paper recommendations
s2_snippet.py # Search within paper bodies
s2_match.py # Exact title match
dblp_search.py # DBLP publication search
dblp_bibtex.py # Title+author+year → condensed .bib via DBLP API
arxiv_bibtex.py # arXiv ID → @misc .bib from arxiv.org
crossref_search.py # CrossRef search
doi2bibtex.py # DOI → BibTeX via content negotiation
cvf_bibtex.py # CVPR/ICCV/WACV → .bib via CVF Open Access
iclr_bibtex.py # ICLR → .bib via OpenReview (v1 + v2 APIs)
neurips_bibtex.py # NeurIPS → .bib via papers.nips.cc + OpenReview fallback
venue_info.py # Venue quality summary (CCF + IF + quartile)
ccf_lookup.py # CCF ranking lookup
if_lookup.py # Impact factor lookup
author_info.py # Author h-index and stats
data/
ccf_2026.sqlite # CCF rankings database (682 entries)
ccf_2026.jsonl # CCF rankings source
impact_factor.sqlite3 # Impact factor database (19,727 journals)
tests/ # Test suites (separate from skill)
_common.py
_bibtex.py # Shared positive/negative harness for BibTeX fetchers
run_all_tests.py
test_structure.py # Structural validation (phase files, categories, migrations)
test_s2_search.py
test_s2_network.py
test_s2_batch.py
test_s2_snippet.py
test_dblp.py
test_crossref.py
test_quality_eval.py
test_cite_chain.py
test_alphaxiv.py
test_cvf_bibtex.py
test_iclr_bibtex.py
test_neurips_bibtex.py
python3 tests/run_all_tests.py

Requires `S2_API_KEY` in the `env` block of `.claude/settings.json` (see Installation). Tests hit live APIs (S2, DBLP, CrossRef, AlphaXiv).
On first invocation, /research creates .research-workspace/ in the current directory. Each session persists discover results, discussion briefs, read analyses, and verified BibTeX — all as JSON for reuse across phases.
.research-workspace/
state.json
sessions/
{topic-slug}-{date}/
discover.json # Search results + verdicts + landscape summary
discuss/
brief.json # Research brief from discuss phase
read/{paper_id}.json # Structured paper analyses
cite/{paper_id}.bib # Verified BibTeX entries
cite/cite-log.json # Citation metadata and sources
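The layout above maps naturally onto a small path helper. In this sketch the slug rule (lowercase, non-alphanumerics collapsed to hyphens) and the ISO date format are assumptions; only the directory and file names come from the tree.

```python
import re
from datetime import date
from pathlib import Path


def session_dir(topic, root=".research-workspace", today=None):
    """Build sessions/{topic-slug}-{date}/ under the workspace root (slug rule is assumed)."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    stamp = (today or date.today()).isoformat()
    return Path(root) / "sessions" / f"{slug}-{stamp}"


def session_paths(topic, paper_id, today=None):
    """Return the per-phase artifact paths for one session and one paper."""
    base = session_dir(topic, today=today)
    return {
        "discover": base / "discover.json",
        "brief": base / "discuss" / "brief.json",
        "read": base / "read" / f"{paper_id}.json",
        "bib": base / "cite" / f"{paper_id}.bib",
        "cite_log": base / "cite" / "cite-log.json",
    }


for name, path in session_paths("Sparse MoE Routing", "2401.00001", today=date(2026, 1, 2)).items():
    print(name, path.as_posix())
```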
- Zero hallucination citations — every citation from an API call, never from model memory
- BibTeX priority — DBLP > CrossRef > S2
- Exhaustive search escalation — when a retrieval task has no directly relevant results after the applicable primary searches, follow the Search Escalation Protocol before accepting "not found"
- Quality gate — no paper presented without quality evaluation
- Source tracing — every citation tagged with data source
- Own model for analysis — never rely on AlphaXiv's AI-generated answers
- Domain skill grounding — factual claims must trace to paper content, not skill-generated assertions
- Adversarial before commitment — no direction finalized without novelty check
- Multi-perspective review for framing — abstract/intro must pass reviewer, AC/SAC, and senior researcher perspectives + cross-model gate; related-work must pass coverage & fairness check
- Simplicity preference — between two approaches of similar merit, prefer the simpler one
- Verify before completion — run verification before claiming output is done
- Root cause before retry — diagnose failures before retrying
| Service | Limit | Strategy |
|---|---|---|
| Semantic Scholar | 1 req/sec (with key) | Sequential + batch/bulk endpoints |
| DBLP | ~1 req/sec | Sequential, 1s delay |
| CrossRef | 50 req/sec | Polite pool |
| AlphaXiv MCP (3 tools) / AlphaXiv markdown | No strict limit | Respectful usage |
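The "sequential, 1s delay" strategy amounts to enforcing a minimum interval between calls. The class below is a generic sketch of that idea, not the logic in `_common.py`. `MinIntervalLimiter` is a hypothetical name, and the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval, self.clock, self.sleep = interval, clock, sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self.clock()
        if self._last is not None:
            delay = self.interval - (now - self._last)
            if delay > 0:
                self.sleep(delay)
                now = self.clock()
        self._last = now


# One limiter per service, using the limits from the table (1 request per second).
LIMITERS = {"s2": MinIntervalLimiter(1.0), "dblp": MinIntervalLimiter(1.0)}
```

A script would call `LIMITERS["dblp"].wait()` immediately before each DBLP request. Batch and bulk endpoints reduce how often that wait is hit in the first place.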
MIT — Luke (Haopeng Chen)