arsity/scholar-tools

scholar-tools

A Claude Code plugin for academic research with domain-aware skill routing. /research discover "topic" searches multiple databases in parallel, ranks papers by venue quality and citations, quick-reads the top results, and provides a landscape summary. /research discuss drives deep research ideation with adversarial novelty checks and reviewer simulation. /research cite generates verified BibTeX. Every citation traces to an API call, never to model memory.

What it does

| Command | What happens |
|---|---|
| /research discover "topic" | S2 semantic search + AlphaXiv MCP retrieval (3 tools in parallel: agentic / full-text / embedding), deduplicate, quality-rank, quick-read top papers via AlphaXiv markdown, generate landscape summary |
| /research discuss | Deep discussion: assumption surfacing, adversarial novelty check, reviewer simulation, significance test, experiment design |
| /research discuss <paper> | Start discussion from a specific paper |
| /research read <paper> | Deep structured analysis with domain expert perspective |
| /research cite <paper> | Verified BibTeX via DBLP > CrossRef > S2 chain |
| /research write <section> | Write a paper section with Triple Review Gate (abstract/intro) + Consistency Check (method/experiments/conclusion) |

All commands support --domain <categories> (additive) or --domain-only <categories> (exclusive) to override auto-detected domain skills.
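
The difference between the additive and exclusive flags can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the function name `resolve_domains` and the rule that `--domain-only` wins when both flags are given are assumptions.

```python
def resolve_domains(auto_detected, domain=None, domain_only=None):
    """Return the final category list after applying the override flags.

    Hypothetical helper: the real routing logic lives in the skill router.
    """
    if domain_only:
        # --domain-only is exclusive: auto-detected categories are discarded.
        # (Assumption: it takes precedence if both flags are passed.)
        return list(dict.fromkeys(domain_only))
    # --domain is additive: keep auto-detected categories and append extras,
    # dropping duplicates while preserving order.
    return list(dict.fromkeys(list(auto_detected) + list(domain or [])))


if __name__ == "__main__":
    print(resolve_domains(["RAG", "Agents"], domain=["Evaluation"]))
    print(resolve_domains(["RAG", "Agents"], domain_only=["Fine-Tuning"]))
```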

Skill Router

The central innovation: a Skill Router maps paper content to 21 domain skill categories from Orchestra-Research/AI-Research-SKILLs, injecting expert knowledge into each phase.

| # | Category | Example Skills |
|---|---|---|
| 1 | Model-Architecture | litgpt, mamba, nanogpt, rwkv, torchtitan |
| 2 | Tokenization | huggingface-tokenizers, sentencepiece |
| 3 | Fine-Tuning | axolotl, llama-factory, peft, unsloth |
| 4 | Mechanistic-Interpretability | transformer-lens, saelens, pyvene, nnsight |
| 5 | Data-Processing | ray-data, nemo-curator |
| 6 | Post-Training | trl, grpo-rl-training, openrlhf, simpo, verl |
| 7 | Safety-Alignment | constitutional-ai, llamaguard, nemo-guardrails |
| 8 | Distributed-Training | megatron-core, deepspeed, pytorch-fsdp2, accelerate |
| 9 | Infrastructure | modal, skypilot, lambda-labs |
| 10 | Optimization | flash-attention, bitsandbytes, gptq, awq, hqq, gguf |
| 11 | Evaluation | lm-evaluation-harness, bigcode-evaluation-harness |
| 12 | Inference-Serving | vllm, tensorrt-llm, llama.cpp, sglang |
| 13 | MLOps | weights-and-biases, mlflow, tensorboard |
| 14 | Agents | langchain, llamaindex, crewai, autogpt |
| 15 | RAG | chroma, faiss, sentence-transformers, pinecone, qdrant |
| 16 | Prompt-Engineering | dspy, instructor, guidance, outlines |
| 17 | Observability | langsmith, phoenix |
| 18 | Multimodal | clip, whisper, llava, stable-diffusion, segment-anything |
| 19 | Emerging-Techniques | moe-training, model-merging, long-context, speculative-decoding |
| 20 | ML-Paper-Writing | ml-paper-writing (auto-invoked in write phase) |
| 21 | Research-Ideation | brainstorming-research-ideas, creative-thinking-for-research |

The router auto-detects relevant categories from paper keywords and classifies them as primary (core contribution) or secondary (peripheral tool).
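
As a rough illustration of keyword-based routing, the sketch below counts keyword hits per category and labels each match. Everything here is hypothetical: the `KEYWORDS` table, the `route` function, and the hit-count threshold are stand-ins. The README only states that primary means core contribution and secondary means peripheral tool, so the real criterion may differ.

```python
# Hypothetical keyword table; the real keywords are maintained in skill-router.md.
KEYWORDS = {
    "Fine-Tuning": {"lora", "peft", "fine-tuning"},
    "RAG": {"retrieval", "faiss", "embedding"},
    "Evaluation": {"benchmark", "lm-evaluation-harness"},
}


def route(text, primary_min_hits=2):
    """Map paper text to {category: "primary" | "secondary"}.

    Assumption: hit count is used as a proxy for "core contribution".
    Matching is naive substring search on lowercased text.
    """
    lowered = text.lower()
    result = {}
    for category, keywords in KEYWORDS.items():
        hits = sum(1 for kw in keywords if kw in lowered)
        if hits >= primary_min_hits:
            result[category] = "primary"
        elif hits == 1:
            result[category] = "secondary"
    return result


if __name__ == "__main__":
    print(route("We fine-tune with LoRA via PEFT and report one benchmark."))
```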

How search works

Two agents run in parallel, each with a 60-second timeout:

  1. Semantic Scholar — semantic search across 200M+ papers (s2_search.py) + boolean bulk search (s2_bulk_search.py) with year filtering. Primary search source.
  2. AlphaXiv MCP (claude.ai-bound) — three retrieval tools fired in parallel inside the agent: agentic_paper_retrieval (NL/conceptual), full_text_papers_search (body keyword match), embedding_similarity_search (semantic neighbors). Papers surfaced by multiple tools get a stronger relevance signal for downstream ranking.
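
The parallel-with-timeout pattern described above can be sketched in plain Python. In the plugin the two sources are Claude subagents rather than threads, so this is only an analogy: `run_sources` and the stub callables are hypothetical names, and the shared 60-second deadline is one reasonable reading of "each with a 60-second timeout".

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_sources(sources, timeout_s=60.0):
    """Run each source callable in its own thread.

    Sources that time out or raise are dropped with a warning, so one slow
    source does not block the other.
    """
    results, warnings = {}, []
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {name: pool.submit(fn) for name, fn in sources.items()}
    # All sources start together, so they share a single deadline.
    deadline = time.monotonic() + timeout_s
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = future.result(timeout=remaining)
        except FutureTimeout:
            warnings.append(f"{name}: timed out")
        except Exception as exc:  # a failing source is reported, not fatal
            warnings.append(f"{name}: {exc}")
    pool.shutdown(wait=False)  # do not wait for timed-out workers
    return results, warnings


if __name__ == "__main__":
    # Stub sources standing in for the two search agents.
    print(run_sources({"s2": lambda: ["paper-a"], "alphaxiv": lambda: ["paper-b"]}))
```

If one source is dropped, the caller can continue with whatever the other returned.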

Results are deduplicated by arXiv ID / DOI / title similarity, then scored:

| Signal | Weight |
|---|---|
| CCF ranking (A/B/C) | base score |
| JCR/CAS quartile | base score |
| Impact factor | 30% |
| Citation count (log-scaled) | 20% |
| Recency | 10% |
| First-author h-index | 10% |

arXiv-only papers with < 100 citations get a -20 penalty. Published versions are preferred.
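
A scoring function along these lines might look like the sketch below. The percentage weights and the -20 penalty come from the table and paragraph above; everything else is an assumption made for illustration: the 0-100 scale, the CCF base values, the normalization caps (impact factor 10, 10,000 citations, 10 years, h-index 50), and the omission of the JCR/CAS quartile.

```python
import math

# Hypothetical base scores; the README does not state the actual values.
CCF_BASE = {"A": 30.0, "B": 20.0, "C": 10.0}


def quality_score(ccf_rank, impact_factor, citations, age_years, h_index, arxiv_only):
    """Combine venue and paper signals into one score (illustrative only)."""
    base = CCF_BASE.get(ccf_rank, 0.0)
    # Each component is normalized to [0, 1] with an assumed cap, then weighted.
    if_part = 30.0 * min(impact_factor / 10.0, 1.0)
    cite_part = 20.0 * min(math.log10(1 + citations) / 4.0, 1.0)  # log-scaled
    recency_part = 10.0 * max(0.0, 1.0 - age_years / 10.0)
    h_part = 10.0 * min(h_index / 50.0, 1.0)
    score = base + if_part + cite_part + recency_part + h_part
    # Penalty from the README: arXiv-only papers with fewer than 100 citations.
    if arxiv_only and citations < 100:
        score -= 20.0
    return score


if __name__ == "__main__":
    print(quality_score("A", 4.2, 350, 2, 25, arxiv_only=False))
    print(quality_score(None, 0.0, 12, 1, 8, arxiv_only=True))
```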

After scoring, each top paper is quick-read via curl -sL https://alphaxiv.org/overview/{id}.md (or S2 abstract if not on arXiv) and receives a verdict: Must read / Worth reading / Skim / Skip. A landscape summary synthesizes key themes and trends.
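
The deduplication step mentioned earlier in this section (arXiv ID / DOI / title similarity) could be approximated as below. This is a sketch under assumptions: the record field names (`arxiv_id`, `doi`, `title`), the 0.9 similarity threshold, and the use of `difflib.SequenceMatcher` are illustrative choices, not the plugin's documented behavior. The DOI in the usage example is a placeholder.

```python
import re
from difflib import SequenceMatcher


def _norm_title(title):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def _is_duplicate(paper, seen, title_threshold):
    if paper.get("arxiv_id") and paper.get("arxiv_id") == seen.get("arxiv_id"):
        return True
    if paper.get("doi") and paper["doi"].lower() == (seen.get("doi") or "").lower():
        return True
    ratio = SequenceMatcher(None, _norm_title(paper["title"]), _norm_title(seen["title"])).ratio()
    return ratio >= title_threshold


def deduplicate(papers, title_threshold=0.9):
    """Keep the first occurrence of each paper, in input order."""
    kept = []
    for paper in papers:
        if not any(_is_duplicate(paper, seen, title_threshold) for seen in kept):
            kept.append(paper)
    return kept


if __name__ == "__main__":
    sample = [
        {"title": "Attention Is All You Need", "arxiv_id": "1706.03762"},
        {"title": "Attention is all you need.", "doi": "10.1000/placeholder"},
    ]
    print(len(deduplicate(sample)))
```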

How discussion works

The discuss phase is a 9-sub-phase ideation engine:

  1. Setup — load discover/read results, invoke skill router + research-ideation skills
  2. Assumption Surfacing — challenge inherited conventions in the field
  3. Discussion Loop — iterative analysis with auto knowledge gap filling and out-of-domain search
  4. Adversarial Novelty Check — verify proposed directions against existing literature
  5. Reviewer Simulation — generate specific reviewer objections with severity ratings
  6. Significance Test — 3-tier assessment (real-world impact, community impact, improvement magnitude)
  7. Simplicity Test — can the idea be explained in 2 sentences without jargon?
  8. Experiment Design — baselines, datasets, ablations, expected results, compute requirements
  9. Convergence Decision — direction comparison matrix backed by evidence

Output: a structured research brief (brief.json) that feeds into the write phase.

How citations work

Zero-hallucination policy: every BibTeX entry must trace to an API response. The lookup chain, in priority order:

1. DBLP search → title match (>90% token overlap)
   → If published venue: fetch condensed .bib                          → "via DBLP"
   → If arXiv-only (CoRR): fetch from arxiv.org instead               → "via arXiv"
2. CrossRef search → extract DOI → content negotiation                 → "via CrossRef"
3. S2 exact match → construct from metadata (or arXiv if available)    → "via S2 — verify manually"
4. All fail → "Citation source not verified. Not safe to cite."

Never generates BibTeX from model memory. Never fills in year/venue/authors from model knowledge.
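
The fallback chain can be sketched as a loop over resolvers that stops at the first verified match. This is a simplified illustration: `token_overlap` and `resolve_bibtex` are hypothetical names, the overlap definition (share of query tokens found in the candidate) is one possible reading of ">90% token overlap", and applying the same threshold to every source is a simplification, since the chain above uses it for DBLP and an exact match for S2.

```python
import re

NOT_VERIFIED = "Citation source not verified. Not safe to cite."


def token_overlap(query_title, candidate_title):
    """Fraction of query-title tokens that also appear in the candidate title."""
    q = set(re.findall(r"[a-z0-9]+", query_title.lower()))
    c = set(re.findall(r"[a-z0-9]+", candidate_title.lower()))
    return len(q & c) / len(q) if q else 0.0


def resolve_bibtex(title, resolvers, threshold=0.9):
    """Try each (label, fetch) pair in priority order.

    `fetch(title)` is expected to return (matched_title, bibtex) from an API
    response, or None. Nothing is ever synthesized when all sources fail.
    """
    for label, fetch in resolvers:
        hit = fetch(title)
        if hit and token_overlap(title, hit[0]) > threshold:
            return {"bibtex": hit[1], "source": label}
    return {"bibtex": None, "source": NOT_VERIFIED}


if __name__ == "__main__":
    # Stub resolvers standing in for the DBLP and CrossRef scripts.
    chain = [
        ("via DBLP", lambda t: None),
        ("via CrossRef", lambda t: (t, "@article{stub}")),
    ]
    print(resolve_bibtex("Attention Is All You Need", chain))
```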

How writing works

The write phase adds two quality mechanisms:

  • Triple Review Gate (abstract + introduction): Three perspectives (Reviewer, AC/SAC, Senior Researcher) each provide 2-3 specific revision suggestions
  • Consistency Check (method, experiments, conclusion): Cross-reference scan ensures contributions match experiments, claims match results, assumptions match setup

Installation

Claude Code

# Option 1: Install via marketplace
claude plugin marketplace add arsity/scholar-tools
claude plugin install scholar-tools@research

# Option 2: Manual install
git clone https://github.com/arsity/scholar-tools.git ~/.claude/plugins/scholar-tools

Prerequisites

Required:

  1. Semantic Scholar API key — get one at semanticscholar.org/product/api, then add to your project's .claude/settings.json:

    {
      "env": {
        "S2_API_KEY": "your-key-here"
      }
    }

    Claude Code automatically exports env entries as environment variables — scripts read $S2_API_KEY directly.

  2. Domain skills — 85 skills from Orchestra-Research/AI-Research-SKILLs are bundled in vendor/ai-research-skills/ (no separate installation needed). They are not registered as standalone skills — only /research is exposed, and the skill router loads them on demand via Read. A GitHub Action tracks upstream releases weekly and opens PRs to update.
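
For reference, a stdlib-only script can pick up the key from the environment roughly as shown below. This is a sketch, not the contents of s2_search.py: the function name `build_s2_request` and the chosen `fields` are assumptions. The endpoint and the `x-api-key` header follow the public Semantic Scholar Graph API.

```python
import os
import urllib.parse
import urllib.request


def build_s2_request(query, limit=10):
    """Build (but do not send) a paper-search request using $S2_API_KEY."""
    api_key = os.environ.get("S2_API_KEY")
    if not api_key:
        raise RuntimeError("S2_API_KEY is not set; add it to .claude/settings.json")
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit, "fields": "title,year"}
    )
    url = f"https://api.semanticscholar.org/graph/v1/paper/search?{params}"
    return urllib.request.Request(url, headers={"x-api-key": api_key})


if __name__ == "__main__":
    try:
        print(build_s2_request("state space models").full_url)
    except RuntimeError as err:
        print(err)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call.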

Optional:

  1. Python 3.9 or later: all runtime scripts are stdlib-only (urllib, json, sqlite3, re). No pip / uv install is needed; any system Python works. macOS and most Linux distros ship a suitable version by default.

  2. AlphaXiv MCP (via the claude.ai connection): powers the second parallel search source in /research discover. Three retrieval tools (mcp__claude_ai_alphaXiv__agentic_paper_retrieval, full_text_papers_search, embedding_similarity_search) are invoked in parallel inside the AlphaXiv subagent. If the connector is not enabled, discover logs a warning and falls back to S2-only search instead of failing. Enable it from your claude.ai account's MCP connectors.

  3. OpenReview credentials — for fetching peer reviews, rebuttals, and meta-reviews of papers at venues using OpenReview (ICLR, NeurIPS, ICML, etc.). Register at openreview.net/profile, then add to .claude/settings.json:

    {
      "env": {
        "OPENREVIEW_USER": "your-email",
        "OPENREVIEW_PASS": "your-password"
      }
    }

    Without these, all other features work normally — OpenReview integration is best-effort.

Project structure

vendor/ai-research-skills/    # Vendored from Orchestra-Research/AI-Research-SKILLs (v1.4.0)
  01-model-architecture/      # 21 domain categories, 85 skills total
  02-tokenization/
  ...
  21-research-ideation/
  .tracked-version            # Current vendored version tag
  LICENSE                     # MIT (upstream)
.github/
  workflows/
    update-ai-research-skills.yml  # Weekly upstream release tracker
  scripts/
    sync-skill-router.py           # Syncs router keywords from SKILL.md tags
    router-keyword-overrides.yml   # Manual keyword supplements
skills/research/
  SKILL.md                  # Orchestrator — intent detection + routing + unified input parsing
  phases/
    skill-router.md         # Central domain detection + skill routing (21 categories)
    discover.md             # Multi-source search + quick-read + landscape summary
    discuss.md              # 9-phase ideation engine with adversarial checks
    read.md                 # Deep structured analysis with domain expert perspective
    cite.md                 # DBLP > CrossRef > S2 BibTeX chain
    write.md                # Paper writing with Triple Review Gate + Consistency Check
  scripts/                  # 20 self-contained Python 3 scripts (stdlib only)
    _common.py              # Shared HTTP / rate-limit / DBLP-fallback / S2 formatter
    s2_search.py            # S2 relevance-ranked search
    s2_bulk_search.py       # S2 boolean bulk search with year filtering
    s2_batch.py             # S2 batch metadata (up to 500 IDs)
    s2_citations.py         # Citation graph traversal
    s2_references.py        # Reference graph traversal
    s2_recommend.py         # Paper recommendations
    s2_snippet.py           # Search within paper bodies
    s2_match.py             # Exact title match
    dblp_search.py          # DBLP publication search
    dblp_bibtex.py          # Title+author+year → condensed .bib via DBLP API
    arxiv_bibtex.py         # arXiv ID → @misc .bib from arxiv.org
    crossref_search.py      # CrossRef search
    doi2bibtex.py           # DOI → BibTeX via content negotiation
    cvf_bibtex.py           # CVPR/ICCV/WACV → .bib via CVF Open Access
    iclr_bibtex.py          # ICLR → .bib via OpenReview (v1 + v2 APIs)
    neurips_bibtex.py       # NeurIPS → .bib via papers.nips.cc + OpenReview fallback
    venue_info.py           # Venue quality summary (CCF + IF + quartile)
    ccf_lookup.py           # CCF ranking lookup
    if_lookup.py            # Impact factor lookup
    author_info.py          # Author h-index and stats
  data/
    ccf_2026.sqlite         # CCF rankings database (682 entries)
    ccf_2026.jsonl          # CCF rankings source
    impact_factor.sqlite3   # Impact factor database (19,727 journals)
tests/                        # Test suites (separate from skill)
  _common.py
  _bibtex.py                  # Shared positive/negative harness for BibTeX fetchers
  run_all_tests.py
  test_structure.py           # Structural validation (phase files, categories, migrations)
  test_s2_search.py
  test_s2_network.py
  test_s2_batch.py
  test_s2_snippet.py
  test_dblp.py
  test_crossref.py
  test_quality_eval.py
  test_cite_chain.py
  test_alphaxiv.py
  test_cvf_bibtex.py
  test_iclr_bibtex.py
  test_neurips_bibtex.py

Running tests

python3 tests/run_all_tests.py

Requires S2_API_KEY in the env block of .claude/settings.json (see Prerequisites). Tests hit live APIs (S2, DBLP, CrossRef, AlphaXiv), so they need network access and are subject to the rate limits listed below.

Workspace

On first invocation, /research creates .research-workspace/ in the current directory. Each session persists discover results, discussion briefs, read analyses, and verified BibTeX — all as JSON for reuse across phases.

.research-workspace/
  state.json
  sessions/
    {topic-slug}-{date}/
      discover.json           # Search results + verdicts + landscape summary
      discuss/
        brief.json            # Research brief from discuss phase
      read/{paper_id}.json    # Structured paper analyses
      cite/{paper_id}.bib     # Verified BibTeX entries
      cite/cite-log.json      # Citation metadata and sources
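
The layout above can be reproduced with a small path helper. This is a sketch under assumptions: `session_dir` is a hypothetical name, and both the slug rule (lowercase, non-alphanumerics collapsed to hyphens) and the ISO date format are guesses, since the README only shows the `{topic-slug}-{date}` pattern.

```python
import datetime
import re
from pathlib import Path


def session_dir(topic, root=".research-workspace", today=None):
    """Return the session directory for a topic, e.g. sessions/<slug>-<date>."""
    # Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    # Assumed date format: ISO 8601 (YYYY-MM-DD).
    day = (today or datetime.date.today()).isoformat()
    return Path(root) / "sessions" / f"{slug}-{day}"


if __name__ == "__main__":
    base = session_dir("State-Space Models")
    print(base / "discover.json")
    print(base / "cite" / "cite-log.json")
```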

Iron Rules

  1. Zero hallucination citations — every citation from an API call, never from model memory
  2. BibTeX priority — DBLP > CrossRef > S2
  3. Exhaustive search escalation — when a retrieval task has no directly relevant results after the applicable primary searches, follow the Search Escalation Protocol before accepting "not found"
  4. Quality gate — no paper presented without quality evaluation
  5. Source tracing — every citation tagged with data source
  6. Own model for analysis — never rely on AlphaXiv's AI-generated answers
  7. Domain skill grounding — factual claims must trace to paper content, not skill-generated assertions
  8. Adversarial before commitment — no direction finalized without novelty check
  9. Multi-perspective review for framing — abstract/intro must pass reviewer, AC/SAC, and senior researcher perspectives + cross-model gate; related-work must pass coverage & fairness check
  10. Simplicity preference — between two approaches of similar merit, prefer the simpler one
  11. Verify before completion — run verification before claiming output is done
  12. Root cause before retry — diagnose failures before retrying

Rate limits

| Service | Limit | Strategy |
|---|---|---|
| Semantic Scholar | 1 req/sec (with key) | Sequential + batch/bulk endpoints |
| DBLP | ~1 req/sec | Sequential, 1s delay |
| CrossRef | 50 req/sec | Polite pool |
| AlphaXiv MCP (3 tools) / AlphaXiv markdown | No strict limit | Respectful usage |
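
A "sequential, 1s delay" strategy amounts to enforcing a minimum interval between calls. The sketch below shows one way to do that. The class name `MinIntervalLimiter` is hypothetical, and the shared helper in _common.py may work differently. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `min_interval_s` apart."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    limiter = MinIntervalLimiter(min_interval_s=1.0)
    for title in ["paper one", "paper two"]:
        limiter.wait()  # call this before each sequential API request
        print("would query:", title)
```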

License

MIT — Luke (Haopeng Chen)
