This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
AISysRevCmdLine is a command-line tool for automated systematic literature review (SLR) screening using LLMs via the OpenRouter API. It screens academic papers (title + abstract) against inclusion/exclusion criteria and returns structured decisions. Command-line equivalent of the web-based AISysRev.
```bash
# Run tests
pytest

# Download the SESR-Eval benchmark dataset from Zenodo
python download_sesr.py [-o data/] [--no-extract] [--no-verify]

# Download the SYNERGY benchmark datasets from OpenAlex
python download_synergy.py [DATASET|all] [--list] [--data-dir data/synergy] [--refresh]

# Convert bibliographic files (.bib / PubMed .txt) to CSV
python bib2csv.py <file1> [file2 ...] [-o output.csv]

# Run the main screening tool
python screen.py <csv_file> -n <count|all> -c criteria.conf -m models.conf

# Run per-criterion boolean screening
python screen_boolean.py <csv_file> -n <count|all> -c criteria_screen_boolean.yml -m models.conf

# Run classification (probability-based)
python classify.py <csv_file> -n all -p 0.5

# Run single-choice classification
python classify_single.py <csv_file> -n 10

# Generate embeddings
python embed.py <csv_file> -n all -p 0.7

# Plot embeddings (UMAP + Plotly)
python plot.py <csv_file> -c
```

Python environment: conda or venv. Python version: 3.14.
Entry points — each is a standalone CLI script using argparse:
- `download_sesr.py` — Downloads the SESR-Eval benchmark dataset from Zenodo. Extracts only `data/3-processed-data/`, writes `criteria.conf` and `instructions.txt` into each study subfolder from `secondary_study_data.csv`, copies the authoritative primary study CSV to `primary_correct.csv`, and removes the original `primary_study_data*.csv` files.
- `download_synergy.py` — Downloads SYNERGY benchmark datasets from OpenAlex. Fetches `datasets.toml` from GitHub (cached at `~/.cache/synergy_dataset/datasets.toml`) for eligibility criteria; downloads paper CSVs via the `synergy-dataset` package installed in an isolated venv (`~/.synergy_dataset_env`) to avoid polluting the project environment. Writes `data/synergy/{key}/primary_correct.csv` and `data/synergy/{key}/criteria.conf` for each dataset.
- `bib2csv.py` — Bibliographic format converter. Converts WOS/Scopus `.bib` files and PubMed MEDLINE `.txt` exports to a CSV ready for screening. Multiple files can be merged in one call. Output columns: `title, abstract, doi, year, authors, journal, keywords, source_db, entry_key`.
- `screen.py` — Main screening pipeline. Sends papers to LLMs with criteria, collects structured JSON responses, flattens them into CSV columns.
- `screen_boolean.py` — Per-criterion screening pipeline. Sends each criterion as a separate LLM call, combines per-criterion probabilities using fuzzy boolean logic (AND=MIN, OR=MAX, NOT=1−p) over a criteria tree defined in YAML.
- `classify.py` — Multi-class probability classification.
- `classify_single.py` — Single best-fit class assignment using dynamic Pydantic models.
- `embed.py` — Generate vector embeddings via the OpenRouter `/embeddings` endpoint.
- `plot.py` — UMAP dimensionality reduction + Plotly interactive HTML visualization.
`bib2csv.py` (bibliographic import):
- Auto-detect format per file: `.bib` → `parse_bibtex()` (uses `bibtexparser`); `.txt` starting with `PMID` → `parse_pubmed()` (custom MEDLINE line parser)
- BibTeX: field lookup is case-insensitive; WOS double-braces `{{value}}` and Scopus single-braces `{value}` are both handled; `source_db` is inferred from the cite-key prefix (`WOS:`) or the `source` field
- PubMed: tag regex `^([A-Z][A-Z0-9]{1,3})\s*-\s(.+)$`; continuation lines require exactly 6 leading spaces; multi-valued tags (FAU, OT, LID) are collected; DOI extracted from `LID` lines ending with `[doi]`
- Merge all records into one DataFrame → save CSV
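The MEDLINE parsing rules above (tag regex plus 6-space continuation lines) can be sketched as follows; the function name, record layout, and sample lines are illustrative, not the actual `parse_pubmed()` API:

```python
import re

# Tag regex from the rules above; TAG_RE and parse_medline_record are
# hypothetical names, not taken from bib2csv.py.
TAG_RE = re.compile(r"^([A-Z][A-Z0-9]{1,3})\s*-\s(.+)$")

def parse_medline_record(lines):
    """Collect tag -> list of values; a continuation line (exactly 6 leading
    spaces) is joined onto the previous value."""
    fields, current = {}, None
    for line in lines:
        m = TAG_RE.match(line)
        if m:
            current = m.group(1)
            fields.setdefault(current, []).append(m.group(2).strip())
        elif current and line.startswith("      ") and not line[6:7].isspace():
            fields[current][-1] += " " + line.strip()
    return fields

record = parse_medline_record([
    "PMID- 12345678",
    "TI  - A study of something long",
    "      spanning two lines",
    "LID - 10.1000/xyz [doi]",
])
```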
Shared utilities:
- `helpers.py` — CSV validation (auto-detects delimiter, normalizes headers), API key loading (`~/openrouter.key`), model config loading, unique filename generation.
- `async_api.py` — Async API infrastructure with two parallel stacks:
  - pydantic_ai Agent stack (httpx-based, for structured outputs): `screen.py`, `screen_boolean.py`: `process_all_models_agent` → `create_agent` + `process_batch_agent` → `_call_agent` → httpx client; `classify.py`, `classify_single.py`: `create_agent` → `process_batch_agent` → `_call_agent` → httpx client
  - aiohttp stack (for the `/embeddings` endpoint): `embed.py`: `process_batch_aiohttp` → `retry_aiohttp_call` + `make_openrouter_headers`
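The fan-out pattern behind both stacks (many concurrent calls, capped by a semaphore) can be sketched with plain asyncio; `call_model` and `MAX_CONCURRENT` are hypothetical stand-ins, not names from `async_api.py`:

```python
import asyncio

MAX_CONCURRENT = 3  # illustrative cap, not the project's actual setting

async def call_model(paper_id: int) -> str:
    await asyncio.sleep(0.01)  # stands in for one OpenRouter API call
    return f"decision-{paper_id}"

async def process_batch(paper_ids):
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def limited(pid):
        async with sem:  # at most MAX_CONCURRENT calls in flight
            return await call_model(pid)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(limited(p) for p in paper_ids))

results = asyncio.run(process_batch(range(5)))
```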
Core data flows:
`screen.py` (screening task):
- `validate_csv()` → `generate_prompts()` using `criteria.conf` + `prompts/prompt_screen.txt`
- `process_all_models_agent()` — async concurrent API calls with semaphore-limited concurrency per model
- Parse structured JSON → `flatten_nested_json()` into flat CSV columns
- `add_average_probability()` → save enriched CSV
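The flattening step, turning nested JSON decisions into flat CSV columns, can be sketched like this; the real `flatten_nested_json()` signature and column-naming scheme may differ:

```python
# Hedged sketch: recursively flatten a nested dict, joining keys with "."
# (the separator is an assumption, not confirmed by the project).
def flatten(obj, prefix=""):
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

row = flatten({"decision": {"include": True, "probability": 0.85},
               "criteria": {"IC1": {"met": True}}})
# → {"decision.include": True, "decision.probability": 0.85, "criteria.IC1.met": True}
```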
`classify.py` (multi-class probability classification):
- `validate_csv()` → load `criteria_classify.yml` → generate classification prompts
- `create_agent()` → `process_batch_agent()` — async calls for a probability distribution across classes
- Parse structured JSON with class probabilities → flatten into CSV columns
- Save enriched CSV with probability columns for each class
`classify_single.py` (single best-fit classification):
- `validate_csv()` → load `criteria_classify.yml` → dynamically create Pydantic models for each taxonomy
- `create_agent()` → `process_batch_agent()` — async calls for single-choice classification
- Parse structured JSON with a single class assignment → flatten into CSV columns
- Save enriched CSV with selected class labels
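The "dynamic model per taxonomy" idea can be sketched with a stdlib `Enum` built at runtime from a taxonomy's labels; the actual script builds Pydantic models, and all names and labels below are illustrative:

```python
from enum import Enum

def make_choice_enum(taxonomy_name, labels):
    """Build a choice type at runtime from the taxonomy's class labels."""
    return Enum(taxonomy_name, {label: label for label in labels})

# Hypothetical taxonomy, not from criteria_classify.yml
StudyType = make_choice_enum("StudyType", ["empirical", "theoretical", "survey"])

def validate_choice(enum_cls, raw: str):
    # Raises ValueError for any label outside the taxonomy, which is how
    # a constrained type forces the LLM output into a known class set.
    return enum_cls(raw)

choice = validate_choice(StudyType, "survey")
```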
`screen_boolean.py` (per-criterion boolean screening):
- `validate_csv()` → load `criteria_screen_boolean.yml` (boolean criteria tree) → `generate_prompts()` using `prompts/prompt_screen_boolean.txt` (one prompt per paper per criterion)
- `process_all_models_agent()` — one async call per paper per criterion
- `fuzzy_eval()` — apply fuzzy boolean logic across the criteria tree per paper
- Compute `(inclusion_prob, exclusion_prob, overall_prob, binary_decision)` → save enriched CSV
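The fuzzy combination rule (AND=MIN, OR=MAX, NOT=1−p) reduces to a small recursive evaluator. The tuple-based tree below is a hypothetical encoding; the real tree comes from `criteria_screen_boolean.yml` and `fuzzy_eval()`'s signature may differ:

```python
def fuzzy_eval(node, probs):
    """Evaluate a fuzzy boolean tree over per-criterion probabilities."""
    if isinstance(node, str):          # leaf: a criterion id like "IC1"
        return probs[node]
    op, *children = node
    if op == "AND":
        return min(fuzzy_eval(c, probs) for c in children)
    if op == "OR":
        return max(fuzzy_eval(c, probs) for c in children)
    if op == "NOT":
        return 1.0 - fuzzy_eval(children[0], probs)
    raise ValueError(f"unknown operator: {op}")

tree = ("AND", "IC1", ("OR", "IC2", ("NOT", "EC1")))
p = fuzzy_eval(tree, {"IC1": 0.9, "IC2": 0.4, "EC1": 0.2})
# min(0.9, max(0.4, 1 - 0.2)) = 0.8
```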
`embed.py` (vector embeddings):
- `validate_csv()` → load `embed.conf` → generate embedding texts
- `process_batch_aiohttp()` with `retry_aiohttp_call()` — direct aiohttp POST to the `/embeddings` endpoint
- Extract vector embeddings → add as columns to CSV
- Save enriched CSV with embedding dimensions as columns
`plot.py` (visualization):
- Read CSV with embeddings → extract embedding columns
- Apply UMAP dimensionality reduction to 2D/3D
- Create interactive Plotly scatter plot (colored by classification/screening results)
- Save as standalone HTML file
Structured output enforcement: Pydantic models (`StructuredResponse`, `Decision`, `Criterion`) in the entry scripts force LLMs to return valid JSON via `response_format: json_schema`. Criterion names are regex-normalized to the IC1, IC2, EC1, EC2 format.
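That normalization might look something like the sketch below; the exact regex and edge-case handling in the entry scripts may differ:

```python
import re

# Hypothetical normalizer: maps variants like "Inclusion Criterion 1",
# "ic 1", or "EC-2" to the canonical "IC1"/"EC2" form.
_CRIT_RE = re.compile(
    r"^\s*(?:(i)nclusion|(e)xclusion|(i)c|(e)c)\D*(\d+)\s*$",
    re.IGNORECASE,
)

def normalize_criterion(name: str) -> str:
    m = _CRIT_RE.match(name)
    if not m:
        return name  # leave unrecognized names untouched
    kind = "IC" if (m.group(1) or m.group(3)) else "EC"
    return f"{kind}{m.group(5)}"
```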
Retry/error handling:
- The pydantic_ai stack uses `AsyncTenacityTransport` (httpx with tenacity) for automatic retry on 429 and 502-504 with exponential backoff + jitter.
- The aiohttp stack uses manual retry logic in `retry_aiohttp_call()` with a similar backoff strategy.
- Both stacks abort on permanent errors (400, 401, 403, 404, 405, 422) and deduplicate error messages across the batch.
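The shared retry policy can be summarized in a small sketch (not the actual tenacity or aiohttp configuration; parameter values are illustrative):

```python
import random

TRANSIENT = {429, 502, 503, 504}              # retried with backoff
PERMANENT = {400, 401, 403, 404, 405, 422}    # abort immediately

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retry `attempt` (0-based): exponential, capped, jittered."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def should_retry(status: int, attempt: int, max_attempts: int = 5) -> bool:
    if status in PERMANENT:
        return False
    return status in TRANSIENT and attempt + 1 < max_attempts
```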
User-customizable .conf and .yml files (gitignored — copy from .example files):
- `models.conf` — One OpenRouter model ID per line
- `criteria.conf` — Inclusion/exclusion criteria text
- `criteria_classify.yml` — YAML classification taxonomies
- `criteria_screen_boolean.yml` — Boolean criteria tree for `screen_boolean.py` (tracked in git as a working example)
- `embed.conf` — Text prefix for embeddings
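As a concrete example of the simplest of these, a `models.conf` follows the one-ID-per-line rule stated above; the model IDs shown are illustrative choices, not project defaults:

```
openai/gpt-4o-mini
anthropic/claude-3.5-sonnet
```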
Prompt templates (tracked in git, not meant to be edited by users):
- `prompts/prompt_screen.txt` — Template for `screen.py` with `{0}`-`{3}` placeholders
- `prompts/prompt_classify.txt` — Template for `classify.py`
- `prompts/prompt_classify_single.txt` — Template for `classify_single.py`
- `prompts/prompt_generate_classes.txt` — Template for `generate_classes.py`
- `prompts/prompt_screen_boolean.txt` — Template for `screen_boolean.py`
Uses the OpenRouter API (https://openrouter.ai/api/v1/) with `temperature: 0`, `top_p: 0.1`, and `response_format: json_schema` for deterministic structured output. The API key is read from `~/openrouter.key`.
Tests are in `tests/` with test fixtures in `tests/test_csv_files/` (CSV edge cases) and `tests/bibs/` (bibliographic files).
- `tests/test_csv_reader.py` — CSV validation edge cases (delimiter detection, missing/duplicate columns, header normalization).
- `tests/test_bib2csv.py` — Full test suite for `bib2csv.py`: entry counts for all 8 real bib files, `source_db` correctness, no-`None` values, and per-fault assertions for all 15 fault categories injected in `tests/bibs/wos_faults.bib`, `tests/bibs/scopus_faults.bib`, and `tests/bibs/pubmed_faults.txt`.
Integration testing of screening/classification scripts is done by running commands against OpenRouter with default settings (typically first 10 papers only, keeping costs low).