IndicationScout

An agentic system for discovering drug repurposing opportunities.

Overview

IndicationScout is an agentic drug repurposing system. A drug name goes in; coordinated AI agents query multiple biomedical data sources and produce a repurposing report.

IndicationScout pulls live evidence from multiple biomedical databases at query time — no static knowledge base, no precomputed answers. For each candidate indication it gathers current trial activity, recent literature, mechanistic associations, and regulatory status, then synthesizes them into a single repurposing report that characterizes the state of the hypothesis (live, stalled, niche, untested, post-readout-and-stuck) per disease.

A Supervisor agent orchestrates three specialist sub-agents:

Literature agent — Queries PubMed via EUtils, then runs a RAG pipeline: fetch abstracts, embed with BioLORD-2023, semantic search, and LLM-based synthesis of evidence for each candidate disease.
Clinical Trials agent — Queries ClinicalTrials.gov v2 (REST) per drug × indication pair: all-status search (whitespace verdict + per-status counts), completed-trial query (with Phase 3 count), terminated-trial query (with why_stopped text classified into safety / efficacy / business / enrollment categories at the tool layer), competitive landscape for the indication, and an FDA-label approval check used to short-circuit when the drug is already approved for the candidate. Indications are resolved to a MeSH descriptor via NCBI E-utilities and the resolved preferred term is fed to CT.gov's server-side AREA[ConditionMeshTerm]"<term>" filter — so e.g. "hypertension" trials aren't mixed in with glaucoma/portal/pulmonary hypertension.
Mechanism agent — Queries Open Targets target-level data (GraphQL) to retrieve disease associations with evidence scores and Reactome pathway annotations.

The Supervisor first calls find_candidates (Open Targets competitor analysis) and analyze_mechanism in parallel, then delegates to the Literature and Clinical Trials agents per candidate disease.

Top-5 candidate blurbs

After investigating candidates, the Supervisor produces a ranked Summary. For each of the top 5 ranked candidates, the Supervisor also writes a 2-sentence interpretive blurb characterizing the state of the hypothesis — live, stalled, niche, untested, post-readout-and-stuck, etc. — grounded in the literature and clinical-trials sub-agent summaries seen during that run. Mechanism content is intentionally excluded from the blurb to keep it focused on clinical evidence. Blurbs are produced by the same finalize_supervisor tool call that emits the ranked summary (no extra LLM call) and rendered inline under each disease in the Summary section of both the Markdown report and the Streamlit Overview tab.

Data sources:

Open Targets (GraphQL) — drug targets, disease associations, competitor drugs
ClinicalTrials.gov (REST v2) — trial search, whitespace detection, competitive landscape
PubMed (NCBI EUtils) — literature retrieval and abstract indexing
NCBI MeSH (E-utilities) — indication → MeSH D-number resolution for clinical-trials post-filtering
ChEMBL — molecule metadata and ATC classifications
openFDA — drug labels for FDA-approval extraction

Installation

Clone the Repository

git clone https://github.com/dgupta/IndicationScout.git
cd IndicationScout/indication_scout

Install Dependencies

pip install -e .

For development:

pip install -e ".[dev]"

Environment Setup

The app loads two env files in order: .env (secrets, DB credentials, API keys, model names) and .env.constants (tunable numeric limits — top-k, timeouts, batch sizes, etc.). .env.constants has no defaults; if a field is missing the app fails to start. Both files must exist before running the CLI or API.

Copy .env.example to .env and fill in your API keys. A checked-in .env.constants provides the tunable numeric limits — leave it as-is unless you have a reason to change a limit.

cp .env.example .env

To swap the constants file at runtime (e.g. for tests or experiments):

CONSTANTS_FILE=.env.constants.test pytest
CONSTANTS_FILE=.env.constants.experiment scout find -d "metformin"

Required environment variables:

Variable	Required	Description
`DATABASE_URL`	Yes	PostgreSQL connection string (e.g. `postgresql+psycopg2://scout:scout@localhost:5438/scout`)
`DB_PASSWORD`	Yes	Database password
`TEST_DATABASE_URL`	No	Separate PostgreSQL URL for integration tests (e.g. `postgresql+psycopg2://scout:scout@localhost:5438/scout_test`)
`ANTHROPIC_API_KEY`	Yes	Anthropic API key for Claude LLM calls
`NCBI_API_KEY`	No	NCBI API key for PubMed (increases rate limits)
`PUBMED_API_KEY`	No	PubMed API key (separate from NCBI key in config)
`OPENAI_API_KEY`	No	OpenAI API key
`OPENFDA_API_KEY`	No	OpenFDA API key
`LLM_MODEL`	No	Primary LLM model (default: `claude-sonnet-4-6`)
`SMALL_LLM_MODEL`	No	Lightweight LLM model (default: `claude-haiku-4-5-20251001`)
`EMBEDDING_MODEL`	No	Embedding model (default: `FremyCompany/BioLORD-2023`)

The project requires a PostgreSQL database with the pgvector extension for storing PubMed abstract embeddings. See docs/rag.md for the Docker setup.

Database Setup

Start the database container:

docker compose up -d

Then apply migrations to both the main and test databases:

# Main database
alembic upgrade head

# Test database (used by integration tests)
DATABASE_URL=postgresql+psycopg2://scout:<password>@localhost:5438/scout_test alembic upgrade head

Replace <password> with the value of DB_PASSWORD from your .env file.

Note: After deleting Docker volumes (docker compose down -v), you must re-run both migration commands — the schema is not persisted.

Usage

CLI

scout find -d "metformin"                          # writes <drug>_<timestamp>.md to ./snapshots and <drug>_<timestamp>.json to ./test_reports
scout find -d "metformin" --out-dir reports/       # custom markdown output directory
scout find -d "metformin" --no-write               # print the markdown report to stdout (JSON payload is still saved to ./test_reports)
scout --help

Streamlit UI

streamlit run app.py

Single-page app: enter a drug name, run the supervisor agent, download the Markdown report.

API

uvicorn indication_scout.api.main:app --reload

Development

Running Tests

# All tests (regression + live tests are excluded by default)
pytest

# Integration tests only
pytest tests/integration/

# Unit tests only
pytest tests/unit/

Code Formatting & Linting

black src/ tests/
ruff check src/ tests/              # lint
ruff check --fix src/ tests/        # lint + autofix
mypy src/                           # type checking

Project Structure

src/indication_scout/
├── agents/          # AI agents
│   ├── base.py             # BaseAgent abstract class (currently unused by ReAct agents)
│   ├── _trial_formatting.py # Shared trial formatting helpers
│   ├── supervisor/         # Supervisor agent (orchestrates sub-agents) — supervisor_agent.py, supervisor_tools.py, supervisor_output.py
│   ├── literature/         # Literature agent (PubMed RAG) — literature_agent.py, literature_tools.py, literature_output.py
│   ├── clinical_trials/    # Clinical Trials agent (ClinicalTrials.gov + MeSH post-filter) — clinical_trials_agent.py, clinical_trials_tools.py, clinical_trials_output.py
│   └── mechanism/          # Mechanism agent (Open Targets targets) — mechanism_agent.py, mechanism_tools.py, mechanism_output.py, mechanism_candidates.py, mechanism_row_builder.py
├── api/             # FastAPI application (main.py, routes/, schemas/) -- /health endpoint only
├── cli/             # Click-based CLI (cli.py) — exposes the `scout` command
├── data_sources/    # Async API clients (OpenTargets, ClinicalTrials.gov, PubMed, ChEMBL, FDA, DrugBank stub)
├── db/              # SQLAlchemy session factory and declarative base
├── helpers/         # Utility functions (drug name normalization)
├── markers.py       # Code review exclusion markers (@no_review decorator)
├── ml_models/       # ML modeling code
│   ├── trial_risk/         # Trial-risk model (data.py, features.py, literature.py, score.py, train.py, inspect.py)
│   └── success_classifier/ # Trial-success classifier (features.py, labels.py)
├── models/          # Pydantic data contracts (model_open_targets, model_clinical_trials, model_pubmed_abstract, model_chembl, model_drug_profile, model_evidence_summary)
├── prompts/         # LLM prompt templates (supervisor, literature, clinical_trials, synthesize, expand_search_terms, extract_fda_approvals, extract_fda_approval_single, list_label_indications, extract_organ_term, merge_diseases, normalize_disease, normalize_disease_batch)
├── report/          # Report formatting (format_report.py) — turns SupervisorOutput into the final markdown report
├── runners/         # Pipeline runners (rag_runner.py) and exploration scripts (pubmed_runner.py); wandb/ logs
├── services/        # Business logic -- LLM calls (llm.py, including parse_llm_response), embeddings (embeddings.py), disease normalization + MeSH resolver (disease_helper.py: llm_normalize_disease, normalize_for_pubmed, resolve_mesh_id), PubMed query building (pubmed_query.py), FDA approval extraction (approval_check.py), RAG pipeline (retrieval.py -- fetch_and_cache, semantic_search, synthesize)
├── sqlalchemy/      # SQLAlchemy ORM models (pubmed_abstracts with pgvector embedding)
├── utils/           # Shared file-based cache utility (cache_key, cache_get, cache_set)
├── config.py        # Settings via pydantic-settings, loaded from .env
└── constants.py     # URLs, timeouts, lookup maps

Known Limitations

Abstract-only indexing: PubMed articles without an abstract (letters, editorials, conference summaries) are excluded from the vector store and will not appear in semantic search results. Only articles with a non-empty abstract are embedded and cached.
Incomplete Open Targets approval data: Open Targets does not record all approved indications for every drug. Approved indications missing from Open Targets will not be filtered from repurposing candidates and may appear as false positives. For example, tofacitinib's ulcerative colitis and ankylosing spondylitis approvals are absent from Open Targets, causing them to appear as repurposing candidates.

Citations

Open Targets: Ochoa, D. et al. (2023). The next-generation Open Targets Platform: reimagined, redesigned, rebuilt. Nucleic Acids Research, 51(D1), D1353–D1359. DOI: 10.1093/nar/gkac1037.

Acknowledgments

This codebase was created with the help of Claude Code.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.claude		.claude
alembic		alembic
docker/initdb		docker/initdb
docs		docs
models		models
notebooks		notebooks
scripts		scripts
skills		skills
src/indication_scout		src/indication_scout
static		static
tests		tests
.env.constants		.env.constants
.env.constants.custom		.env.constants.custom
.env.constants.integration		.env.constants.integration
.env.constants.test		.env.constants.test
.env.example		.env.example
CLAUDE.md		CLAUDE.md
README.md		README.md
TODO.md		TODO.md
alembic.ini		alembic.ini
app.py		app.py
docker-compose.yml		docker-compose.yml
drug_approvals.json		drug_approvals.json
future.md		future.md
holdout.md		holdout.md
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IndicationScout

Overview

Top-5 candidate blurbs

Installation

Clone the Repository

Install Dependencies

Environment Setup

Database Setup

Usage

CLI

Streamlit UI

API

Development

Running Tests

Code Formatting & Linting

Project Structure

Known Limitations

Citations

Acknowledgments

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

IndicationScout

Overview

Top-5 candidate blurbs

Installation

Clone the Repository

Install Dependencies

Environment Setup

Database Setup

Usage

CLI

Streamlit UI

API

Development

Running Tests

Code Formatting & Linting

Project Structure

Known Limitations

Citations

Acknowledgments

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages