Bio-Agent
Bio-Agent is a lightweight research agent for biomedical and structural biology literature mining.
It automatically fetches recent PubMed papers, parses and stores them locally, and exports structured Markdown reports, all without LLMs or API keys. The only network calls go to the public NCBI E-utilities service.
Current focus: Protein Structure Prediction (e.g. AlphaFold and related methods)
Features
Automated PubMed search (time window + keyword based)
XML parsing into structured paper fields
Local SQLite storage (incremental & reproducible)
Markdown report export
Heuristic-based "Key Takeaways" digest (no LLM)
No API keys, no cloud dependency
Installation
Requirements
Python 3.8+
Windows / macOS / Linux
Option 1: Development / Local Usage (Recommended)
git clone https://github.com/GeYugong/bio-agent.git
cd bio-agent
python -m venv .venv
Activate the virtual environment on Windows (PowerShell):
.\.venv\Scripts\Activate.ps1
Or on macOS / Linux:
source .venv/bin/activate
Then install the package in editable mode:
pip install -e .
Verify installation:
bio-agent --help
bio-agent hello
Option 2: Standard Installation
pip install .
Quick Start
1. Fetch Recent PubMed Papers
bio-agent fetch --query "protein structure prediction OR AlphaFold" --days 30 --retmax 20
This will:
Retrieve recent PMIDs from PubMed
Download and parse XML records
Store results in a local SQLite database (bio_agent.db)
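The three steps above can be sketched in plain Python. This is a minimal illustration, not the code in pubmed.py or store.py: the function names, the `papers` table layout, and the sample record (including its PMID) are assumptions made for the example. The ESearch parameters (`reldate`, `datetype`, `retmax`) and the PubMed XML element names are the ones documented by NCBI.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Placeholder record for illustration only; not a real paper.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">12345</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>A placeholder title about <i>protein</i> folding</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def build_esearch_url(query: str, days: int, retmax: int) -> str:
    """Build an ESearch URL restricted to the last `days` days."""
    params = {
        "db": "pubmed",
        "term": query,
        "reldate": days,     # relative date window, in days
        "datetype": "pdat",  # apply the window to the publication date
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pubmed_xml(xml_text: str) -> List[Dict[str, str]]:
    """Extract a few fields from an EFetch XML response."""
    papers = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        title_el = article.find(".//ArticleTitle")
        # itertext() keeps text nested inside inline tags such as <i>.
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        parts = ["".join(el.itertext()).strip() for el in article.iter("AbstractText")]
        papers.append({
            "pmid": article.findtext("MedlineCitation/PMID", default="").strip(),
            "title": title,
            "journal": article.findtext(".//Journal/Title", default="").strip(),
            "abstract": " ".join(p for p in parts if p),
        })
    return papers


def upsert_papers(conn: sqlite3.Connection, papers: List[Dict[str, str]]) -> None:
    """Insert new papers and refresh existing ones (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "pmid TEXT PRIMARY KEY, title TEXT, journal TEXT, abstract TEXT)"
    )
    # ON CONFLICT ... DO UPDATE needs SQLite 3.24 or newer.
    conn.executemany(
        "INSERT INTO papers (pmid, title, journal, abstract) "
        "VALUES (:pmid, :title, :journal, :abstract) "
        "ON CONFLICT(pmid) DO UPDATE SET title = excluded.title, "
        "journal = excluded.journal, abstract = excluded.abstract",
        papers,
    )
    conn.commit()
```

Because the PMID is the primary key in this sketch, running the same fetch twice updates existing rows instead of duplicating them, which is one way to get the incremental behavior described in Features.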
2. Export Markdown Reports
bio-agent export --limit 10 --out report.md
Filter by keyword:
bio-agent export --limit 20 --query-contains AlphaFold --out alphafold_report.md
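As a rough picture of what a filtered export involves, the sketch below selects rows from SQLite and renders them as Markdown. It is not the code in exporter.py. The `papers` table, the function names, and the report layout are hypothetical, and the sketch assumes the keyword is matched against title and abstract; the actual `--query-contains` flag may match a different field, such as the stored search query.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, str]  # (pmid, title, journal, abstract)


def select_papers(
    conn: sqlite3.Connection, limit: int, query_contains: Optional[str] = None
) -> List[Row]:
    """Return up to `limit` papers, optionally filtered by a keyword."""
    sql = "SELECT pmid, title, journal, abstract FROM papers"
    params: list = []
    if query_contains:
        # SQLite's LIKE is case-insensitive for ASCII characters by default.
        sql += " WHERE title LIKE ? OR abstract LIKE ?"
        pattern = f"%{query_contains}%"
        params += [pattern, pattern]
    # Higher PMIDs are generally newer, so sort numerically, newest first.
    sql += " ORDER BY CAST(pmid AS INTEGER) DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


def render_markdown(rows: List[Row]) -> str:
    """Render one Markdown section per paper."""
    lines = ["# Bio-Agent Report", ""]
    for pmid, title, journal, abstract in rows:
        lines += [
            f"## {title}",
            "",
            f"- PMID: [{pmid}](https://pubmed.ncbi.nlm.nih.gov/{pmid}/)",
            f"- Journal: {journal}",
            "",
            abstract or "_No abstract available._",
            "",
        ]
    return "\n".join(lines)
```

Passing the keyword as a bound parameter rather than formatting it into the SQL string keeps the filter safe for arbitrary user input.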
3. Generate README Digest (Key Takeaways)
bio-agent digest
What this does:
Reads recent papers from SQLite
Generates 3 key takeaways using heuristic scoring
Injects the digest into the top of README.md
Uses markers to avoid duplicate insertion
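The scoring and marker steps can be illustrated with a short sketch. It is an assumption-laden example, not the logic in digest.py: the keyword weights, the marker strings, and the function names are all hypothetical. It shows one simple way to rank papers without an LLM and to make the README injection idempotent.

```python
import re
from typing import Dict, List

# Hypothetical markers and keyword weights, chosen for this example only.
START = "<!-- BIO-AGENT-DIGEST:START -->"
END = "<!-- BIO-AGENT-DIGEST:END -->"
KEYWORDS = {"alphafold": 3, "structure prediction": 2, "benchmark": 1}


def score(paper: Dict[str, str]) -> int:
    """Weighted count of keyword occurrences in title and abstract."""
    text = f"{paper['title']} {paper['abstract']}".lower()
    return sum(weight * text.count(word) for word, weight in KEYWORDS.items())


def top_takeaways(papers: List[Dict[str, str]], n: int = 3) -> List[str]:
    """Return Markdown bullets for the n highest-scoring papers."""
    ranked = sorted(papers, key=score, reverse=True)
    return [f"- {p['title']} (PMID {p['pmid']})" for p in ranked[:n]]


def inject_digest(readme: str, bullets: List[str]) -> str:
    """Place the digest at the top, replacing any earlier digest block."""
    block = "\n".join([START, "## Key Takeaways", *bullets, END])
    if START in readme and END in readme:
        pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
        # A callable replacement avoids backslash escapes being interpreted.
        return pattern.sub(lambda _match: block, readme, count=1)
    return block + "\n\n" + readme
```

With this approach, running the digest step repeatedly leaves exactly one digest block in the file, because the second run finds the markers and overwrites what sits between them.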
Project Structure
bio-agent/
├── src/
│   └── bio_agent/
│       ├── cli.py          # Typer-based CLI entrypoint
│       ├── pubmed.py       # PubMed E-utilities + XML parsing
│       ├── store.py        # SQLite schema & upsert logic
│       ├── exporter.py     # Markdown / README export
│       ├── digest.py       # Heuristic digest generation
│       └── summarize.py    # Extensible summarization logic
├── reports/                # Generated Markdown reports
├── bio_agent.db            # Local SQLite database
├── pyproject.toml
└── README.md
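Why No LLM or API?
This is a deliberate design choice:
Fully local processing with reproducible output
No LLM API costs or quotas (NCBI E-utilities is free to query, though NCBI throttles heavy use)
Transparent logic for research workflows
Suitable for scheduled pipelines; export and digest also run offline against the local database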
LLM-based summarization (OpenAI / Claude / local models) can be added as an optional module in future versions.
Roadmap
Richer structured fields (methods, datasets, benchmarks)
Optional LLM-based summarization
Scheduled runs via cron / GitHub Actions
Paper embedding & topic clustering
Web UI (Streamlit / Gradio)
License
MIT License
Acknowledgements
NCBI PubMed E-utilities
AlphaFold & protein structure prediction community
Python open-source ecosystem