
GeYugong/bio-agent

🧬 Bio-Agent

Bio-Agent is a lightweight research agent for biomedical and structural biology literature mining.

It automatically fetches recent PubMed papers, parses and stores them locally, and exports structured Markdown reports, all without LLMs or API keys. The only external service it contacts is NCBI's public PubMed E-utilities.

Current focus: Protein Structure Prediction (e.g. AlphaFold and related methods)

✨ Features

🔍 Automated PubMed search (time window + keyword based)

📄 XML parsing into structured paper fields

🗄️ Local SQLite storage (incremental & reproducible)

📝 Markdown report export

🧠 Heuristic-based “Key Takeaways” digest (no LLM)

🔐 No API keys, no cloud dependency

📦 Installation

Requirements

Python 3.8+

Windows / macOS / Linux

Option 1: Development / Local Usage (Recommended)

git clone https://github.com/GeYugong/bio-agent.git
cd bio-agent

python -m venv .venv

Windows

.\.venv\Scripts\Activate.ps1

macOS / Linux

source .venv/bin/activate

pip install -e .

Verify installation:

bio-agent --help
bio-agent hello

Option 2: Standard Installation

pip install .

🚀 Quick Start

1️⃣ Fetch Recent PubMed Papers

bio-agent fetch \
  --query "protein structure prediction OR AlphaFold" \
  --days 30 \
  --retmax 20

This will:

Retrieve recent PMIDs from PubMed

Download and parse XML records

Store results in a local SQLite database (bio_agent.db)
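The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in pubmed.py or store.py: the function names, the `papers` table layout, and the mapping of `--days` onto the E-utilities `reldate` parameter are all assumptions, and the sample record is made up. No network request is made; the sketch only builds the ESearch URL and parses a canned EFetch-style document.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Made-up record in the shape EFetch returns for db=pubmed (not a real paper).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <ArticleTitle>Example title about <i>AlphaFold</i></ArticleTitle>
        <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def esearch_url(query: str, days: int, retmax: int) -> str:
    """Build an ESearch URL limited to the last `days` days (assumed mapping)."""
    params = {
        "db": "pubmed",
        "term": query,
        "reldate": days,     # relative date window, in days
        "datetype": "pdat",  # filter on publication date
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def _text(node) -> str:
    """Flatten an element's text, including inline markup such as <i>."""
    return "".join(node.itertext()).strip() if node is not None else ""


def parse_articles(xml_text: str) -> List[Dict[str, str]]:
    """Extract PMID, title, and abstract from a PubmedArticleSet document."""
    papers = []
    for art in ET.fromstring(xml_text).iter("PubmedArticle"):
        papers.append({
            "pmid": _text(art.find("MedlineCitation/PMID")),
            "title": _text(art.find(".//ArticleTitle")),
            "abstract": " ".join(_text(n) for n in art.iter("AbstractText")),
        })
    return papers


def upsert(conn: sqlite3.Connection, papers: List[Dict[str, str]]) -> None:
    """Insert papers keyed by PMID; re-running updates rows instead of duplicating."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers "
        "(pmid TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    conn.executemany(
        "INSERT INTO papers (pmid, title, abstract) "
        "VALUES (:pmid, :title, :abstract) "
        "ON CONFLICT(pmid) DO UPDATE SET "
        "title = excluded.title, abstract = excluded.abstract",
        papers,
    )
    conn.commit()
```

Keying the table on PMID is what would make repeated fetches incremental: running the same query twice leaves one row per paper.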

2️⃣ Export Markdown Reports

bio-agent export --limit 10 --out report.md

Filter by keyword:

bio-agent export \
  --limit 20 \
  --query-contains AlphaFold \
  --out alphafold_report.md
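A rough sketch of what a filtered export could look like follows. It is not the code in exporter.py: `render_report` is a hypothetical name, the report layout is invented for illustration, and the filter is assumed to be a case-insensitive substring match on title and abstract (the real `--query-contains` flag may instead match the stored search query).

```python
from typing import Dict, List, Optional


def render_report(
    papers: List[Dict[str, str]],
    limit: int,
    contains: Optional[str] = None,
) -> str:
    """Render up to `limit` matching papers as a Markdown document."""
    needle = contains.lower() if contains else None

    def matches(paper: Dict[str, str]) -> bool:
        if needle is None:
            return True
        return needle in f"{paper['title']} {paper['abstract']}".lower()

    selected = [p for p in papers if matches(p)][:limit]

    lines = [f"# PubMed report ({len(selected)} papers)", ""]
    for p in selected:
        lines += [
            f"## {p['title']}",
            "",
            f"- PMID: {p['pmid']}",
            "",
            p["abstract"],
            "",
        ]
    return "\n".join(lines)
```

Calling `render_report(rows, limit=20, contains="AlphaFold")` on rows read from SQLite would return the text to write to the output file.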

3️⃣ Generate README Digest (Key Takeaways)

bio-agent digest

What this does:

Reads recent papers from SQLite

Generates 3 key takeaways using heuristic scoring

Injects the digest into the top of README.md

Uses markers to avoid duplicate insertion
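The marker mechanism can be illustrated as follows. The marker strings and the `inject_digest` name are assumptions chosen for this sketch; the project may use different ones. The idea is that a block delimited by a start and end marker is replaced in place when it already exists, and prepended otherwise.

```python
START = "<!-- DIGEST:START -->"  # hypothetical marker strings
END = "<!-- DIGEST:END -->"


def inject_digest(readme: str, digest: str) -> str:
    """Place `digest` at the top of `readme`, replacing any earlier digest."""
    block = f"{START}\n{digest.strip()}\n{END}"
    start = readme.find(START)
    end = readme.find(END)
    if start != -1 and end != -1 and start < end:
        # A digest is already present: swap it out in place.
        return readme[:start] + block + readme[end + len(END):]
    # First run: put the digest block above the existing content.
    return block + "\n\n" + readme
```

Because the second run finds the markers and overwrites the block between them, running the digest step repeatedly never stacks up multiple copies.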

⚠️ No LLMs are used; all summaries are rule-based and reproducible.
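To make "rule-based and reproducible" concrete, here is one possible heuristic, offered only as an illustration: the keyword weights, function names, and sentence splitter are invented for this sketch and are not taken from digest.py. Each sentence is scored by summing fixed keyword weights, and the top three are kept, with ties broken by original order so the same input always gives the same output.

```python
import re
from typing import List

# Hypothetical weights; a real scorer would likely use a richer rule set.
KEYWORD_WEIGHTS = {
    "alphafold": 3,
    "structure": 2,
    "prediction": 2,
    "accuracy": 1,
    "benchmark": 1,
}


def score_sentence(sentence: str) -> int:
    """Sum the weights of all known keywords appearing in the sentence."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return sum(KEYWORD_WEIGHTS.get(w, 0) for w in words)


def key_takeaways(abstracts: List[str], k: int = 3) -> List[str]:
    """Return the k highest-scoring sentences across all abstracts."""
    sentences: List[str] = []
    for abstract in abstracts:
        parts = re.split(r"(?<=[.!?])\s+", abstract)
        sentences.extend(p.strip() for p in parts if p.strip())
    # sorted() is stable, so equal scores keep their original order.
    ranked = sorted(sentences, key=lambda s: -score_sentence(s))
    return ranked[:k]
```

Since nothing here depends on randomness or a remote model, the digest for a given database state is fully determined by the rules.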

📁 Project Structure

bio-agent/
├── src/
│   └── bio_agent/
│       ├── cli.py        # Typer-based CLI entrypoint
│       ├── pubmed.py     # PubMed E-utilities + XML parsing
│       ├── store.py      # SQLite schema & upsert logic
│       ├── exporter.py   # Markdown / README export
│       ├── digest.py     # Heuristic digest generation
│       └── summarize.py  # Extensible summarization logic
├── reports/              # Generated Markdown reports
├── bio_agent.db          # Local SQLite database
├── pyproject.toml
└── README.md

🧠 Why No LLM or API?

This is a deliberate design choice:

✅ Fully local & reproducible

✅ No LLM API cost or rate limits

✅ Transparent logic for research workflows

✅ Suitable for scheduled or offline pipelines

LLM-based summarization (OpenAI / Claude / local models) can be added as an optional module in future versions.

🔮 Roadmap

Richer structured fields (methods, datasets, benchmarks)

Optional LLM-based summarization

Scheduled runs via cron / GitHub Actions

Paper embedding & topic clustering

Web UI (Streamlit / Gradio)

📜 License

MIT License

🙌 Acknowledgements

NCBI PubMed E-utilities

AlphaFold & protein structure prediction community

Python open-source ecosystem
