GeneKnow is a command-line tool for discovering and inspecting gene functions in specific biological contexts. Given a list of genes and a context (e.g., a cell type or disease), it searches the biomedical literature, retrieves relevant evidence passages using BM25 scoring, and uses LLM-powered pipelines to summarize, verify, and synthesize findings.
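To give a feel for the BM25 step mentioned above, here is a minimal sketch of Okapi BM25 scoring over a handful of passages. It is not GeneKnow's own implementation: the function name `bm25_scores`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the toy passages are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with Okapi BM25 (illustrative only)."""
    if not passages:
        return []
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy passages, invented for this example.
passages = [
    "FOXA1 expression was measured in prostate cancer samples",
    "HOXB13 was profiled in a prostate cancer cohort",
    "Samples were stored at low temperature before sequencing",
]
scores = bm25_scores("FOXA1 prostate cancer", passages)
ranked = sorted(zip(scores, passages), reverse=True)
```

In this toy run the first passage ranks highest because it matches all three query terms, and the third scores zero because it matches none.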
GeneKnow requires Python >= 3.13.7.
git clone https://github.com/zang-lab/GeneKnow.git
cd GeneKnow
pip install .

This installs the geneknow (and GeneKnow) CLI entry point.
GeneKnow requires an OpenAI API key to function. This key is used for all LLM operations, including passage summarization, article summary verification, and final synopsis generation.
An Elsevier API key is optional. It is only required if you want to sort search results by citation count (--sort-cited) or if you choose to use Scopus as your search engine (--search-engine scopus). For standard relevance-based searches via PubMed or Europe PMC, you do not need to provide this key.
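The key requirements above can be summarized as a small pre-flight check to run before invoking the CLI. This is a sketch, not part of GeneKnow: the helper name `missing_api_keys` and its parameters are hypothetical, and only the environment variable names and the two conditions (Scopus search, citation sorting) come from the text.

```python
import os


def missing_api_keys(search_engine="pubmed", sort_cited=False, env=None):
    """Return the names of required API keys that are not set (hypothetical helper)."""
    env = os.environ if env is None else env
    # The OpenAI key is always required.
    required = ["OPENAI_API_KEY"]
    # The Elsevier key is needed only for Scopus searches or citation sorting.
    if sort_cited or search_engine == "scopus":
        required.append("ELSEVIER_API_KEY")
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_api_keys(search_engine="scopus")
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```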
Linux / macOS (Bash/Zsh)
export OPENAI_API_KEY="sk-..."
export ELSEVIER_API_KEY="..."

To make them persistent across terminal sessions, add the above lines to your shell profile file (e.g., ~/.bashrc, ~/.zshrc, or ~/.bash_profile), then reload:

source ~/.bashrc # or ~/.zshrc

Windows (Command Prompt)
set OPENAI_API_KEY=sk-...
set ELSEVIER_API_KEY=...

Windows (PowerShell)
$env:OPENAI_API_KEY="sk-..."
$env:ELSEVIER_API_KEY="..."

GeneKnow provides two subcommands: discover and inspect.
The discover subcommand automatically searches the literature and synthesizes a synopsis for each gene.
Example
Discover the roles that three genes of interest play in prostate cancer.
geneknow discover \
-g FOXA1 HOXB13 BRCA1 \
-c "prostate cancer" PCa \
-n PCa \
--max-papers 5

Key arguments
| Argument | Description |
|---|---|
| -g, --genes | Space-separated gene symbols (e.g., -g FOXA1 HOXB13). Overrides -G if both are provided. |
| -G, --genes-file | Path to a file with one gene symbol per line. Ignored if -g/--genes is provided. |
| -c, --context | Required. Space-separated context aliases (e.g., -c "prostate cancer" PCa). |
| -n, --name | Required. Project name (alphanumeric, hyphens, underscores only). Output goes to outdir/name/. |
| -o, --outdir | Output directory. Defaults to the current directory. |
| -s, --species | Species for gene-alias lookup. Default: human. |
| --max-papers | Max papers to review per gene. Default: 3. |
| --max-passages | Max evidence passages to review per paper. Default: 3. |
| --search-engine | pubmed (default), europepmc, or scopus. Requires ELSEVIER_API_KEY when using Scopus. |
| --search-limit | Max papers to fetch from the search engine per gene. Default: 25. |
| --sort-cited | By default, geneknow fetches the --search-limit most relevant papers and reviews the top --max-papers of them in relevance order. With this flag, the fetched papers are first sorted by citation count, so the most cited papers among them are reviewed instead. Requires ELSEVIER_API_KEY. |
| --auto-alias | Enable automatic gene alias matching via NCBI Gene. |
| -N, --suffix-not-allowed | Disable suffix matching on context terms (e.g., plurals). Only exact context terms will be used. This does not affect gene-name suffix handling. |
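The interaction between --search-limit, --max-papers, and --sort-cited can be sketched as follows. This is an illustration of the behavior described in the table, not GeneKnow's code: the function `select_papers`, the dictionary keys, and the toy citation counts are hypothetical.

```python
def select_papers(papers, search_limit=25, max_papers=3, sort_cited=False):
    """Pick the papers to review from a relevance-ordered list (illustrative only)."""
    # Keep only the most relevant hits, as the search engine would return them.
    fetched = papers[:search_limit]
    if sort_cited:
        # Re-rank the fetched papers by citation count, highest first.
        # sorted() is stable, so ties keep their relevance order.
        fetched = sorted(fetched, key=lambda p: p["citations"], reverse=True)
    return fetched[:max_papers]


# Toy search results, most relevant first (invented for this example).
papers = [
    {"id": "A", "citations": 5},
    {"id": "B", "citations": 50},
    {"id": "C", "citations": 1},
    {"id": "D", "citations": 20},
    {"id": "E", "citations": 500},
]

by_relevance = select_papers(papers, search_limit=4, max_papers=2)
by_citations = select_papers(papers, search_limit=4, max_papers=2, sort_cited=True)
```

Note that paper E is never reviewed in this toy run, despite its citation count, because it falls outside the search limit of 4: citation sorting only reorders papers that were already fetched.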
The inspect subcommand takes a deep dive into a single paper specified by PMID or PMCID.
Example
Summarize the functional role of FOXA1 in prostate cancer based on a specific paper (PMID: 40570057).
geneknow inspect \
-g FOXA1 \
-c "prostate cancer" PCa \
--pmid 40570057 \
-n brca1_PCa

Key arguments
| Argument | Description |
|---|---|
| -g, --genes | Space-separated gene symbols (e.g., -g FOXA1 HOXB13). Overrides -G if both are provided. |
| -G, --genes-file | Path to a file with one gene symbol per line. Ignored if -g/--genes is provided. |
| -c, --context | Required. Space-separated context aliases (e.g., -c "prostate cancer" PCa). |
| -n, --name | Required. Project name (alphanumeric, hyphens, underscores only). Output goes to outdir/name/. |
| -o, --outdir | Output directory. Defaults to the current directory. |
| -s, --species | Species for gene-alias lookup. Default: human. |
| --max-passages | Max evidence passages to review per paper. Default: 3. |
| --pmid | PubMed ID of the target paper. |
| --pmcid | PubMed Central ID of the target paper. |
| --auto-alias | Enable automatic gene alias matching via NCBI Gene. |
| -N, --suffix-not-allowed | Disable suffix matching on context terms (e.g., plurals). Only exact context terms will be used. This does not affect gene-name suffix handling. |
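To illustrate what the -N flag changes, the sketch below contrasts context matching with and without a plural suffix. The exact matching rules GeneKnow applies are not documented here, so everything in this example is an assumption: the function `context_pattern`, the choice of "s"/"es" as allowed suffixes, and case-insensitive matching.

```python
import re


def context_pattern(terms, allow_suffix=True):
    """Build a regex for context terms (hypothetical; not GeneKnow's actual rules)."""
    # Assumed suffix set: simple English plurals only.
    suffix = r"(?:s|es)?" if allow_suffix else ""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternatives}){suffix}\b", re.IGNORECASE)


text = "Several prostate cancers were profiled."

with_suffix = context_pattern(["prostate cancer", "PCa"])
exact_only = context_pattern(["prostate cancer", "PCa"], allow_suffix=False)

matched_default = with_suffix.search(text) is not None  # plural form is accepted
matched_exact = exact_only.search(text) is not None     # plural form is rejected
```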
Results are saved under outdir/name/:
- GeneKnow_report.csv: Summary report per gene
- GeneKnow_report.html: HTML report (discover mode)
- token_usage.csv: LLM token usage per gene
- synopses/: Gene-level synthesized synopses (discover mode)
- article_summaries/: Per-paper summaries (discover mode)
- evidence_passages/: Retrieved evidence passages with BM25 scores
- html/: Paper-level HTML reports (discover mode)
- error_genes.txt: List of genes that encountered errors (if any)
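After a run, a short script can confirm which of these outputs exist. The sketch below is not part of GeneKnow: the helpers `project_dir` and `present_outputs` are hypothetical, the name check mirrors the rule stated for -n/--name (alphanumeric characters, hyphens, underscores), and the file list is copied from the section above.

```python
import re
from pathlib import Path

# Allowed project names: alphanumeric characters, hyphens, and underscores.
NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

# Output entries listed in this README; some appear only in discover mode.
EXPECTED = [
    "GeneKnow_report.csv",
    "GeneKnow_report.html",
    "token_usage.csv",
    "synopses",
    "article_summaries",
    "evidence_passages",
    "html",
    "error_genes.txt",
]


def project_dir(name, outdir="."):
    """Return outdir/name after validating the project name (hypothetical helper)."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"Invalid project name: {name!r}")
    return Path(outdir) / name


def present_outputs(name, outdir="."):
    """Map each expected output entry to whether it exists on disk."""
    root = project_dir(name, outdir)
    return {entry: (root / entry).exists() for entry in EXPECTED}


if __name__ == "__main__":
    for entry, found in present_outputs("PCa").items():
        print(f"{entry}: {'found' if found else 'missing'}")
```

A missing error_genes.txt is expected when no gene failed, and the discover-only entries will be absent after an inspect run.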
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.