clinfo/graph-vgae

VGAE

This repository contains a graph-level variational graph autoencoder implementation and helper scripts for preparing graph datasets as PyTorch Geometric Data objects.

It provides the VGAE model and dataset-conversion utilities only. Raw toxicogenomics data, inferred networks, trained models, generated datasets, NNSR, ECv/Delta ECv programs, and downstream figure-reproduction scripts are not included.

Requirements

The code uses Python with PyTorch and PyTorch Geometric. The experiments associated with this implementation used:

  • PyTorch 1.9.1
  • PyTorch Geometric 2.0.1
  • NVIDIA GeForce RTX 2080 Ti 11 GB

Install PyTorch and PyTorch Geometric versions that match your local CPU/CUDA environment. The remaining Python dependencies are listed in requirements.txt.

Example setup:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Sample workflow

Generate small BA/WS/ER sample graphs:

python make_sample_graph.py --graph_num 3 --outdir sample_data --seed 12345

Convert the sample graphs into a joblib dataset:

python make_sample_dataset.py \
  --graphdir sample_data \
  --masterfile SampleGraph_MasterSheet.txt \
  --datasetfile SampleGraph_Dataset.jbl \
  --seed 12345

Train the VGAE on CPU:

python vgae.py \
  --mode train \
  --savepath ExpTEST \
  --dataset SampleGraph_Dataset.jbl \
  --epoch 10 \
  --beta 0.001 \
  --gpu -1 \
  --seed 12345

Run 5-fold cross-validation:

python vgae.py \
  --mode train_cv \
  --savepath ExpTEST_CV \
  --dataset SampleGraph_Dataset.jbl \
  --epoch 10 \
  --beta 0.001 \
  --gpu -1 \
  --seed 12345
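As a rough illustration of what a seeded 5-fold split involves, the sketch below partitions dataset indices into five train/validation pairs. It is a minimal example under stated assumptions: how vgae.py actually partitions the dataset is not documented here, and the function name k_fold_indices and its parameters are hypothetical.

```python
import random


def k_fold_indices(n_items, k=5, seed=12345):
    """Yield (train_idx, val_idx) pairs for a seeded k-fold split.

    Hypothetical helper for illustration; not taken from vgae.py.
    """
    indices = list(range(n_items))
    # A dedicated Random instance keeps the shuffle reproducible per seed.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


if __name__ == "__main__":
    for fold_no, (train, val) in enumerate(k_fold_indices(10)):
        print(fold_no, sorted(train), sorted(val))
```

Each graph lands in exactly one validation fold, and reusing the same seed reproduces the same partition.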

Run inference with a trained model:

python vgae.py \
  --mode infer \
  --savepath ExpTEST_INFER \
  --dataset SampleGraph_Dataset.jbl \
  --model ExpTEST/graph.model \
  --beta 0.001 \
  --gpu -1 \
  --seed 12345

Toxicogenomics-style dataset conversion

make_TG_dataset.py expects user-provided graph files and an expression matrix. Graph files must be tab-separated and include Parent and Child columns. File names must follow this pattern:

DREs_DoseSeries_<DRUG>_<TIME>_<DOSE>-Control.graph.tsv

where <TIME> is 2hr, 8hr, or 24hr, and <DOSE> is Low, Middle, or High.
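Before running the conversion, it can help to check that inputs follow these rules. The sketch below validates a file name against the pattern and reads Parent/Child pairs from a tab-separated handle. It is an illustration only, not the parsing used by make_TG_dataset.py: the helper names parse_graph_name and read_edges are hypothetical, and it assumes <DRUG> may contain underscores.

```python
import csv
import io
import re

# Pattern for DREs_DoseSeries_<DRUG>_<TIME>_<DOSE>-Control.graph.tsv.
# Assumption: <DRUG> may itself contain underscores, so it is matched
# greedily while <TIME> and <DOSE> are restricted to the allowed values.
GRAPH_NAME = re.compile(
    r"^DREs_DoseSeries_(?P<drug>.+)_(?P<time>2hr|8hr|24hr)"
    r"_(?P<dose>Low|Middle|High)-Control\.graph\.tsv$"
)


def parse_graph_name(filename):
    """Return (drug, time, dose), or None if the name does not match."""
    match = GRAPH_NAME.match(filename)
    if match is None:
        return None
    return match.group("drug"), match.group("time"), match.group("dose")


def read_edges(handle):
    """Read (Parent, Child) pairs from a tab-separated graph file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"Parent", "Child"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return [(row["Parent"], row["Child"]) for row in reader]


if __name__ == "__main__":
    # "acetaminophen" is a made-up example value for <DRUG>.
    print(parse_graph_name(
        "DREs_DoseSeries_acetaminophen_24hr_High-Control.graph.tsv"))
    print(read_edges(io.StringIO("Parent\tChild\nGeneA\tGeneB\n")))
```

Files whose names return None, or that raise ValueError on read, can be fixed or set aside before conversion.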

The expression matrix must be a tab-separated table with genes as rows, samples as columns, and the first column used as the gene index.
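A minimal sketch of reading a table in this layout with the Python standard library follows. The function name read_expression_matrix is hypothetical, and the sketch assumes every value is numeric. make_TG_dataset.py may load the file differently; with pandas, for instance, `pandas.read_csv(path, sep="\t", index_col=0)` reads the same layout.

```python
import csv
import io


def read_expression_matrix(handle):
    """Return (sample_names, {gene: [values]}) from a tab-separated handle.

    Assumes the first row holds sample names and the first column holds
    gene identifiers, with every remaining cell parseable as a float.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column is the gene index
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"row for {gene!r} has {len(values)} values, "
                f"expected {len(samples)}"
            )
        matrix[gene] = [float(v) for v in values]
    return samples, matrix


if __name__ == "__main__":
    # Made-up gene and sample names for illustration.
    demo = io.StringIO("gene\tS1\tS2\nGeneA\t1.5\t-0.25\nGeneB\t0\t2\n")
    print(read_expression_matrix(demo))
```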

Example:

python make_TG_dataset.py \
  --graphdir path/to/graph_files \
  --expfile path/to/expression_matrix.tsv \
  --masterfile TGgraph2_MasterSheet.txt \
  --datasetfile TGgraph2_Dataset.jbl \
  --seed 12345

Generated datasets, trained models, plots, logs, and private input data should not be committed.

Smoke test

Run a lightweight smoke test for the sample graph generator with:

python -m unittest discover -s tests -q

This test covers only the deterministic sample-graph export path. Full model training and toxicogenomics dataset conversion still require the runtime dependencies described above.

Citation

If you use this repository, cite the accompanying manuscript and the archived software release:

Tanaka, Y., and Sakuragi, M. (2026). graph-vgae: graph-level variational graph autoencoder framework for gene networks. Zenodo. https://doi.org/10.5281/zenodo.19928168

License

This repository is released under the MIT License. See LICENSE.
