This repository contains a graph-level variational graph autoencoder implementation and helper scripts for preparing graph datasets as PyTorch Geometric Data objects.
It provides the VGAE model and dataset-conversion utilities only. Raw toxicogenomics data, inferred networks, trained models, generated datasets, NNSR, ECv/Delta ECv programs, and downstream figure-reproduction scripts are not included.
The code uses Python with PyTorch and PyTorch Geometric. The experiments associated with this implementation used:
- PyTorch 1.9.1
- PyTorch Geometric 2.0.1
- NVIDIA GeForce RTX 2080 Ti GPU (11 GB)
Install PyTorch and PyTorch Geometric versions that match your local CPU/CUDA environment. The remaining Python dependencies are listed in requirements.txt.
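Once the packages are installed, a short check can confirm which versions ended up in the environment. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `report_versions` are hypothetical, and the distribution names `torch` and `torch-geometric` are assumed to be the ones used by your package index.

```python
from importlib import metadata

# Versions reported for the original experiments (taken from the list above).
# The distribution names are assumptions about how the packages were installed.
REFERENCE_VERSIONS = {"torch": "1.9.1", "torch-geometric": "2.0.1"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(reference=REFERENCE_VERSIONS):
    """Build one line per package comparing the installed and reference versions."""
    lines = []
    for name, wanted in reference.items():
        found = installed_version(name)
        status = "not installed" if found is None else f"installed {found}"
        lines.append(f"{name}: {status} (experiments used {wanted})")
    return lines


if __name__ == "__main__":
    for line in report_versions():
        print(line)
```

A mismatch with the reference versions is not necessarily a problem; the check only makes the difference visible before training.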
Example setup:
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Generate small BA/WS/ER sample graphs:
python make_sample_graph.py --graph_num 3 --outdir sample_data --seed 12345
Convert the sample graphs into a joblib dataset:
python make_sample_dataset.py \
--graphdir sample_data \
--masterfile SampleGraph_MasterSheet.txt \
--datasetfile SampleGraph_Dataset.jbl \
--seed 12345
Train the VGAE on CPU:
python vgae.py \
--mode train \
--savepath ExpTEST \
--dataset SampleGraph_Dataset.jbl \
--epoch 10 \
--beta 0.001 \
--gpu -1 \
--seed 12345
Run 5-fold cross-validation:
python vgae.py \
--mode train_cv \
--savepath ExpTEST_CV \
--dataset SampleGraph_Dataset.jbl \
--epoch 10 \
--beta 0.001 \
--gpu -1 \
--seed 12345
Run inference with a trained model:
python vgae.py \
--mode infer \
--savepath ExpTEST_INFER \
--dataset SampleGraph_Dataset.jbl \
--model ExpTEST/graph.model \
--beta 0.001 \
--gpu -1 \
--seed 12345
make_TG_dataset.py expects user-provided graph files and an expression matrix. Graph files must be tab-separated and include Parent and Child columns. File names must follow:
DREs_DoseSeries_<DRUG>_<TIME>_<DOSE>-Control.graph.tsv
where <TIME> is 2hr, 8hr, or 24hr, and <DOSE> is Low, Middle, or High.
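Before running the conversion, it can help to check your own files against these rules. The sketch below is an illustration only, not the parsing code used by make_TG_dataset.py: `parse_graph_filename` and `missing_columns` are hypothetical helpers, `<DRUG>` is assumed to be any non-empty string, and the drug name in the usage lines is a made-up example.

```python
import re

# Pattern built from the naming rule above. <DRUG> is assumed to be any
# non-empty string, since the rule does not restrict its characters.
GRAPH_NAME = re.compile(
    r"^DREs_DoseSeries_(?P<drug>.+)_(?P<time>2hr|8hr|24hr)_(?P<dose>Low|Middle|High)"
    r"-Control\.graph\.tsv$"
)

REQUIRED_COLUMNS = {"Parent", "Child"}


def parse_graph_filename(filename):
    """Return a dict with drug, time, and dose, or None if the name does not match."""
    match = GRAPH_NAME.match(filename)
    return None if match is None else match.groupdict()


def missing_columns(header_line):
    """Return the required column names absent from a tab-separated header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    return REQUIRED_COLUMNS - set(columns)


# Usage with a made-up drug name.
info = parse_graph_filename("DREs_DoseSeries_exampledrug_24hr_High-Control.graph.tsv")
print(info)
print(missing_columns("Parent\tChild\n"))
```

Names that return `None`, or headers that report a missing column, are worth fixing before the conversion script is run.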
The expression matrix must be a tab-separated table with genes as rows, samples as columns, and the first column used as the gene index.
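To make the expected layout concrete, here is a minimal sketch that reads such a table with the standard library. It is not the loader used by make_TG_dataset.py; `read_expression_matrix` is a hypothetical helper, the gene and sample names are made up, and the values are assumed to be numeric.

```python
import csv
import io


def read_expression_matrix(handle):
    """Parse a tab-separated table into (sample_names, {gene: [values]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first header cell labels the gene column
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"row for {gene!r} has {len(values)} values, expected {len(samples)}"
            )
        matrix[gene] = [float(v) for v in values]  # assumes numeric expression values
    return samples, matrix


# Made-up miniature table, for illustration only.
example = "gene\tsample_A\tsample_B\nGeneX\t1.5\t2.0\nGeneY\t0.0\t3.25\n"
samples, matrix = read_expression_matrix(io.StringIO(example))
print(samples)
print(matrix["GeneY"])
```

If pandas is available in your environment, `pandas.read_csv(path, sep="\t", index_col=0)` reads the same layout into a DataFrame with genes as the index.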
Example:
python make_TG_dataset.py \
--graphdir path/to/graph_files \
--expfile path/to/expression_matrix.tsv \
--masterfile TGgraph2_MasterSheet.txt \
--datasetfile TGgraph2_Dataset.jbl \
--seed 12345
Generated datasets, trained models, plots, logs, and private input data should not be committed.
You can run a lightweight smoke test for the sample graph generator with:
python -m unittest discover -s tests -q
This test covers only the deterministic sample-graph export path. Full model training and toxicogenomics dataset conversion still require the runtime dependencies described above.
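For readers unfamiliar with determinism checks, the sketch below shows the general shape of such a test. It is not the repository's test: `er_edges` is a hypothetical stand-in generator (an Erdős-Rényi style edge list, assuming that is what ER refers to), and the real test exercises make_sample_graph.py instead.

```python
import random
import unittest


def er_edges(n_nodes, p, seed):
    """Stand-in generator (hypothetical): random edge list drawn from a private RNG."""
    rng = random.Random(seed)  # private RNG, so global random state is not touched
    return [
        (i, j)
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < p
    ]


class DeterminismTest(unittest.TestCase):
    def test_same_seed_same_edges(self):
        # Two runs with the same seed must produce identical output.
        self.assertEqual(er_edges(20, 0.3, seed=12345), er_edges(20, 0.3, seed=12345))

    def test_edges_are_valid(self):
        # Every edge connects two distinct nodes inside the node range.
        for i, j in er_edges(20, 0.3, seed=12345):
            self.assertTrue(0 <= i < j < 20)
```

Saved under a tests directory with a file name starting with `test`, a file like this would be picked up by the `unittest discover` command shown above.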
If you use this repository, cite the accompanying manuscript and the archived software release:
Tanaka, Y., and Sakuragi, M. (2026). graph-vgae: graph-level variational graph autoencoder framework for gene networks. Zenodo. https://doi.org/10.5281/zenodo.19928168
This repository is released under the MIT License. See LICENSE.