This repository contains the code associated with the manuscript "miRXplain: explainable isomiR-aware microRNA target prediction using CLIP-L experiments and hybrid attention transformers".
- Training and inference with miRXplain and other DL models for isomiR/miRNA target prediction (TEC-miTarget, Mimosa, GraphTar, MiTar, DMISO)
- Modular codebase built with PyTorch Lightning
- Online tracking of experiments with Comet.ml
miRXplain was built with Python 3.11, PyTorch 2.8, and PyTorch Lightning 2.5.5, and tested on Linux (CentOS 7, Rocky Linux 8, and Ubuntu; CUDA 12.8). We don't guarantee compatibility with macOS and Windows.
```shell
conda env create -f environment.yml
conda activate mirxplain
```
To get the latest code:
```shell
pip install git+https://github.com/marsico-lab/mirxplain.git
```
```
.
├── bin            # Bash and sbatch scripts for submission to a SLURM HPC cluster
├── docs           # Documentation files
├── data           # Input datasets
├── sample_data    # Sample input datasets for testing and prediction
├── notebooks      # Jupyter notebooks
├── src            # Core functions, models, datasets, PTL modules, etc.
├── tests          # Testing routines
├── workflows      # Snakemake workflows to pre- and post-process datasets, sample negatives, and generate the training set
├── .gitignore     # Files and folders not tracked by git
├── LICENSE
├── README.md
├── train_cv.py    # Entry point for training models in cross-validation
└── predict.py     # Entry point for making predictions with trained models
```
Preprocessed training data together with trained model weights have been deposited on Zenodo (DOI: 10.5281/zenodo.18010234).
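For scripted downloads, the DOI suffix maps to a Zenodo record id. A small helper, illustrative only — the API endpoint pattern below is Zenodo's general convention, not something this repository ships:

```python
def zenodo_record_url(doi: str) -> str:
    """Derive the Zenodo API URL for a record from its DOI.

    Zenodo DOIs end in the numeric record id,
    e.g. 10.5281/zenodo.18010234 -> record 18010234.
    """
    record_id = doi.rsplit(".", 1)[-1]
    return f"https://zenodo.org/api/records/{record_id}"

print(zenodo_record_url("10.5281/zenodo.18010234"))
# https://zenodo.org/api/records/18010234
```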
```
$ python train_cv.py -h
usage: train_cv.py [-h] [--seed SEED] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--lr LR] [--weight-decay WEIGHT_DECAY] [--patience PATIENCE] [--n-folds N_FOLDS] [--fold-limit FOLD_LIMIT]
                   [--model {miRXplain,TEC-miTarget,TransPHLA,Mimosa,GraphTar,CNNSequenceModel,MiTar,DMISO}] [--input-data-path INPUT_DATA_PATH] [--comet-logging] [--comet-project COMET_PROJECT]
                   [--cnn {basic,inception,residual,dilated,depthwise}] [--pe {basic,weighted}] [--attention {self-attention,cross-attention,hybrid-attention}] [--word2vec-model-dir WORD2VEC_MODEL_DIR]

options:
  -h, --help            show this help message and exit
  --seed SEED           Random seed
  --epochs EPOCHS       Maximum number of epochs to train for
  --batch-size BATCH_SIZE
                        Batch size
  --lr LR               Learning rate
  --weight-decay WEIGHT_DECAY
                        Weight decay, use 0 for no weight decay
  --patience PATIENCE   Number of epochs to wait before early stopping
  --n-folds N_FOLDS     Number of folds for cross-validation
  --fold-limit FOLD_LIMIT
                        Limit the number of folds to run for testing
  --model {miRXplain,TEC-miTarget,TransPHLA,Mimosa,GraphTar,CNNSequenceModel,MiTar,DMISO}
                        Name of the model
  --input-data-path INPUT_DATA_PATH
  --comet-logging       Whether to log to Comet.ml
  --comet-project COMET_PROJECT
                        Name of the project for Comet.ml logging
  --cnn {basic,inception,residual,dilated,depthwise}
                        CNN type for miRXplain model
  --pe {basic,weighted}
                        type of positional encoding
  --attention {self-attention,cross-attention,hybrid-attention}
                        type of attention mechanism
  --word2vec-model-dir WORD2VEC_MODEL_DIR
                        Path to the word2vec model for GraphTar
```
For example:
```shell
python train_cv.py --model miRXplain --input-data-path data/clipl_dataset.tsv --batch-size 32 --lr 1e-4 --pe basic --attention hybrid-attention
```
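The `--n-folds` option controls the cross-validation split. Purely for illustration (this is not the repository's implementation), k-fold index splitting boils down to:

```python
def kfold_indices(n_samples: int, n_folds: int):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Illustrative sketch: partitions sample indices into n_folds
    contiguous validation chunks; the remaining indices form the
    training set of each fold.
    """
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

# Example: 10 samples, 5 folds -> validation folds of size 2
folds = list(kfold_indices(10, 5))
print(folds[0][1])  # [0, 1]
```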
```
$ python predict.py -h
usage: predict.py [-h] [--input-data-path INPUT_DATA_PATH] [--checkpoint-path CHECKPOINT_PATH] [--max-mirna-len MAX_MIRNA_LEN] [--max-target-len MAX_TARGET_LEN] [--batch-size BATCH_SIZE]
                  [--num-workers NUM_WORKERS] [--output-mode {basic,perturb,fusion,attn,all}] [--comet-logging] [--comet-project COMET_PROJECT]

options:
  -h, --help            show this help message and exit
  --input-data-path INPUT_DATA_PATH
  --checkpoint-path CHECKPOINT_PATH
  --max-mirna-len MAX_MIRNA_LEN
  --max-target-len MAX_TARGET_LEN
  --batch-size BATCH_SIZE
                        Batch size for prediction
  --num-workers NUM_WORKERS
                        Number of workers for data loading
  --output-mode {basic,perturb,fusion,attn,all}
                        Output format for prediction results
  --comet-logging       Whether to log to Comet.ml
  --comet-project COMET_PROJECT
                        Name of the project for Comet.ml logging
```
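`--max-mirna-len` and `--max-target-len` fix the sequence lengths the model expects. A minimal sketch of the kind of pad-or-truncate step this implies (assumed behavior; the `"N"` pad symbol is our placeholder, not necessarily what miRXplain uses internally):

```python
def pad_sequence(seq: str, max_len: int, pad_char: str = "N") -> str:
    """Right-pad a nucleotide sequence to max_len, truncating if longer."""
    return seq[:max_len].ljust(max_len, pad_char)

# let-7a (22 nt) padded to a 33 nt miRNA slot
print(pad_sequence("UGAGGUAGUAGGUUGUAUAGUU", 33))
# UGAGGUAGUAGGUUGUAUAGUUNNNNNNNNNNN
```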
For example:
```shell
python predict.py --input-data-path sample_data/prediction_set.tsv --max-mirna-len 33 --max-target-len 41 --checkpoint-path models/mirxplain.ckpt
```
The entry points for training the additional models benchmarked in the paper are the same as for miRXplain, but the argument configurations differ. For example:
TEC-miTarget
```shell
python train_cv.py --model TEC-miTarget --input-data-path data/clipl_dataset.tsv --batch-size 64 --lr 0.0001
```
Mimosa
```shell
python train_cv.py --model Mimosa --input-data-path data/clipl_dataset.tsv --batch-size 32 --lr 1e-4
```
GraphTar
```shell
python train_cv.py --model GraphTar --input-data-path data/clipl_dataset.tsv --word2vec-model-dir data/word2vec-models-r-5/ --lr 1e-3 --batch-size 128
```
MiTar
```shell
python train_cv.py --model MiTar --input-data-path data/clipl_dataset.tsv --lr 1e-4
```
DMISO
```shell
python train_cv.py --model DMISO --input-data-path data/clipl_dataset.tsv --batch-size 100
```
miRXplain supports online logging via Comet.ml in addition to local logging (CSV files).
Online logging is enabled with the --comet-logging option of train_cv.py. Before running, you first need to create a Comet account and configure it (https://www.comet.com/docs/v2/guides/tracking-ml-training/configuring-comet/).
To do so, create a config file .comet.config with the following content:
```
[comet]
api_key=<Your API Key>
workspace=<Your Workspace Name>
project_name=<Your Project Name>
```
Then run
```shell
export COMET_CONFIG=<Path To Your Comet Config>
```
or move the file to your home directory as `~/.comet.config`.
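The config file follows the standard INI format, so you can sanity-check it with Python's built-in `configparser` (a quick check we suggest here, not part of the repository):

```python
import configparser

# Sanity-check a Comet config (same INI format as ~/.comet.config).
example = """
[comet]
api_key=<Your API Key>
workspace=<Your Workspace Name>
project_name=<Your Project Name>
"""

config = configparser.ConfigParser()
config.read_string(example)  # use config.read(path) for a real file

required = {"api_key", "workspace", "project_name"}
missing = required - set(config["comet"])
print("missing keys:", sorted(missing) or "none")
# missing keys: none
```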
