miRXplain

Overview

This repository contains the code associated with the manuscript "miRXplain: explainable isomiR-aware microRNA target prediction using CLIP-L experiments and hybrid attention transformers".

[Graphical abstract of miRXplain]

Features

  • Training and inference with miRXplain and other DL models for isomiR/miRNA target prediction (TEC-miTarget, Mimosa, GraphTar, MiTar, DMISO)
  • Modular codebase built with PyTorch Lightning
  • Online tracking of experiments with Comet.ml

Installation

miRXplain was built with Python 3.11, PyTorch 2.8, and PyTorch Lightning 2.5.5, and tested on Linux (CentOS 7, Rocky Linux 8, and Ubuntu; CUDA 12.8). Compatibility with macOS and Windows is not guaranteed.

Installation with conda

conda env create -f environment.yml
conda activate mirxplain

Installation from source

To get the latest code:

pip install git+https://github.com/marsico-lab/mirxplain.git

Directory structure

.
├── bin                   # Bash and sbatch scripts for submission to SLURM HPC cluster
├── docs                  # Documentation files
├── data                  # Input datasets
├── sample_data           # Sample input datasets for testing and prediction
├── notebooks             # Jupyter notebooks
├── src                   # Core functions, models, datasets, PTL modules, etc.
├── tests                 # Testing routines
├── workflows             # Snakemake workflows to pre- and post-process datasets, sample negatives and generate the training set
├── .gitignore            # Files and folders not tracked by .git
├── LICENSE     
├── README.md
├── train_cv.py           # Entry point for training models in cross-validation
└── predict.py            # Entry point for making predictions with trained models

Data and model weights

The preprocessed training data and trained model weights have been deposited at Zenodo (10.5281/zenodo.18010234).

How to train a miRXplain model

$ python train_cv.py -h
usage: train_cv.py [-h] [--seed SEED] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--lr LR] [--weight-decay WEIGHT_DECAY] [--patience PATIENCE] [--n-folds N_FOLDS] [--fold-limit FOLD_LIMIT]
                   [--model {miRXplain,TEC-miTarget,TransPHLA,Mimosa,GraphTar,CNNSequenceModel,MiTar,DMISO}] [--input-data-path INPUT_DATA_PATH] [--comet-logging] [--comet-project COMET_PROJECT] [--cnn {basic,inception,residual,dilated,depthwise}] [--pe {basic,weighted}]
                   [--attention {self-attention,cross-attention,hybrid-attention}] [--word2vec-model-dir WORD2VEC_MODEL_DIR]

options:
  -h, --help            show this help message and exit
  --seed SEED           Random seed
  --epochs EPOCHS       Maximum number of epochs to train for
  --batch-size BATCH_SIZE
                        Batch size
  --lr LR               Learning rate
  --weight-decay WEIGHT_DECAY
                        Weight decay, use 0 for no weight decay
  --patience PATIENCE   Number of epochs to wait before early stopping
  --n-folds N_FOLDS     Number of folds for cross-validation
  --fold-limit FOLD_LIMIT
                        Limit the number of folds to run for testing
  --model {miRXplain,TEC-miTarget,TransPHLA,Mimosa,GraphTar,CNNSequenceModel,MiTar,DMISO}
                        Name of the model
  --input-data-path INPUT_DATA_PATH
  --comet-logging       Whether to log to Comet.ml
  --comet-project COMET_PROJECT
                        Name of the project for Comet.ml logging
  --cnn {basic,inception,residual,dilated,depthwise}
                        CNN type for miRXplain model
  --pe {basic,weighted}
                        type of positional encoding
  --attention {self-attention,cross-attention,hybrid-attention}
                        type of attention mechanism
  --word2vec-model-dir WORD2VEC_MODEL_DIR
                        Path to the word2vec model for GraphTar

For example:

python train_cv.py --model miRXplain --input-data-path data/clipl_dataset.tsv --batch-size 32 --lr 1e-4 --pe basic --attention hybrid-attention
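For intuition, the `--n-folds` option controls K-fold cross-validation: the dataset is partitioned into K folds, and each fold in turn is held out for validation while the model trains on the rest. A minimal pure-Python sketch of that splitting logic (illustrative only, not the repository's implementation, which may shuffle or stratify the splits):

```python
def kfold_indices(n_samples, n_folds):
    """Partition indices 0..n_samples-1 into n_folds contiguous folds,
    yielding (train_indices, val_indices) for each fold."""
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

# Example: 10 samples, 5 folds -> each validation fold holds 2 samples
for fold, (train_idx, val_idx) in enumerate(kfold_indices(10, 5)):
    print(f"fold {fold}: val={val_idx}")
```

With `--fold-limit` you can then stop after the first few folds for quick testing.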

How to predict with a trained miRXplain model

$ python predict.py -h
usage: predict.py [-h] [--input-data-path INPUT_DATA_PATH] [--checkpoint-path CHECKPOINT_PATH] [--max-mirna-len MAX_MIRNA_LEN] [--max-target-len MAX_TARGET_LEN] [--batch-size BATCH_SIZE] [--num-workers NUM_WORKERS] [--output-mode {basic,perturb,fusion,attn,all}]
                  [--comet-logging] [--comet-project COMET_PROJECT]

options:
  -h, --help            show this help message and exit
  --input-data-path INPUT_DATA_PATH
  --checkpoint-path CHECKPOINT_PATH
  --max-mirna-len MAX_MIRNA_LEN
  --max-target-len MAX_TARGET_LEN
  --batch-size BATCH_SIZE
                        Batch size for prediction
  --num-workers NUM_WORKERS
                        Number of workers for data loading
  --output-mode {basic,perturb,fusion,attn,all}
                        Output format for prediction results
  --comet-logging       Whether to log to Comet.ml
  --comet-project COMET_PROJECT
                        Name of the project for Comet.ml logging

For example:

python predict.py --input-data-path sample_data/prediction_set.tsv --max-mirna-len 33 --max-target-len 41 --checkpoint-path models/mirxplain.ckpt
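The `--max-mirna-len` and `--max-target-len` arguments presumably fix the model's input lengths, with longer sequences truncated and shorter ones padded. A hedged sketch of such preprocessing (the right-padding strategy and the `"N"` pad character are assumptions for illustration, not taken from the repository):

```python
def pad_or_truncate(seq, max_len, pad_char="N"):
    """Right-pad a nucleotide sequence with pad_char up to max_len,
    or truncate it if it is longer than max_len."""
    return seq[:max_len].ljust(max_len, pad_char)

# Example with the lengths from the prediction command above
mirna = pad_or_truncate("UGAGGUAGUAGGUUGUAUAGUU", 33)  # let-7a, 22 nt
print(len(mirna))  # 33
```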

How to use all the other models

The entry points for training the additional models benchmarked in the paper are the same as for miRXplain, but the argument configurations differ. For example:

TEC-miTarget

python train_cv.py --model TEC-miTarget --input-data-path data/clipl_dataset.tsv --batch-size 64 --lr 0.0001

Mimosa

python train_cv.py --model Mimosa --input-data-path data/clipl_dataset.tsv --batch-size 32 --lr 1e-4

GraphTar

python train_cv.py --model GraphTar --input-data-path data/clipl_dataset.tsv --word2vec-model-dir data/word2vec-models-r-5/ --lr 1e-3 --batch-size 128

MiTar

python train_cv.py --model MiTar --input-data-path data/clipl_dataset.tsv --lr 1e-4

DMISO

python train_cv.py --model DMISO --input-data-path data/clipl_dataset.tsv --batch-size 100

Logging training experiments with Comet.ml

miRXplain supports online experiment tracking with Comet.ml in addition to local logging via CSV files.

To enable it, pass the --comet-logging option to train_cv.py. Before running, you first need to create a Comet account and configure Comet (https://www.comet.com/docs/v2/guides/tracking-ml-training/configuring-comet/).

To do so, create a config file .comet.config with content

[comet]
api_key=<Your API Key>
workspace=<Your Workspace Name>
project_name=<Your Project Name>

Then either run export COMET_CONFIG=<Path To Your Comet Config>, or move the file to your home directory as ~/.comet.config.
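The .comet.config file uses standard INI syntax, so you can sanity-check it with Python's built-in configparser before launching a run (the key values below are placeholders):

```python
import configparser

# Placeholder content mirroring the .comet.config template above;
# for a real file, use config.read(".comet.config") instead
SAMPLE = """\
[comet]
api_key=abc123
workspace=my-workspace
project_name=mirxplain-runs
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Verify the [comet] section carries all three expected keys
for key in ("api_key", "workspace", "project_name"):
    assert key in config["comet"], f"missing '{key}' in [comet] section"
print(config["comet"]["workspace"])  # my-workspace
```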
