This repository contains the code associated with the manuscript "miRXplain: explainable isomiR-aware microRNA target prediction using CLIP-L experiments and hybrid attention transformers".
- Training and inference with miRXplain and other DL models for isomiR/miRNA target prediction (TEC-miTarget, Mimosa, GraphTar, MiTar, DMISO)
- Modular codebase built with PyTorch Lightning
- Online tracking of experiments with Comet.ml
miRXplain was built with Python 3.11, PyTorch 2.8, and PyTorch Lightning 2.5.5, and tested on Linux (CentOS 7, Rocky Linux 8, and Ubuntu; CUDA 12.8). We don't guarantee compatibility with macOS and Windows.
```shell
conda env create -f environment.yml
conda activate mirxplain
```
To get the latest code:
```shell
pip install git+https://github.com/marsico-lab/mirxplain.git
```
```
.
├── bin            # Bash and sbatch scripts for submission to a SLURM HPC cluster
├── docs           # Documentation files
├── data           # Input datasets
├── sample_data    # Sample input datasets for testing and prediction
├── notebooks      # Jupyter notebooks
├── src            # Core functions, models, datasets, PTL modules, etc.
├── tests          # Testing routines
├── workflows      # Snakemake workflows to pre- and post-process datasets, sample negatives, and generate the training set
├── .gitignore     # Files and folders not tracked by git
├── LICENSE
├── README.md
├── train_cv.py    # Entry point for training models in cross-validation
└── predict.py     # Entry point for making predictions with trained models
```
Preprocessed training data together with trained model weights have been deposited on Zenodo (DOI: 10.5281/zenodo.18010234).
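For scripted downloads, the DOI suffix maps to a Zenodo record id. A small helper, illustrative only — the API endpoint pattern below is Zenodo's general convention, not something this repository ships:

```python
def zenodo_record_url(doi: str) -> str:
    """Derive the Zenodo API URL for a record from its DOI.

    Zenodo DOIs end in the numeric record id,
    e.g. 10.5281/zenodo.18010234 -> record 18010234.
    """
    record_id = doi.rsplit(".", 1)[-1]
    return f"https://zenodo.org/api/records/{record_id}"

print(zenodo_record_url("10.5281/zenodo.18010234"))
# https://zenodo.org/api/records/18010234
```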
```
$ python train_cv.py -h
usage: train_cv.py [-h] [--seed SEED] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--lr LR] [--weight-decay WEIGHT_DECAY] [--patience PATIENCE] [--n-folds N_FOLDS] [--fold-limit FOLD_LIMIT]
                   [--model {miRXplain,TEC-miTarget,TransPHLA,Mimosa,GraphTar,CNNSequenceModel,MiTar,DMISO}] [--input-data-path INPUT_DATA_PATH] [--comet-logging] [--comet-project COMET_PROJECT]
                   [--cnn {basic,inception,residual,dilated,depthwise}] [--pe {basic,weighted}] [--attention {self-attention,cross-attention,hybrid-attention}] [--word2vec-model-dir WORD2VEC_MODEL_DIR]

options:
  -h, --help            show this help message and exit
  --seed SEED           Random seed
  --epochs EPOCHS       Maximum number of epochs to train for
  --batch-size BATCH_SIZE
                        Batch size
  --lr LR               Learning rate
  --weight-decay WEIGHT_DECAY
                        Weight decay, use 0 for no weight decay
  --patience PATIENCE   Number of epochs to wait before early stopping
  --n-folds N_FOLDS     Number of folds for cross-validation
  --fold-limit FOLD_LIMIT
                        Limit the number of folds to run for testing
  --model {miRXplain,TEC-miTarget,TransPHLA,Mimosa,GraphTar,CNNSequenceModel,MiTar,DMISO}
                        Name of the model
  --input-data-path INPUT_DATA_PATH
  --comet-logging       Whether to log to Comet.ml
  --comet-project COMET_PROJECT
                        Name of the project for Comet.ml logging
  --cnn {basic,inception,residual,dilated,depthwise}
                        CNN type for miRXplain model
  --pe {basic,weighted}
                        type of positional encoding
  --attention {self-attention,cross-attention,hybrid-attention}
                        type of attention mechanism
  --word2vec-model-dir WORD2VEC_MODEL_DIR
                        Path to the word2vec model for GraphTar
```
For example:
```shell
python train_cv.py --model miRXplain --input-data-path data/clipl_dataset.tsv --batch-size 32 --lr 1e-4 --pe basic --attention hybrid-attention
```
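The `--n-folds` option controls the cross-validation split. Purely for illustration (this is not the repository's implementation), k-fold index splitting boils down to:

```python
def kfold_indices(n_samples: int, n_folds: int):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Illustrative sketch: partitions sample indices into n_folds
    contiguous validation chunks; the remaining indices form the
    training set of each fold.
    """
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

# Example: 10 samples, 5 folds -> validation folds of size 2
folds = list(kfold_indices(10, 5))
print(folds[0][1])  # [0, 1]
```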
```
$ python predict.py -h
usage: predict.py [-h] [--input-data-path INPUT_DATA_PATH] [--checkpoint-path CHECKPOINT_PATH] [--max-mirna-len MAX_MIRNA_LEN] [--max-target-len MAX_TARGET_LEN] [--batch-size BATCH_SIZE]
                  [--num-workers NUM_WORKERS] [--output-mode {basic,perturb,fusion,attn,all}] [--comet-logging] [--comet-project COMET_PROJECT]

options:
  -h, --help            show this help message and exit
  --input-data-path INPUT_DATA_PATH
  --checkpoint-path CHECKPOINT_PATH
  --max-mirna-len MAX_MIRNA_LEN
  --max-target-len MAX_TARGET_LEN
  --batch-size BATCH_SIZE
                        Batch size for prediction
  --num-workers NUM_WORKERS
                        Number of workers for data loading
  --output-mode {basic,perturb,fusion,attn,all}
                        Output format for prediction results
  --comet-logging       Whether to log to Comet.ml
  --comet-project COMET_PROJECT
                        Name of the project for Comet.ml logging
```
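`--max-mirna-len` and `--max-target-len` fix the sequence lengths the model expects. A minimal sketch of the kind of pad-or-truncate step this implies (assumed behavior; the `"N"` pad symbol is our placeholder, not necessarily what miRXplain uses internally):

```python
def pad_sequence(seq: str, max_len: int, pad_char: str = "N") -> str:
    """Right-pad a nucleotide sequence to max_len, truncating if longer."""
    return seq[:max_len].ljust(max_len, pad_char)

# let-7a (22 nt) padded to a 33 nt miRNA slot
print(pad_sequence("UGAGGUAGUAGGUUGUAUAGUU", 33))
# UGAGGUAGUAGGUUGUAUAGUUNNNNNNNNNNN
```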
For example:
```shell
python predict.py --input-data-path sample_data/prediction_set.tsv --max-mirna-len 33 --max-target-len 41 --checkpoint-path models/mirxplain.ckpt
```
The entry points for training the additional models benchmarked in the paper are the same as for miRXplain, but the argument configurations differ. For example:
TEC-miTarget
```shell
python train_cv.py --model TEC-miTarget --input-data-path data/clipl_dataset.tsv --batch-size 64 --lr 0.0001
```
Mimosa
```shell
python train_cv.py --model Mimosa --input-data-path data/clipl_dataset.tsv --batch-size 32 --lr 1e-4
```
GraphTar
```shell
python train_cv.py --model GraphTar --input-data-path data/clipl_dataset.tsv --word2vec-model-dir data/word2vec-models-r-5/ --lr 1e-3 --batch-size 128
```
MiTar
```shell
python train_cv.py --model MiTar --input-data-path data/clipl_dataset.tsv --lr 1e-4
```
DMISO
```shell
python train_cv.py --model DMISO --input-data-path data/clipl_dataset.tsv --batch-size 100
```
miRXplain supports online logging via Comet.ml in addition to local logging (CSV files).
Online logging is enabled with the --comet-logging option of train_cv.py. Before running, you first need to create a Comet account and configure it (https://www.comet.com/docs/v2/guides/tracking-ml-training/configuring-comet/).
To do so, create a config file .comet.config with the following content:
```
[comet]
api_key=<Your API Key>
workspace=<Your Workspace Name>
project_name=<Your Project Name>
```
Then run
```shell
export COMET_CONFIG=<Path To Your Comet Config>
```
or move the file to your home directory as `~/.comet.config`.
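The config file follows the standard INI format, so you can sanity-check it with Python's built-in `configparser` (a quick check we suggest here, not part of the repository):

```python
import configparser

# Sanity-check a Comet config (same INI format as ~/.comet.config).
example = """
[comet]
api_key=<Your API Key>
workspace=<Your Workspace Name>
project_name=<Your Project Name>
"""

config = configparser.ConfigParser()
config.read_string(example)  # use config.read(path) for a real file

required = {"api_key", "workspace", "project_name"}
missing = required - set(config["comet"])
print("missing keys:", sorted(missing) or "none")
# missing keys: none
```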
