Pridict pipeline

This repo provides scripts to run the PRIDICT algorithm (Mathis et al., 2023) automatically for multiple inputs. See the PRIDICT documentation for explanations of the PRIDICT model itself.

Setting up the pipeline

In order to run the pipeline, create a master folder in which you will keep all the scripts / analysis from this repository.

Within this folder, create a conda environment with PRIDICT, following the GitHub instructions:

# clone PRIDICT repository
git clone https://github.com/mariemoullet/PRIDICT.git

# navigate into the PRIDICT specific folder within the repository
cd PRIDICT/PRIDICT

# create conda environment and install dependencies for PRIDICT (only has to be done before first run/install)
# use pridict_linux.yml on a Linux machine or pridict_mac.yml on a Mac
conda env create -f pridict_linux.yml

# activate the created environment
conda activate pridict

# ONLY FOR M1 Mac: additionally run the following conda install command (tensorflow)
conda install conda-forge::tensorflow


# now install the packages needed to run in batch mode (without manually providing sequences)
pip install -r requirements.txt

The batch generation also depends on R packages. To set them up, install the following packages from Bioconductor and CRAN:

if (!require("BiocManager", quietly = TRUE))
   install.packages("BiocManager")

BiocManager::install("ensembldb")
BiocManager::install("EnsDb.Hsapiens.v86")
BiocManager::install("org.Hs.eg.db")

install.packages("tidyverse")
install.packages("spgs")
install.packages("janitor")
install.packages("digest")

or simply run

Rscript packages.R

There are two ways to run this pipeline:

1. All possible amino acid changes resulting from a single nucleotide change in a specific codon in multiple proteins

If you have multiple codons/proteins of interest, you can automatically generate pegRNAs for every possible single nucleotide change at your codon of interest that results in an amino acid change. If several nucleotide changes produce the same amino acid change, they are labelled with a mutx suffix, e.g. C797Smut1, C797Smut2.
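To make the enumeration concrete, here is a minimal Python sketch of the idea, not the pipeline's own code. The function name `missense_variants`, the use of TGC as an illustrative cysteine codon, the inclusion of stop codons as changes, and the order in which the mutx suffixes are assigned are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def missense_variants(codon, position):
    """Return {label: mutant codon} for every single-nucleotide change
    in `codon` that alters the encoded amino acid (hypothetical helper)."""
    codon = codon.upper()
    ref_aa = CODON_TABLE[codon]
    by_change = defaultdict(list)
    for i, ref_base in enumerate(codon):
        for alt_base in "ACGT":
            if alt_base == ref_base:
                continue
            mutant = codon[:i] + alt_base + codon[i + 1:]
            alt_aa = CODON_TABLE[mutant]
            if alt_aa != ref_aa:  # skip changes that keep the amino acid
                by_change[f"{ref_aa}{position}{alt_aa}"].append(mutant)

    labelled = {}
    for change, mutants in by_change.items():
        if len(mutants) == 1:
            labelled[change] = mutants[0]
        else:
            # Several nucleotide changes give the same amino acid change:
            # disambiguate with a mutx suffix (numbering order is assumed).
            for n, mutant in enumerate(mutants, start=1):
                labelled[f"{change}mut{n}"] = mutant
    return labelled


variants = missense_variants("TGC", 797)
for label, mutant in variants.items():
    print(label, mutant)
```

With a TGC codon at position 797, both AGC and TCC encode serine, so the sketch yields C797Smut1 and C797Smut2, matching the labelling described above.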

To run this version, create an input file called input_gene_aa.csv in the pipeline directory where columns are

gene : Entrez gene symbol, e.g. EGFR
id : Ensembl MANE protein id, e.g. ENSP00000275493
chrom : chromosome number, e.g. 7
aa : amino acid position, e.g. 797
strand : gene strand orientation (+ or -)
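As an illustration, the input file could be generated with Python's `csv` module. This is a sketch under assumptions: the helper name `build_aa_input` is hypothetical, and it assumes the pipeline expects a comma-separated file with a header row using exactly the column names listed above.

```python
import csv
import io

# Column names taken from the list above; header row and comma delimiter are assumptions.
AA_INPUT_COLUMNS = ["gene", "id", "chrom", "aa", "strand"]


def build_aa_input(rows):
    """Return CSV text for the amino-acid mode input (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=AA_INPUT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["strand"] not in {"+", "-"}:
            raise ValueError(f"strand must be '+' or '-', got {row['strand']!r}")
        writer.writerow(row)
    return buffer.getvalue()


csv_text = build_aa_input(
    [{"gene": "EGFR", "id": "ENSP00000275493", "chrom": 7, "aa": 797, "strand": "+"}]
)
print(csv_text)
```

The returned text can then be written to the input file in the pipeline directory, for example with `open("input_gene_aa.csv", "w").write(csv_text)`.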

2. All possible single nucleotide substitutions at a specific genomic position

This part is designed to use prime editing to engineer mutations at sites targeted by base editors. The pipeline expects an input file with at least the following columns:

Gene : gene symbol, e.g. EGFR
chr : chromosome number, e.g. 7
Edit_Location : genomic locations of possible edits, separated by ;
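The sketch below shows one way to read the `Edit_Location` field and list the substitutions possible at a site. It is only an illustration: the helper names are hypothetical, the coordinates are placeholders rather than real edit sites, and the pipeline may handle this step differently.

```python
def parse_edit_locations(field):
    """Split a ';'-separated Edit_Location value into integer positions."""
    return [int(part) for part in field.split(";") if part.strip()]


def alternative_bases(reference_base):
    """Return the three bases a single nucleotide substitution can introduce."""
    reference_base = reference_base.upper()
    if reference_base not in "ACGT" or len(reference_base) != 1:
        raise ValueError(f"not a single DNA base: {reference_base!r}")
    return [base for base in "ACGT" if base != reference_base]


# Placeholder coordinates, for illustration only.
positions = parse_edit_locations("1000;1005;1010")
print(positions)
print(alternative_bases("C"))
```

Each position therefore gives rise to three candidate substitutions, one for each base other than the reference.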

Executing the pipeline

To run the pipeline, run the following on the command line:

bash run_batch_mode.sh [--filter=<filter>] [--output=<output_filename.csv>] <input_filename>

The input arguments are

input file : either input_from_aa.csv or input_from_be.csv as described above
filter : optional boolean argument : should the final pegRNA oligos (including Gibson Assembly homology arms) be restricted to <= 200 bp? (default is true)
output file name: optional, default is oligos.csv (nb include extension in argument)
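If you want to launch the pipeline from a Python script, the command can be assembled as below. This is a sketch: `build_command` is a hypothetical helper, and the literal values `true` / `false` for `--filter` are an assumption, since only the option name and its boolean nature are stated above.

```python
import shlex


def build_command(input_file, filter_long=None, output=None):
    """Assemble the run_batch_mode.sh call as an argument list (hypothetical helper)."""
    command = ["bash", "run_batch_mode.sh"]
    if filter_long is not None:
        # Assumed spelling of the boolean value; check the script if unsure.
        command.append(f"--filter={'true' if filter_long else 'false'}")
    if output is not None:
        command.append(f"--output={output}")
    command.append(input_file)  # positional argument comes last
    return command


command = build_command("input_from_aa.csv", output="my_oligos.csv")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the pipeline directory.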

The pipeline writes an output file (by default called oligos.csv) which includes:

gene and mutation for the edit encoded by the pegRNA
pridict_rank : rank from the pridict algorithm
pegRNA : the pegRNA sequence
for Gibson assembly : pegRNA with added Gibson assembly arms (format is TATCTTGTGGAAAGGACGAAA + pegRNA + GCGCGGTTCTATCTAGTTACGCGT)
length / length with GA overhang : pegRNA oligo length with or without GA overhang
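The Gibson assembly format and the length filter described above can be sketched in a few lines of Python. The two arm sequences and the 200 bp limit come from this README; the function names are hypothetical and this is not the pipeline's own implementation.

```python
# Gibson assembly arms as given in the output format above.
GA_5PRIME = "TATCTTGTGGAAAGGACGAAA"
GA_3PRIME = "GCGCGGTTCTATCTAGTTACGCGT"
MAX_OLIGO_LENGTH = 200  # bp, the limit applied by the filter option


def add_gibson_arms(pegrna):
    """Return the pegRNA flanked by the two Gibson assembly arms."""
    return GA_5PRIME + pegrna.upper() + GA_3PRIME


def passes_length_filter(pegrna, max_length=MAX_OLIGO_LENGTH):
    """True if the oligo, including both arms, is at most max_length bp."""
    return len(add_gibson_arms(pegrna)) <= max_length


# Placeholder sequence, for illustration only (not a real pegRNA).
pegrna = "ACGT" * 10
oligo = add_gibson_arms(pegrna)
print(len(pegrna), len(oligo), passes_length_filter(pegrna))
```

Because the two arms add a fixed number of bases, the filter is effectively a cap on the length of the pegRNA itself.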
