GitHub - cxinsys/fastscode

Introduction

FastSCODE is an accelerated implementation of SCODE based on manycore computing.

Installation

🐍 Anaconda is recommended for using and developing FastSCODE.
🐧 FastSCODE is tested on Linux distributions, which are the recommended platform for using and developing it.

Create a virtual environment

After installing Anaconda, create a conda virtual environment for FastSCODE. We can also specify the Python version (e.g. python=3.12).

conda create -n fastscode python=3.12

Now, we can activate our conda virtual environment for FastSCODE as follows.

conda activate fastscode

Install from PyPI

pip install fastscode

🔥 The default backend framework of FastSCODE is PyTorch.
📱 You can also install another backend framework, such as CuPy, JAX, or TensorFlow.
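
Before picking a backend, it can help to check which of these frameworks are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `available_backends` and the name-to-module mapping are assumptions made for illustration and are not part of FastSCODE.

```python
from importlib.util import find_spec

# Top-level import names of the frameworks mentioned above
# (assumed to be the usual package names).
BACKEND_MODULES = {
    "PyTorch": "torch",
    "CuPy": "cupy",
    "JAX": "jax",
    "TensorFlow": "tensorflow",
}


def available_backends(modules=BACKEND_MODULES):
    """Return {framework name: bool} without importing the frameworks."""
    return {name: find_spec(mod) is not None for name, mod in modules.items()}


if __name__ == "__main__":
    for name, found in available_backends().items():
        print(f"{name}: {'installed' if found else 'not installed'}")
```

This only reports whether a package can be found; it does not verify that a GPU is visible to it.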

Install from GitHub repository

First, clone the latest version of this repository.

git clone https://github.com/cxinsys/fastscode.git

Now, we need to install FastSCODE as a module.

cd fastscode
pip install -e .

FastSCODE tutorial

Create `FastSCODE` instance

The FastSCODE class requires input data, namely an expression data array and a pseudotime array, as well as several parameters for linear ODE optimization.

parameters

exp_data: expression data array (Gene (G) x Cell (C)), required
pseudotime: pseudotime data vector (C), required
node_name: vector for name of genes (G), required
droot: root directory for storing score matrix and RSS arrays, optional, default value is None, which means that the results are not saved
num_tf: number of genes to use, optional, default value is None, and all genes are used
num_cell: number of cells to use, optional, default value is None, and all cells are used
num_z: length of vector z for optimization, optional, default: 4
max_iter: number of iterations for optimization, optional, default: 100
max_b: maximum initialization value for parameter b, optional, default: 2.0
min_b: minimum initialization value for parameter b, optional, default: -10.0
dtype: data type, optional, default: float32
use_binary: save result matrix as binary file, optional, default: True

import numpy as np
import fastscode as fs

exp_data = np.loadtxt(dpath_exp_data, delimiter=",", dtype=str)
node_name = exp_data[0, 1:]
exp_data = exp_data[1:, 1:].astype(np.float64).T  # gene x cell

pseudotime = np.loadtxt(dpath_trj_data, delimiter="\t")

worker = fs.FastSCODE(exp_data=exp_data,
                      pseudotime=pseudotime,
                      node_name=node_name,
                      droot=spath_droot_r,
                      num_tf=None,
                      num_cell=None,
                      num_z=num_z,
                      max_iter=max_iter,
                      dtype=np.float32,
                      use_binary=True)
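
To make the expected input layout concrete, the sketch below writes two toy files and parses them with the same NumPy calls used above. It assumes the expression CSV has a header row of gene names, a first column of cell names, and cells in rows (hence the transpose), and that the pseudotime file holds one value per line. The file names and the helpers `write_toy_inputs` and `load_inputs` are hypothetical.

```python
import os
import tempfile

import numpy as np


def write_toy_inputs(dirpath, num_cells=5, num_genes=3, seed=0):
    """Write a toy expression CSV (cells in rows) and a pseudotime file."""
    rng = np.random.default_rng(seed)
    genes = [f"GENE_{i + 1}" for i in range(num_genes)]
    values = rng.random((num_cells, num_genes))
    fp_exp = os.path.join(dirpath, "toy_expression.csv")
    with open(fp_exp, "w") as f:
        f.write(",".join(["cell"] + genes) + "\n")  # header row: gene names
        for i, row in enumerate(values):
            f.write(",".join([f"CELL_{i + 1}"] + [f"{v:.6f}" for v in row]) + "\n")
    fp_trj = os.path.join(dirpath, "toy_pseudotime.txt")
    np.savetxt(fp_trj, np.linspace(0.0, 1.0, num_cells))  # one value per cell
    return fp_exp, fp_trj


def load_inputs(fp_exp, fp_trj):
    """Parse the files with the same steps as the tutorial code."""
    raw = np.loadtxt(fp_exp, delimiter=",", dtype=str)
    node_name = raw[0, 1:]
    exp_data = raw[1:, 1:].astype(np.float64).T  # gene x cell
    pseudotime = np.loadtxt(fp_trj, delimiter="\t")
    return exp_data, pseudotime, node_name


with tempfile.TemporaryDirectory() as tmp:
    exp_data, pseudotime, node_name = load_inputs(*write_toy_inputs(tmp))

print(exp_data.shape, pseudotime.shape, node_name.tolist())
```

The second dimension of `exp_data` and the length of `pseudotime` should both equal the number of cells.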

Run FastSCODE

parameters

backend: optional, default: 'cpu'
device_ids: list or number of devices to use, optional, default: [0] (cpu), [list of whole gpu devices] (gpu)
batch_size_b: batch size of optimization parameter B, optional, default: 1
batch_size: gene batch size of expression data, optional, default: None (compute all gene data at once, recommended)
chunk_size: gene chunk size of expression data in inner loop of algorithm, optional, default: None (auto calculated)

rss, score_matrix = worker.run(backend='gpu',
                               device_ids=8,
                               sampling_batch=100,
                               batch_size=1024)

Run FastSCODE with `run_scode.py`

Before running run_scode.py, adjust batch_size_b and batch_size to fit your GPU memory size.

Usage

python run_scode.py --droot [root directory] \
                    --fp_exp [expression file path] \
                    --fp_trj [trajectory (pseudotime) file path] \
                    --fp_branch [cell select file path] \
                    --num_z [number of vector z] \
                    --max_iter [number of optimization step] \
                    --backend [name of backend framework] \
                    --num_devices [number of devices] \
                    --batch_size_b [number of parameter b] \
                    --sp_droot [droot directory for saving results] \
                    --num_repeat [total number of computation iterations]

Example

python run_scode.py --droot . \
                    --fp_exp expression_dataTuck_sub.csv \
                    --fp_trj pseudotimeTuck.txt \
                    --fp_branch cell_selectTuck.txt \
                    --num_z 10 \
                    --max_iter 100 \
                    --backend gpu \
                    --num_devices 8 \
                    --batch_size_b 10 \
                    --sp_droot out \
                    --num_repeat 6
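
When the script is launched from another Python program (for example, to sweep over `num_z`), the same command can be assembled as an argument list. This is a sketch only: the helper `build_run_scode_command` is hypothetical, and it assumes `run_scode.py` is in the current working directory. The flags and values are copied from the example above.

```python
import shlex
import sys


def build_run_scode_command(options, script="run_scode.py"):
    """Turn {"flag": value} into an argv list for the script."""
    argv = [sys.executable, script]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


options = {
    "droot": ".",
    "fp_exp": "expression_dataTuck_sub.csv",
    "fp_trj": "pseudotimeTuck.txt",
    "fp_branch": "cell_selectTuck.txt",
    "num_z": 10,
    "max_iter": 100,
    "backend": "gpu",
    "num_devices": 8,
    "batch_size_b": 10,
    "sp_droot": "out",
    "num_repeat": 6,
}

argv = build_run_scode_command(options)
print(shlex.join(argv[1:]))  # show the command without the interpreter path

# To actually launch it:
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with file paths that contain spaces.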

Output

The score matrix averaged over the repeated runs is saved as a binary file in the --sp_droot directory.

avg_score_matrix.npy
ex)
0	0.05	0.02	...	0.004
0.01	0	0.04	...	0.12
0.003	0.003	0	...	0.001
0.34	0.012	0.032	...	0


node_name.txt
ex)
GENE_1
GENE_2
GENE_3
.
.
.
GENE_M
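
The two files can be read back together for inspection. The sketch below assumes that the `.npy` file is in NumPy's standard `np.save` format and that `node_name.txt` lists one gene per line in the same order as the matrix rows and columns. The helper `load_scores` is hypothetical, and the toy files exist only so that the example runs on its own.

```python
import os
import tempfile

import numpy as np


def load_scores(dirpath, fname="avg_score_matrix.npy"):
    """Load a score matrix and its gene names from an output directory."""
    scores = np.load(os.path.join(dirpath, fname))
    names = np.loadtxt(os.path.join(dirpath, "node_name.txt"), dtype=str, ndmin=1)
    if scores.shape != (len(names), len(names)):
        raise ValueError("score matrix and node_name.txt disagree on gene count")
    return scores, names


# Toy stand-ins for real output files.
with tempfile.TemporaryDirectory() as tmp:
    toy = np.array([[0.0, 0.05, 0.02],
                    [0.01, 0.0, 0.04],
                    [0.003, 0.003, 0.0]])
    np.save(os.path.join(tmp, "avg_score_matrix.npy"), toy)
    with open(os.path.join(tmp, "node_name.txt"), "w") as f:
        f.write("GENE_1\nGENE_2\nGENE_3\n")
    scores, names = load_scores(tmp)

print(scores.shape, names.tolist())
```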

The result files for each repetition are saved in the [repetition number] folder under --sp_droot.

When use_binary is True, we can obtain the following result.

RSS.txt
ex)
3367844277.01837


score_matrix.npy
ex)
0	0.05	0.02	...	0.004
0.01	0	0.04	...	0.12
0.003	0.003	0	...	0.001
0.34	0.012	0.032	...	0


node_name.txt
ex)
GENE_1
GENE_2
GENE_3
.
.
.
GENE_M
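
For readers who want to recompute an average from the per-repetition matrices themselves, one plausible approach is an element-wise arithmetic mean. Whether `run_scode.py` averages in exactly this way is an assumption; the helper `average_score_matrices` is hypothetical and works on arrays that are already in memory.

```python
import numpy as np


def average_score_matrices(matrices):
    """Element-wise mean of equally shaped score matrices."""
    stacked = np.stack([np.asarray(m, dtype=np.float64) for m in matrices])
    return stacked.mean(axis=0)


# Two toy "repetitions" of a 2-gene score matrix.
runs = [
    np.array([[0.0, 0.25], [0.5, 0.0]]),
    np.array([[0.0, 0.75], [0.0, 0.0]]),
]
avg = average_score_matrices(runs)
print(avg)
```

`np.stack` raises a `ValueError` if the matrices differ in shape, which catches runs made with different gene sets.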

When use_binary is False, we can obtain the following result.

RSS.txt
ex)
3367844277.01837

  
score_matrix.txt                            
ex)
Score	GENE_1	GENE_2	GENE_3	...	GENE_M
GENE_1	0	0.05	0.02	...	0.004
GENE_2	0.01	0	0.04	...	0.12
GENE_3	0.003	0.003	0	...	0.001
.
.
.
GENE_M	0.34	0.012	0.032	...	0

A tutorial for downstream analysis

Create `NetWeaver` instance

NetWeaver infers the network links based on the results of running FastSCODE.

parameters

result_matrix: score matrix produced by FastSCODE, required
gene_names: gene names corresponding to the result matrix, required
tfs: list of transcription factors (TFs), optional
fdr: false discovery rate (FDR) threshold, optional, default: 0.01
links: number of links (outdegrees) to keep, optional, default: 0
is_trimming: if set to True, a trimming operation is applied to the GRN, optional, default: True
trim_threshold: trimming threshold, optional, default: 0

result_matrix = np.loadtxt(fpath_result_matrix, delimiter='\t', dtype=str)
gene_name = result_matrix[0][1:]
result_matrix = result_matrix[1:, 1:].astype(np.float32)

tf = np.loadtxt(fpath_tf, dtype=str)

weaver = fs.NetWeaver(result_matrix=result_matrix,
                      gene_names=gene_name,
                      tfs=tf,
                      fdr=fdr,
                      links=links,
                      is_trimming=True,
                      trim_threshold=trim_threshold,
                      dtype=np.float32)
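
To give an intuition for a link-count cutoff such as `links`, the sketch below keeps the highest-scoring off-diagonal entries of a score matrix. It is not NetWeaver's procedure: ranking by absolute score and skipping the diagonal are assumptions made for illustration, and `top_links` is a hypothetical helper.

```python
import numpy as np


def top_links(score_matrix, gene_names, links):
    """Return the `links` largest-|score| off-diagonal entries
    as (row gene, column gene, score) tuples."""
    scores = np.asarray(score_matrix, dtype=np.float64)
    magnitude = np.abs(scores)              # new array, input is untouched
    np.fill_diagonal(magnitude, -np.inf)    # never select self-links
    n = scores.shape[0]
    links = min(links, n * n - n)           # cannot exceed off-diagonal count
    order = np.argsort(magnitude, axis=None)[::-1][:links]
    rows, cols = np.unravel_index(order, scores.shape)
    return [(str(gene_names[r]), str(gene_names[c]), float(scores[r, c]))
            for r, c in zip(rows, cols)]


toy_scores = [[0.0, 0.05, 0.02],
              [0.01, 0.0, 0.04],
              [0.003, 0.003, 0.0]]
toy_names = ["GENE_1", "GENE_2", "GENE_3"]
print(top_links(toy_scores, toy_names, links=2))
```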

Run NetWeaver

backend: optional, default: 'cpu'
device_ids: list or number of devices to use, optional, default: [0] (cpu), [list of whole gpu devices] (gpu)
batch_size: batch size; if set to 0, it is calculated automatically, optional, default: 0

grn, trimmed_grn = weaver.run(backend=backend,
                              device_ids=device_ids,
                              batch_size=batch_size)

Count outdegree

grn: required

outdegrees = weaver.count_outdegree(grn)
trimmed_ods = weaver.count_outdegree(trimmed_grn)
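
As a reminder of what an outdegree is, the plain-Python sketch below counts how many outgoing edges each source gene has. The edge representation used here, a list of (source, target) pairs, is an assumption for illustration and may differ from the structure returned by `weaver.run`; `outdegree_table` is a hypothetical helper.

```python
from collections import Counter


def outdegree_table(edges):
    """Count outgoing edges per source node, most connected first."""
    counts = Counter(source for source, _target in edges)
    return counts.most_common()


toy_edges = [
    ("GENE_1", "GENE_2"),
    ("GENE_1", "GENE_3"),
    ("GENE_2", "GENE_3"),
]
print(outdegree_table(toy_edges))
```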

Network reconstruction

reconstruct_grn.py is an example script that reconstructs the network structure and writes the GRN and outdegree files.
When the result matrix is a binary file, the path to the node_name.txt file must be passed to the --fp_gn parameter.
When the result matrix is not a binary file, the --fp_gn parameter is optional.

Usage

We can specify fdr as follows.

python reconstruct_grn.py --fp_rm [result matrix path] --fp_gn [gene name file path] --fp_tf [tf file path] --fdr [fdr] --backend [backend] --device_ids [number of devices]

Example

python reconstruct_grn.py --fp_rm avg_score_matrix.txt --fp_gn node_name.txt --fp_tf mouse_tf.txt --fdr 0.01 --backend gpu --device_ids 1

Output

avg_score_matrix.fdr0.01.sif, avg_score_matrix.fdr0.01.sif.outdegrees.txt
avg_score_matrix.fdr0.01.trimIndirect0.sif, avg_score_matrix.fdr0.01.trimIndirect0.sif.outdegrees.txt

Usage

We can also specify the links.

python reconstruct_grn.py --fp_rm [result matrix path] --fp_gn [gene name file path] --fp_tf [tf file path] --links [links] --backend [backend] --device_ids [number of devices]

Example

python reconstruct_grn.py --fp_rm avg_score_matrix.txt --fp_gn node_name.txt --fp_tf mouse_tf.txt --links 1000 --backend gpu --device_ids 1

Output

avg_score_matrix.links1000.sif, avg_score_matrix.links1000.sif.outdegrees.txt
avg_score_matrix.links1000.trimIndirect0.sif, avg_score_matrix.links1000.trimIndirect0.sif.outdegrees.txt
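
The `.sif` outputs can be loaded into Python for further analysis. The sketch below assumes the common simple interaction format layout, in which each line holds a source node, a relation label, and one or more target nodes separated by whitespace, and that gene names contain no spaces. The relation label `regulates` in the toy data is a placeholder, and `read_sif` is a hypothetical helper.

```python
import io


def read_sif(handle):
    """Parse SIF lines into (source, relation, target) triples."""
    edges = []
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and nodes listed without edges
        source, relation, *targets = fields
        edges.extend((source, relation, target) for target in targets)
    return edges


# Toy content standing in for a real .sif file.
toy_sif = io.StringIO(
    "GENE_1\tregulates\tGENE_2\tGENE_3\n"
    "GENE_2\tregulates\tGENE_3\n"
)
edges = read_sif(toy_sif)
print(edges)

# With a real file:
#   with open("avg_score_matrix.links1000.sif") as f:
#       edges = read_sif(f)
```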

TODO

Upload to PyPI
