DOSE-L1000

Junmin Wang 12/18/2024

This page aims to present DOSE-L1000 in an accessible way for a broad audience. For those interested in implementing GAMs or rebuilding the DOSE-L1000 database, detailed prerequisites and code descriptions are provided at the bottom of this page.

Introduction

The LINCS L1000 project has the transcriptional profiles for thousands of compounds across a wide array of concentrations, cell lines, and time points. However, conventional analysis methods (i.e., moderated z-scores, characteristic direction method) often fall short in capturing the rich information encapsulated within the L1000 dose-response data.

We present DOSE-L1000, a database that unravels the intricate landscape of compound-induced transcriptional changes and compound-gene interactions. By fitting over 140 million generalized additive models (GAM) and robust linear models encompassing the entire LINCS L1000 database, our work provides quantitative insights into differential gene expression and the potency and efficacy of compound-gene pairs across diverse cellular contexts (Wang et al., 2023; https://pmc.ncbi.nlm.nih.gov/articles/PMC10663987/) (Fig. 1).

Fig. 1. Illustration of a model-based approach for unveiling gene expression signatures, efficacy, and potency. (a) Schematic overview of model fitting and differential expression analysis. (b) Definitions of potency and efficacy.

Features

Our GAM-based approach leverages the dose-response trend to offer two key advantages:

It enables the calculation of potency and efficacy, along with their standard errors, something traditional methods cannot provide (Fig. 1b).
It provides more robust estimates of log₂ fold changes compared to linear models, robust linear models, moderated z-scores (Subramanian et al., 2017), and the characteristic direction method (Duan et al., 2016) (Fig. 2). GAM produces the most similar log₂ fold changes under the same perturbation condition across batches among them all.

Fig. 2. Distribution of pairwise (a) inter-batch and (b) intra-batch Pearson correlation coefficients of model-derived log₂ fold changes. Inter-batch correlation refers to the similarity of log₂ fold changes across batches under the same perturbation condition, whereas intra-batch correlation, the similarity of log₂ fold changes across perturbation conditions within the same batch. GAM, RLM, LM, z-score, and CDS2 stand for generalized additive model, robust linear model, linear model, moderated z-score, and the characteristic direction method, respectively.

How to Use DOSE-L1000

I have developed DOSE-L1000-Viz, a Shiny app that enables users to explore compound-induced gene expression changes, as well as the efficacy and potency of compound-gene pairs. This app offers an interactive platform for visualization and analysis, accessible at https://www.dosel1000.com.

With this app, you can:

Generate compound-level volcano plots.
Create gene-level volcano plots.
Visualize dose-response data.
Compare efficacy and potency across compounds, genes, or cell lines (Fig. 3).

Fig. 3. Overview of the backend database and frontend interface of DOSE-L1000-Viz.

Prerequisites

The original LINCS L1000 data are required to generate the DOSE-L1000 database. The complete list of required files is provided in Supplementary Table 1 of our DOSE-L1000 paper in Bioinformatics (see References).
The DOSE-L1000 database is required to reproduce the plots and analysis in our DOSE-L1000 paper and can be downloaded from https://doi.org/10.5281/zenodo.8286375. All required R packages to run the scripts are listed in the header of each file.

Code Descriptions

Purpose	Folder	File name	Description
Database Generation	model/	main_GSE92742.R main_GSE70138.R	differential expression analysis for all combinations of compounds, cell lines, and genes
	model/	main_lx50_GSE92742.R main_lx50_GSE70138.R	characterizing the efficacy and potency of compound-gene pairs for all combinations of compounds, cell lines, and genes
	funs/	get.gam.tests.R get.rlm.tests.R get.lm.tests.R	t-tests to compare the mean responses for GAMs, robust linear models, and linear models
		get.gam.stats.R	calculating the estimates and standard errors of potency and efficacy via delta method
		get.gamer.log2fc.R get.rlmer.log2fc.R	estimating log₂ fold change for mixed GAMs and robust linear models with random effects
		get.gam.stats.boot.R	calculating the estimates and standard errors of efficacy and potency via bootstrapping
Plotting	plot/	plot_volcano.R	volcano plots in Figure 3a of our DOSE-L1000 paper
		plot_potency_vs_efficacy.R	scatter plots in Figures 3b and 3c of our DOSE-L1000 paper
		plot_heatmap.R	heat map in Figure 4 of our DOSE-L1000 paper
Normality Assumption	normal_dist/	check_norm_assump_GSE92742.R check_norm_assump_GSE70138.R	calculating the Pearson correlation coefficients from Q-Q plots for all combinations of compounds, cell lines, and time
Inter- and Intra-Batch Correlations	robust/	find_common_cpd_cell_pairs.R	finding repeated compound-cell line-batch combinations
		fit_all_models_GSE92742.R fit_all_models_GSE70138.R	fitting GAMs, robust linear models, and linear models, and running differential expression analysis for all repeated combinations of compounds and cell lines
		calc_sd_log2fc.R	calculating standard deviation of log₂ fold changes for all repeated perturbation conditions
		calc_inter_batch_corr.R calc_intra_batch_corr.R	calculating pairwise inter- and intra-batch Pearson correlation coefficients
Plate Effects	plate/	fit_mixed_model_GSE92742.R fit_mixed_model_GSE70138.R	fitting mixed models and estimating log₂ fold change for all combinations of compounds and cell lines
Plate Effects	plate/	compare_fixed_vs_mixed.R	calculating the Pearson correlation between the results of fixed effect-only models and mixed models
Delta Method vs Bootstrapping	delta_vs_bootstrap/	compare_delta_vs_boot.R	comparing the delta method against bootstrapping
Predictions of Drug-Target Interactions	LOOCV/	find_HDACi_DTC.R	retrieving HDAC inhibitors from Drug Target Commons
Predictions of Drug-Target Interactions	LOOCV/	LOOCV_HDACi.R	building a classifier for HDAC inhibitors and running leave-one-out cross-validation (LOOCV) on the classifier

Note: GSE92742 and GSE70138 are the accession codes of phase 1 and phase 2 LINCS projects, respectively.

Contribute

We welcome your feedback and contributions! If you have suggestions, find an issue, or want to add new features, feel free to open an issue or submit a pull request. Together, we can make this project even better!

References

Wang J, Novick S. DOSE-L1000: unveiling the intricate landscape of compound-induced transcriptional changes. Bioinformatics. 2023 Nov 1;39(11): btad683.
Subramanian A et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell. 2017 Nov 30;171(6): 1437-1452.e17.
Duan Q et al. L1000CDS²: LINCS L1000 characteristic direction signatures search engine. NPJ Syst Biol Appl. 2016;2:16015.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DOSE-L1000

Introduction

Features

How to Use DOSE-L1000

Prerequisites

Code Descriptions

Contribute

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
LOOCV		LOOCV
delta_vs_bootstrap		delta_vs_bootstrap
funs		funs
images		images
model		model
normal_dist		normal_dist
plate		plate
plot		plot
robust		robust
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

DOSE-L1000

Introduction

Features

How to Use DOSE-L1000

Prerequisites

Code Descriptions

Contribute

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages