Skip to content

Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, : No finite residual standard deviations #37

@mbihie

Description

@mbihie

Hello,

I am trying to run the DEP package on a MaxQuant output for a paper (x) that used 21 different treatments on all 5 patient samples . I ran MaxQuant as if each sample was its own experiment and generated a proteinGroups.txt file lfq-proteinGroups.txt.

This was the code I used to create the design object. I wasn't sure of how to fill out the condition and replicate since I want to compare all of them against each other equally.

#create experiment design object 
label <- c("V1", "V2", "V3", "V4", "V5")
condition <- c("1","2","3","4","5")
replicate <- c("1","2","3","4","5")
experimental_design <- data.frame(label, condition, replicate)

When going through the DEP package example (x), I ran into an issue when testing all possible comparisons of samples in the differential enrichment analysis portion of the tutorial.

type = "all"

# Test all possible comparisons of samples
data_diff_all_contrasts <- test_diff(data_imp, type = "all", control = NULL)

Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
No finite residual standard deviations

type = "manual"

# Test manually defined comparisons
data_diff_manual <- test_diff(data_norm, type = "manual", 
                              test = c("X1", "X2"))

Error in .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
No finite residual standard deviations

I am unsure of what mistake I made and if it has to do with my use of the DEP package or the MaxQuant output. Please let me know where I went wrong.

complet code

# Loading a package required for data handling
library("dplyr")

# Read raw file
raw = read.delim("~/proteomics-practice/data/lfq-proteinGroups.txt", stringsAsFactors = FALSE, colClasses = "character")

# The data is provided with the package
data <- raw

# We filter for contaminant proteins and decoy database hits, which are indicated by "+" in the columns "Potential.contaminants" and "Reverse", respectively. 
data <- filter(data, Reverse != "+", Potential.contaminant != "+")

# Are there any duplicated gene names?
data$Gene.names %>% duplicated() %>% any()

# Make a table of duplicated gene names
data %>% group_by(Gene.names) %>% summarize(frequency = n()) %>% 
  arrange(desc(frequency)) %>% filter(frequency > 1)

#make_unique(): generates unique identifiers for a proteomics dataset based on "name" and "id" columns.
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")

# Are there any duplicated names?
data$name %>% duplicated() %>% any()

#get LFQ column numbers
LFQ_columns <- grep("LFQ.", colnames(data_unique)) 

#make LFQ values numeric
data_unique$LFQ.intensity.V1 <- as.numeric(data_unique$LFQ.intensity.V1)
data_unique$LFQ.intensity.V2 <- as.numeric(data_unique$LFQ.intensity.V2)
data_unique$LFQ.intensity.V3 <- as.numeric(data_unique$LFQ.intensity.V3)
data_unique$LFQ.intensity.V4 <- as.numeric(data_unique$LFQ.intensity.V4)
data_unique$LFQ.intensity.V5 <- as.numeric(data_unique$LFQ.intensity.V5)

#create experiment design object 
label <- c("V1", "V2", "V3", "V4", "V5")
condition <- c("1","2","3","4","5")
replicate <- c("1","2","3","4","5")
experimental_design <- data.frame(label, condition, replicate)

#Generate a SummarizedExperiment object using an experimental design
data_se <- make_se(data_unique, LFQ_columns, experimental_design)

# Generate a SummarizedExperiment object by parsing condition information from the column names
LFQ_columns <- grep("LFQ.", colnames(data_unique)) # get LFQ column numbers
data_se_parsed <- make_se_parse(data_unique, LFQ_columns)

# Let's have a look at the SummarizedExperiment object
data_se

# Plot a barplot of the protein identification overlap between samples
plot_frequency(data_se)

# Filter for proteins that are identified in all replicates of at least one condition
data_filt <- filter_missval(data_se, thr = 0)

# Less stringent filtering:
# Filter for proteins that are identified in 2 out of 3 replicates of at least one condition
data_filt2 <- filter_missval(data_se, thr = 1)

# Plot a barplot of the number of identified proteins per samples
plot_numbers(data_filt2)

# Plot a barplot of the protein identification overlap between samples
plot_coverage(data_filt)

# Normalize the data
data_norm <- normalize_vsn(data_filt)

# Visualize normalization by boxplots for all samples before and after normalization
plot_normalization(data_filt, data_norm)

# Impute missing data using random draws from a Gaussian distribution centered around a minimal value (for MNAR)
data_imp <- impute(data_norm, fun = "MinProb", q = 0.01)

# Impute missing data using random draws from a manually defined left-shifted Gaussian distribution (for MNAR)
data_imp_man <- impute(data_norm, fun = "man", shift = 1.8, scale = 0.3)

# Impute missing data using the k-nearest neighbour approach (for MAR)
data_imp_knn <- impute(data_norm, fun = "knn", rowmax = 0.9)

# Plot intensity distributions before and after imputation
plot_imputation(data_norm, data_imp)

# Test all possible comparisons of samples
data_diff_all_contrasts <- test_diff(data_imp, type = "all", control = NULL)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions