Analysis Workflows

Reproducibility

  • All analyses should be runnable from source code
  • Use relative paths: data/processed/... not /absolute/path/...
  • Document dependencies explicitly (packages, versions, data sources)
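  • Set a seed before stochastic operations: set.seed() in R, random.seed() in Python (NumPy keeps its own generator state, seeded separately, e.g. numpy.random.default_rng(seed))
  • Avoid hardcoded magic numbers; use named variables or config files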
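
The seeding and named-constant points can be sketched as follows. This is a minimal illustration using only the Python standard library; the names SEED, N_DRAWS and draw_sample, and the values shown, are hypothetical and not part of any existing project code.

```python
import random

# Named constants instead of magic numbers (illustrative values only)
SEED = 20240101
N_DRAWS = 5


def draw_sample(population, n_draws=N_DRAWS, seed=SEED):
    """Return a reproducible random sample without replacement."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    return rng.sample(population, n_draws)


if __name__ == "__main__":
    sample_ids = [f"S{i:02d}" for i in range(1, 21)]
    print(draw_sample(sample_ids))
```

Using a local random.Random instance rather than the module-level random.seed() keeps the result independent of whatever other code has drawn from the global generator.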

Data Pipeline Pattern

raw data → cleaning script → processed data → analysis script → results/figures

Each step should:

  1. Be rerunnable without manual intervention
  2. Document its input source and output location
  3. Include QC checks (row counts, data integrity)
  4. Have clear variable names and comments
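
One possible shape for a single pipeline step is sketched below, assuming a CSV input with sample_id and response columns. The function name clean_samples, the column names and the specific checks are assumptions chosen for illustration; the standard library is used so the sketch runs without extra packages.

```python
import csv
from pathlib import Path


def clean_samples(raw_path: Path, out_path: Path, required=("sample_id", "response")):
    """Cleaning step: drop rows with a missing response, then run QC checks."""
    with raw_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    # QC 1: required columns are present
    missing = [col for col in required if col not in fieldnames]
    if missing:
        raise ValueError(f"{raw_path}: missing columns {missing}")

    kept = [row for row in rows if (row["response"] or "").strip() != ""]

    # QC 2: sample IDs are unique after cleaning
    ids = [row["sample_id"] for row in kept]
    if len(ids) != len(set(ids)):
        raise ValueError(f"{raw_path}: duplicate sample_id values")

    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)

    # QC 3: return row counts so silent data loss is visible to the caller
    return {"rows_in": len(rows), "rows_out": len(kept)}
```

Because the step takes its input and output locations as arguments and raises on a failed check, it can be rerun from a script or a test without manual intervention.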

R Workflows

  • Use Quarto (.qmd) for reports combining analysis + narrative
  • Consult documentation-standards.md for README and code comment patterns
  • Keep analysis code in scripts (.R) and parameters in YAML or CSV files
  • Use here::here() for relative paths
  • Load data with readr functions (read_csv(), read_tsv()) and explicit column specs (col_types)

Core Setup (All Analyses)

library(tidyverse)
library(here)
library(ggpubr)      # Consistent visualization theme
library(patchwork)   # Multi-panel figure composition
library(tidyHeatmap) # Publication-quality heatmaps

RNA-seq Analysis

Use limma-voom for differential expression (voom models the mean-variance trend of the log-counts; filter very low-count genes first, e.g. with edgeR::filterByExpr()):

library(edgeR)    # DGEList, normalization
library(limma)    # voom, differential expression
library(tidyHeatmap)
library(ggpubr)

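# Typical workflow
dge <- DGEList(counts = count_matrix, group = conditions)
keep <- filterByExpr(dge, design = design_matrix)   # Drop genes with too few counts
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)                         # TMM normalization factors
v <- voom(dge, design = design_matrix)              # Log-CPM with precision weights
fit <- lmFit(v, design_matrix)
fit <- eBayes(fit)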

Proteomics Analysis

Use limpa for peptide/precursor-level differential expression (it models missing values through a detection probability curve rather than imputing them), or proDA for pre-aggregated protein intensities:

Peptide/Precursor-level data (recommended):

library(limpa)   # Linear Models for Proteomics Data
library(limma)   # For downstream analysis

# Typical workflow
dpcest <- dpc(y.prec)                    # Estimate detection probability curve
y.protein <- dpcQuant(y.prec, protein.id, dpc=dpcest)  # Quantify proteins
fit <- dpcDE(y.protein, design)          # Differential expression
fit <- eBayes(fit)
topTable(fit)

Pre-aggregated protein intensities:

library(limma)   # normalizeVSN() wrapper
library(vsn)     # Variance stabilizing normalization (used by normalizeVSN)
library(proDA)   # Probabilistic Dropout Analysis
library(tidyHeatmap)

# Typical workflow
norm_matrix <- normalizeVSN(intensity_matrix)   # Expects raw (unlogged) intensities
fit <- proDA(norm_matrix, design = design_matrix)

Figures & Visualization

  • Use ggpubr theme for consistent styling across all plots
  • Compose multi-panel figures with patchwork (use | for side-by-side, / for stacked)
  • Create heatmaps with tidyHeatmap (a tidy-data interface built on ComplexHeatmap); use ComplexHeatmap directly when a figure needs finer control
  • Use ggrepel::geom_text_repel() for text labels so they do not overlap

Example:

library(ggplot2)
library(ggpubr)      # theme_pubr()
library(patchwork)

# Compose figures (assumes data is a data frame with columns x, y, z)
p1 <- ggplot(data, aes(x, y)) + geom_point() + theme_pubr()
p2 <- ggplot(data, aes(x, z)) + geom_boxplot() + theme_pubr()
combined <- p1 | p2  # Side-by-side
stacked <- p1 / p2   # Stacked
# Or with layout control:
combined <- (p1 | p2) + plot_layout(widths = c(1.5, 1))

Data Loading Example

library(here)
library(tidyverse)

# Load with explicit column types (the file is tab-separated, so use read_tsv())
samples <- readr::read_tsv(
  here("data", "metadata", "samples.tsv"),
  col_types = cols(sample_id = col_character(), batch = col_factor())
)

results <- samples %>%
  filter(!is.na(response)) %>%
  mutate(log_dose = log10(dose + 1))

Python Workflows

  • Use Jupyter for exploratory work; save final analysis as .py scripts
  • Organize as functions that can be imported and tested
  • Use pathlib.Path for file operations
  • Log progress and intermediate results

Example:

from pathlib import Path
import pandas as pd

DATA_DIR = Path(__file__).parent / "data" / "processed"
samples = pd.read_csv(DATA_DIR / "samples.csv")
results = samples[samples["response"].notna()].copy()
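
The points above about importable functions and logging can be sketched as follows. This is only an outline: the helper names processed_path and filter_complete are hypothetical, and plain dictionaries stand in for a DataFrame so the sketch runs with the standard library alone.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def processed_path(base_dir: Path, name: str) -> Path:
    """Build a path under data/processed relative to base_dir."""
    return base_dir / "data" / "processed" / name


def filter_complete(records, column="response"):
    """Keep records whose `column` is present and not None; log the row counts."""
    kept = [rec for rec in records if rec.get(column) is not None]
    logger.info("filter_complete: kept %d of %d records", len(kept), len(records))
    return kept


if __name__ == "__main__":
    # Logging is configured only when run as a script, not on import
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    demo = [{"sample_id": "A", "response": 1.2}, {"sample_id": "B", "response": None}]
    logger.info("would read from %s", processed_path(Path.cwd(), "samples.csv"))
    print(filter_complete(demo))
```

Keeping the work in functions and the script entry point behind the `__name__` guard means a test can import filter_complete and call it directly without triggering file access or logging setup.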