- All analyses should be runnable from source code
- Use relative paths: `data/processed/...` not `/absolute/path/...`
- Document dependencies explicitly (packages, versions, data sources)
- Use `set.seed()` in R or `random.seed()` in Python for stochastic operations
- Avoid hardcoded magic numbers; use named variables or config files

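As a minimal sketch of the seeding and named-constant guidelines in Python: the names `SEED`, `N_BOOTSTRAP`, `DATA_DIR`, and `bootstrap_means` are hypothetical and only illustrate the pattern. It uses a local `random.Random(seed)` generator rather than the global `random.seed()`, which gives the same reproducibility without touching shared state.

```python
import random
from pathlib import Path

# Hypothetical named parameters; in a real project these could live in a config file.
SEED = 20240101
N_BOOTSTRAP = 1000
DATA_DIR = Path("data") / "processed"  # relative to the project root, never absolute


def bootstrap_means(values, n_resamples=N_BOOTSTRAP, seed=SEED):
    """Return the mean of each bootstrap resample, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, independent of global random state
    n = len(values)
    return [sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)]


if __name__ == "__main__":
    # Two runs with the same seed give identical resampled means.
    print(bootstrap_means([1.0, 2.0, 3.0, 4.0], n_resamples=5))
```

Because the seed and the number of resamples are named arguments with defaults, a rerun reproduces the same numbers, and a change to either value is visible in one place.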
raw data → cleaning script → processed data → analysis script → results/figures
Each step should:
- Be rerunnable without manual intervention
- Document its input source and output location
- Include QC checks (row counts, data integrity)
- Have clear variable names and comments
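
The checklist above could be sketched for a single cleaning step as follows. This is an illustration under assumptions, not a prescribed implementation: the file locations, the `sample_id` and `response` column names, and the `clean_samples` function are all hypothetical.

```python
import csv
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("clean")

# Hypothetical input source and output location for this step.
RAW_FILE = Path("data") / "raw" / "samples.csv"
PROCESSED_FILE = Path("data") / "processed" / "samples_clean.csv"
REQUIRED_COLUMNS = ("sample_id", "response")


def clean_samples(raw_file=RAW_FILE, processed_file=PROCESSED_FILE):
    """Drop rows with a missing response and write the cleaned table."""
    with open(raw_file, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    # QC: the input must contain the columns later steps rely on.
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"{raw_file} is missing columns: {missing}")

    kept = [row for row in rows if row["response"] not in ("", "NA")]

    # QC: sample IDs must stay unique after cleaning.
    ids = [row["sample_id"] for row in kept]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate sample_id values after cleaning")

    processed_file.parent.mkdir(parents=True, exist_ok=True)
    with open(processed_file, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)

    # QC: report row counts so unexpected losses are visible in the log.
    log.info("read %d rows from %s, wrote %d rows to %s",
             len(rows), raw_file, len(kept), processed_file)
    return len(rows), len(kept)
```

Returning the row counts as well as logging them lets a downstream script or test assert on them instead of relying on someone reading the log.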
- Use Quarto (`.qmd`) for reports combining analysis + narrative
- Consult documentation-standards.md for README and code comment patterns
- Keep analysis in scripts (`.R`), parameter files in YAML or CSV
- Use `here::here()` for relative paths
- Load data reliably with explicit `readr::read_csv()` with column specs

```r
library(tidyverse)
library(here)
library(ggpubr)      # Consistent visualization theme
library(patchwork)   # Multi-panel figure composition
library(tidyHeatmap) # Publication-quality heatmaps
```

Use limma-voom for differential expression (handles low counts naturally):
```r
library(edgeR) # DGEList, normalization
library(limma) # voom, differential expression
library(tidyHeatmap)
library(ggpubr)

# Typical workflow
dge <- DGEList(counts = count_matrix, group = conditions)
dge <- calcNormFactors(dge)
v <- voom(dge, design = design_matrix)
fit <- lmFit(v, design_matrix)
fit <- eBayes(fit)
```

Use limpa for peptide/precursor-level differential expression (handles missing values naturally), or proDA for pre-aggregated protein intensities:

Peptide/Precursor-level data (recommended):
```r
library(limpa) # Linear Models for Proteomics Data
library(limma) # For downstream analysis

# Typical workflow
dpcest <- dpc(y.prec)                                     # Estimate detection probability curve
y.protein <- dpcQuant(y.prec, protein.id, dpc = dpcest)   # Quantify proteins
fit <- dpcDE(y.protein, design)                           # Differential expression
fit <- eBayes(fit)
topTable(fit)
```

Pre-aggregated protein intensities:
```r
library(limma) # normalizeVSN(), a wrapper around vsn
library(vsn)   # Variance Stabilization Normalization
library(proDA) # Probabilistic Dropout Analysis
library(tidyHeatmap)

# Typical workflow
# normalizeVSN() takes raw (unlogged) intensities and returns values on a log2-like scale
norm_matrix <- normalizeVSN(intensity_matrix)
fit <- proDA(norm_matrix, design = design_matrix)
```

- Use ggpubr theme for consistent styling across all plots
- Compose multi-panel figures with patchwork (use `|` for side-by-side, `/` for stacked)
- Create heatmaps with tidyHeatmap for interactive exploration or ComplexHeatmap for publication
- Use `geom_text_repel` from ggrepel for all text labelling

Example:
```r
library(ggplot2)
library(ggpubr)    # theme_pubr()
library(patchwork)

# Compose figures
p1 <- ggplot(data, aes(x, y)) + geom_point() + theme_pubr()
p2 <- ggplot(data, aes(x, z)) + geom_boxplot() + theme_pubr()
combined <- p1 | p2 # Side-by-side
# Or with layout control:
combined <- (p1 | p2) + plot_layout(widths = c(1.5, 1))
```

```r
library(here)
library(tidyverse)

# Load with explicit column types
# The file is tab-separated, so use read_tsv() rather than read_csv()
samples <- readr::read_tsv(
  here("data", "metadata", "samples.tsv"),
  col_types = cols(sample_id = col_character(), batch = col_factor())
)

results <- samples %>%
  filter(!is.na(response)) %>%
  mutate(log_dose = log10(dose + 1))
```

- Use Jupyter for exploratory work; save final analysis as `.py` scripts
- Organize as functions that can be imported and tested
- Use `pathlib.Path` for file operations
- Log progress and intermediate results

Example:
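```python
from pathlib import Path

import pandas as pd

# Resolve the data directory relative to this script, not the working directory
DATA_DIR = Path(__file__).parent / "data" / "processed"

samples = pd.read_csv(DATA_DIR / "samples.csv")

# Keep only rows with a recorded response
results = samples[samples["response"].notna()].copy()
```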
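
To illustrate "functions that can be imported and tested" together with logging, here is a small standard-library sketch. The function `add_log_dose`, its test, and the record fields are hypothetical; the `log10(dose + 1)` transform mirrors the one used in the R example, and plain dictionaries stand in for a data frame to keep the sketch dependency-free.

```python
import logging
import math

log = logging.getLogger(__name__)


def add_log_dose(records):
    """Return copies of the records that have a response, adding log10(dose + 1)."""
    kept = [r for r in records if r.get("response") is not None]
    log.info("kept %d of %d records with a response", len(kept), len(records))
    return [{**r, "log_dose": math.log10(r["dose"] + 1)} for r in kept]


def test_add_log_dose():
    """Small check that can be collected by a test runner or called directly."""
    records = [
        {"sample_id": "s1", "dose": 9, "response": 0.4},
        {"sample_id": "s2", "dose": 99, "response": None},
    ]
    result = add_log_dose(records)
    assert [r["sample_id"] for r in result] == ["s1"]
    assert math.isclose(result[0]["log_dose"], 1.0)


if __name__ == "__main__":
    # Logging is configured only when run as a script, so importing stays side-effect free.
    logging.basicConfig(level=logging.INFO)
    test_add_log_dose()
```

Keeping the transformation in a function with no file access means it can be imported from a notebook during exploration and exercised by a test without any data on disk.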