# Documentation Standards

## README.md Structure

Every project should have a README with sections in this order:

# Project Name

Brief 1-2 sentence description of what this project does.

## Overview

Expanded description: problem it solves, methods used, key findings.

## Quick Start

Commands to run the analysis from scratch:
- List dependencies (R packages, Python packages, external tools)
- Installation steps if non-standard
- Command to reproduce main results

## Directory Structure

Brief description of key folders (link to docs if detailed).
See [data-organization.md](./data-organization.md).

## Key Files

| File | Purpose |
|------|---------|
| analysis/main.R | Primary analysis |
| R/utils.R | Helper functions |

## Data

Source of raw data, any preprocessing notes, where to find metadata mappings.

## Results

Summary of main outputs. Link to manuscripts or figure descriptions.

## References

Citations to papers, external data sources, related projects.

### Setting up bibliography

All new projects should include the lab master bibliography as a git submodule:

```bash
git submodule add https://github.com/cujoisa/master_bibliography refs/master_bibliography
```

In Quarto `.qmd` files, reference the bibliography in the YAML front matter:

```yaml
---
title: "Analysis Title"
bibliography: refs/master_bibliography/master_compressed.bib
---
```

Then cite in the text using the `[@citation_key]` syntax:

```markdown
This analysis uses limma [@Ritchie2015limma] and pathway analysis [@Subramanian2005gsea].
```

Quarto will automatically format citations and generate a References section at the end.
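
Before rendering, it can help to confirm that every cited key exists in the bibliography. The following is a minimal sketch, not part of Quarto or of the lab tooling: the function names `cited_keys`, `bib_keys`, and `missing_citations` are hypothetical, and the key pattern is a simplification that only accepts letters, digits, `_`, `:`, and `-`.

```python
import re


def cited_keys(qmd_text: str) -> set[str]:
    """Collect citation keys written as @key inside square brackets."""
    keys = set()
    # Find bracketed spans that contain at least one "@".
    for bracket in re.findall(r"\[([^\[\]]*@[^\[\]]*)\]", qmd_text):
        # Simplified key pattern (assumption): letters, digits, "_", ":", "-".
        keys.update(re.findall(r"@([A-Za-z0-9_][A-Za-z0-9_:-]*)", bracket))
    return keys


def bib_keys(bib_text: str) -> set[str]:
    """Collect entry keys from BibTeX entries such as @article{key, ...}."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))


def missing_citations(qmd_text: str, bib_text: str) -> set[str]:
    """Return keys cited in the document but absent from the bibliography."""
    return cited_keys(qmd_text) - bib_keys(bib_text)
```

Passing the contents of a `.qmd` file and of `master_compressed.bib` to `missing_citations` returns the set of keys that would not resolve; an empty set means every citation has a matching entry.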

## Contributing

How to update or extend this project.
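
The section order in the template can be checked mechanically. The following is a minimal sketch under stated assumptions: the names `REQUIRED_SECTIONS` and `check_readme_sections` are hypothetical, the list is copied from the template headings, and only level-2 (`## `) headings are inspected, so headings that appear inside fenced code blocks would also be counted.

```python
import re

# Section headings from the README template, in the required order.
# The "# Project Name" title is not checked.
REQUIRED_SECTIONS = [
    "Overview",
    "Quick Start",
    "Directory Structure",
    "Key Files",
    "Data",
    "Results",
    "References",
    "Contributing",
]


def check_readme_sections(readme_text: str) -> list[str]:
    """Return a list of problems; an empty list means the README conforms."""
    # Collect level-2 headings in document order.
    found = re.findall(r"^## +(.+?)\s*$", readme_text, flags=re.MULTILINE)
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in found
    ]
    # Compare the relative order of the required sections that are present.
    present = [name for name in found if name in REQUIRED_SECTIONS]
    expected = [name for name in REQUIRED_SECTIONS if name in present]
    if present != expected:
        problems.append("sections out of order: " + ", ".join(present))
    return problems
```

Calling `check_readme_sections` on the text of a project's `README.md` gives a list that can be printed in a pre-commit hook or a CI step; extra project-specific sections are ignored.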


## Code Comments

- Comment the **why**, not the **what**
  - Bad: `x <- x + 1  # Add 1 to x`
  - Good: `# Adjust for batch effect by adding offset`
- Section headers: `# ---- Data Loading ----`, with four dashes on each side

## R Function Documentation

Use roxygen comments for functions:

```r
#' Calculate response metric
#'
#' @param dose_response tibble with columns dose_nM, response_pct
#' @param threshold numeric, response cutoff (default 50)
#'
#' @return tibble with responders classified
#' @examples
#' calculate_response(dose_data, threshold = 20)
#'
#' @export
calculate_response <- function(dose_response, threshold = 50) {
  # Implementation
}
```

## Python Docstrings

Use Google-style docstrings:

```python
import pandas as pd


def calculate_response(dose_response: pd.DataFrame, threshold: float = 50) -> pd.DataFrame:
    """Calculate response metric.

    Args:
        dose_response: DataFrame with columns dose_nM, response_pct
        threshold: Response cutoff percentage (default 50)

    Returns:
        DataFrame with responders classified

    Examples:
        >>> result = calculate_response(dose_data, threshold=20)
    """
    # Implementation
```
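
Examples written with `>>>` can be executed by the standard-library `doctest` module, which keeps the docstring and the code in agreement. The following is a minimal sketch using a simplified, hypothetical helper, `classify_responders`, that works on a plain list rather than a DataFrame; the rule that a value at or above the cutoff counts as a responder is an assumption made for illustration.

```python
import doctest


def classify_responders(response_pct: list[float], threshold: float = 50) -> list[bool]:
    """Flag each response value that meets the cutoff.

    Args:
        response_pct: Response values in percent.
        threshold: Response cutoff percentage (default 50).

    Returns:
        One boolean per input value, True where the value is >= threshold.

    Examples:
        >>> classify_responders([10.0, 55.0, 50.0], threshold=50)
        [False, True, True]
    """
    # Assumed rule: a value at or above the cutoff counts as a responder.
    return [value >= threshold for value in response_pct]


if __name__ == "__main__":
    # Runs every ">>>" example in this module and reports any mismatch.
    print(doctest.testmod())
```

Running the file directly executes the example in the `Examples:` section; a mismatch between the documented and actual output is reported as a failure.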

## Inline Metadata

For each data file, include a README or a paired metadata file:

- `samples.tsv` → paired with `samples_metadata.md` or `_sample_key.txt`
- `combined_dose_data.csv` → describe columns, units, and missing value codes
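
A column table for a metadata file can be drafted automatically and then completed by hand. The following is a minimal sketch, assuming a comma-separated file with a header row: the function name `metadata_skeleton` and the `MISSING_CODES` set are hypothetical, and the description and units cells are deliberately left blank because they cannot be inferred from the data.

```python
import csv
import io

# Assumed missing-value codes; replace with the codes the data file actually uses.
MISSING_CODES = {"", "NA"}


def metadata_skeleton(csv_text: str) -> str:
    """Build a Markdown table with one row per column of a CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in columns:
            value = row[name]
            # Short rows yield None for the absent trailing fields.
            if value is None or value.strip() in MISSING_CODES:
                missing[name] += 1
    lines = [
        "| Column | Description | Units | Missing |",
        "|--------|-------------|-------|---------|",
    ]
    for name in columns:
        # Description and units are left blank to be filled in by hand.
        lines.append(f"| {name} |  |  | {missing[name]} of {n_rows} |")
    return "\n".join(lines)
```

Reading `combined_dose_data.csv` as text and passing it to `metadata_skeleton` yields a table that can be pasted into the paired metadata file and completed with descriptions, units, and the meaning of each missing-value code.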
