MMLN Development and Contribution Guide

This guide covers how to set up a development environment, add new R or C++ code, and validate changes before opening a PR. For a full picture of the package's structure and design decisions, see architecture.md first.


Setup

Prerequisites

  • A recent R version
  • RStudio (recommended for interactive development)
  • Build toolchain for your OS: Rtools on Windows, Xcode Command Line Tools on macOS, build-essential (or your distribution's equivalent) on Linux

Development dependencies

install.packages(c(
  "devtools",
  "roxygen2",
  "testthat",
  "profvis"
))

Optional — needed to run real-data examples:

install.packages(c("Lahman", "MM", "MGLM"))

Load the package for development

For R-only changes, you can reload the source tree from the package root with:

devtools::load_all(".")

This is useful for interactive development, but it is not the same as testing the installed package.

For changes involving C++ code, Rcpp exports, NAMESPACE, or build files, use a clean rebuild/install cycle:

Rcpp::compileAttributes(".")
devtools::document(".")
devtools::install(".", force = TRUE, upgrade = "never", dependencies = FALSE)

Then restart R and test the installed package:

library(MMLN)
verify_installation()

Before opening a PR, run a clean check:

devtools::check()

RStudio shortcuts

  • Build > Load All — equivalent to devtools::load_all()
  • Build > Check Package — equivalent to devtools::check()
  • Build > Test Package — equivalent to devtools::test()

Adding New R Code

The R layer is organized into four files — see architecture.md for what belongs where. The typical workflow:

  1. Add or modify a function in the appropriate R/*.R file.
  2. Document it with a roxygen2 block (#' @export, @param, @return, etc.).
  3. Regenerate docs and reload for interactive development:
devtools::document(".")
devtools::load_all(".")

For changes that affect exports, NAMESPACE, or installed-package behavior, also test a clean install:

devtools::install(".", force = TRUE, upgrade = "never", dependencies = FALSE)

Restart R, then run:

library(MMLN)
verify_installation()

  4. Add a test in tests/testthat/ or a script in testing/ and verify it runs.
  5. Run devtools::check() before submitting.

Key conventions (also in architecture.md):

  • Y is always a count matrix; pass it through compress_counts() before ALR-transforming.
  • Latent variables live in ALR space (d = ncol(Y) - 1).
  • The proposal argument controls MH behavior: "norm", "beta", or "normbeta".
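
To make the ALR convention concrete, here is a minimal standalone C++ sketch of the additive log-ratio transform and its inverse. It assumes the last category is the reference and that every entry is strictly positive. Both are assumptions for illustration, and the names alr and alr_inverse are hypothetical rather than package functions; the package's own transform and the behavior of compress_counts() may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of MMLN): additive log-ratio transform.
// Maps D strictly positive components to d = D - 1 log-ratios,
// using the LAST component as the reference (an assumption).
std::vector<double> alr(const std::vector<double>& y) {
    if (y.size() < 2) throw std::invalid_argument("alr: need at least 2 categories");
    const double ref = y.back();
    if (!(ref > 0.0)) throw std::invalid_argument("alr: reference must be positive");
    std::vector<double> w(y.size() - 1);
    for (std::size_t j = 0; j + 1 < y.size(); ++j) {
        if (!(y[j] > 0.0)) throw std::invalid_argument("alr: entries must be positive");
        w[j] = std::log(y[j]) - std::log(ref);
    }
    return w;
}

// Hypothetical helper: inverse ALR. Maps d log-ratios back to
// D = d + 1 probabilities that sum to 1 (reference category last).
std::vector<double> alr_inverse(const std::vector<double>& w) {
    // Subtract the largest exponent (the reference contributes 0)
    // so that std::exp cannot overflow.
    double m = 0.0;
    for (double v : w) m = std::max(m, v);
    std::vector<double> p(w.size() + 1);
    double total = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) {
        p[j] = std::exp(w[j] - m);
        total += p[j];
    }
    p.back() = std::exp(-m);
    total += p.back();
    for (double& v : p) v /= total;
    return p;
}
```

With three categories the transform yields two coordinates, matching d = ncol(Y) - 1, and alr_inverse(alr(y)) recovers y when y already sums to 1.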

Adding New C++ Code

The package uses Rcpp + RcppArmadillo + RcppEigen. See architecture.md for how the C++ modules are structured and how they connect to R via .Call().

Typical pattern

  1. Add or update implementation in src/*.cpp. New utility functions belong in utils.cpp; new performance-critical paths should get their own file.
  2. Annotate functions to expose to R:
// [[Rcpp::export]]
arma::vec my_function(const arma::mat& X) { ... }
  3. Regenerate the Rcpp wrappers, rebuild, and test the installed package:
Rcpp::compileAttributes(".")
devtools::document(".")
devtools::install(".", force = TRUE, upgrade = "never", dependencies = FALSE)

Then restart R and verify the compiled backend:

library(MMLN)
verify_installation()

  4. Validate the change; see Validating C++ Changes below.

C++ coding notes

  • Keep matrix/vector dimensions explicit and validated early.
  • Prefer numerically stable transforms (log-sum-exp, bounded probabilities) to reduce under/overflow risk.
  • Don't break MCMC reproducibility — existing utilities depend on deterministic behavior given a seed.
  • Keep R-facing interface contracts stable unless you're intentionally making an API change.
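
As an illustration of the numerically stable transforms mentioned above, the following standalone sketch shows a log-sum-exp and a softmax with bounded probabilities. It uses only the standard library so it compiles on its own; the names log_sum_exp and bounded_softmax and the default bound eps = 1e-12 are hypothetical choices, not taken from the package sources.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: log(sum_i exp(x_i)) without overflow.
// Subtracting the maximum keeps every exponent <= 0.
double log_sum_exp(const std::vector<double>& x) {
    // Validate the input early, before any arithmetic.
    if (x.empty()) throw std::invalid_argument("log_sum_exp: empty input");
    const double m = *std::max_element(x.begin(), x.end());
    if (std::isinf(m)) return m;  // all -inf gives -inf; any +inf gives +inf
    double s = 0.0;
    for (double v : x) s += std::exp(v - m);
    return m + std::log(s);
}

// Hypothetical helper: softmax with each probability clamped to
// [eps, 1 - eps]. Clamping means the result may not sum to exactly 1;
// the point is that later log(p) and log(1 - p) calls stay finite.
std::vector<double> bounded_softmax(const std::vector<double>& x,
                                    double eps = 1e-12) {
    const double lse = log_sum_exp(x);
    std::vector<double> p(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double raw = std::exp(x[i] - lse);
        p[i] = std::min(1.0 - eps, std::max(eps, raw));
    }
    return p;
}
```

For inputs such as {1000, 1000}, evaluating exp() directly overflows to infinity, while the shifted form returns 1000 + log(2).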

Build flags

src/Makevars and src/Makevars.win set:

PKG_CXXFLAGS = -Wno-ignored-attributes
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)

PKG_CXXFLAGS suppresses an Eigen-specific warning on newer compilers while leaving all other warnings active.

PKG_LIBS is required for RcppArmadillo linear algebra calls that use BLAS/LAPACK routines. On Windows, omitting this line can produce linker errors involving symbols such as dgemm_, dgemv_, dsyev_, or related BLAS/LAPACK routines.

Add new flags here if needed.


Rcpp registration and package namespace

The package should include the generated Rcpp files:

R/RcppExports.R
src/RcppExports.cpp

These are generated by:

Rcpp::compileAttributes(".")

The package-level roxygen file R/package.R should include:

#' MMLN: Mixed Effects Multinomial Logistic Normal Models
#'
#' @useDynLib MMLN, .registration = TRUE
#' @importFrom Rcpp evalCpp
"_PACKAGE"

After running devtools::document("."), confirm that NAMESPACE contains:

useDynLib(MMLN, .registration = TRUE)
importFrom(Rcpp,evalCpp)

If the compiled DLL is not loaded after library(MMLN), verify_installation() and other Rcpp-backed functions may fail.


Validating C++ Changes

When modifying any C++ path, work through all four of these before opening a PR:

  1. Correctness: compare outputs to prior behavior on fixed seeds; verify floating-point differences stay within expected tolerances (~1e-8 per operation, ~1e-6 over 1000 iterations).
  2. Performance: profile before and after on the same input and seed; record where hotspots moved.
  3. Stability: run multiple chains on representative datasets; confirm acceptance rates and diagnostics still look reasonable.
  4. Package check: run devtools::check() and confirm there are no new warnings from your code path.
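
The correctness step can be sketched as a small tolerance check. The helper names max_abs_diff and within_tolerance are hypothetical, and the default tolerance of 1e-6 simply mirrors the figure quoted above for longer runs; treat this as one possible shape for a native-side regression check, not as the package's own test code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: largest absolute elementwise difference between
// a stored baseline and a candidate output. Returns NaN if any
// difference is NaN, so a NaN can never be mistaken for agreement.
double max_abs_diff(const std::vector<double>& baseline,
                    const std::vector<double>& candidate) {
    if (baseline.size() != candidate.size()) {
        throw std::invalid_argument("max_abs_diff: size mismatch");
    }
    double worst = 0.0;
    for (std::size_t i = 0; i < baseline.size(); ++i) {
        const double d = std::fabs(baseline[i] - candidate[i]);
        if (std::isnan(d)) return std::numeric_limits<double>::quiet_NaN();
        worst = std::max(worst, d);
    }
    return worst;
}

// Hypothetical helper: true only when every element agrees within tol.
// A NaN result compares false, which fails the check.
bool within_tolerance(const std::vector<double>& baseline,
                      const std::vector<double>& candidate,
                      double tol = 1e-6) {
    return max_abs_diff(baseline, candidate) <= tol;
}
```

Run the old and new code paths with the same seed, store both outputs, and compare them with a tolerance chosen for the number of iterations involved.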

Profiling

Use profvis for interactive hotspot analysis:

library(profvis)
library(MMLN)

prof <- profvis({
  set.seed(1)
  sim <- simulate_mixed_mln_data(
    m = 8, n_i = 8, p = 3, d = 2,
    beta = matrix(c(0.5, -1, 0.2, 0.3, 0.7, -0.4), 3, 2),
    Sigma = diag(2),
    Phi = 5 * diag(2),
    n_mean = 100
  )

  fit <- FMLN(
    Y = sim$Y,
    X = sim$X,
    n_iter = 300,
    burn_in = 100,
    thin = 2,
    proposal = "normbeta",
    verbose = FALSE
  )
})

prof

Tips:

  • Warm up once before profiling to avoid first-run overhead.
  • Profile realistic workloads — small inputs can hide true bottlenecks.
  • Save profile artifacts to profiling/ with descriptive names.

Helpful Tools

Tool                    Purpose
profvis                 Interactive flame graph profiling in RStudio
bench                   Mid-size benchmarking with memory tracking
microbenchmark          Fast comparative timing for small kernels
lintr                   R style and static analysis
goodpractice            Package-level quality checks
covr                    Test coverage reports
valgrind (Linux/macOS)  Native memory diagnostics for C++
gdb / lldb              Step-level native debugging for C++ crashes

Benchmark pattern with bench:

library(bench)

mark(
  baseline = {
    # old path / behavior
  },
  candidate = {
    # new path / behavior
  },
  iterations = 20,
  check = FALSE
)

Pre-PR Checklist

  • Code compiles cleanly on your platform
  • Rcpp::compileAttributes(".") run if any exported C++ functions changed
  • devtools::document(".") run if any exported functions, roxygen docs, or namespace directives changed
  • devtools::install(".", force = TRUE, upgrade = "never", dependencies = FALSE) succeeds
  • After restarting R, library(MMLN) and verify_installation() both work
  • Scripts in testing/ still run end-to-end
  • C++ changes validated for correctness, performance, and stability
  • devtools::check() passes with no new warnings

Numerical Differences

Small cross-platform numeric differences are expected with compiled C++ due to compiler and BLAS/LAPACK variation. Treat differences below ~1e-6 as normal unless they materially affect diagnostics, rankings, or substantive inference.
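
One source of such variation is that floating-point addition is not associative, so a library that accumulates in a different order can return a slightly different sum. The sketch below sums the same values in two orders and reports a relative difference. The function names are hypothetical and the example is only meant to show the scale of the effect, not to reproduce any particular BLAS.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum in index order: x[0] + x[1] + ...
double sum_forward(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i];
    return s;
}

// Sum in reverse index order: x[n-1] + x[n-2] + ...
double sum_reverse(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t i = x.size(); i > 0; --i) s += x[i - 1];
    return s;
}

// Hypothetical helper: difference relative to the larger magnitude,
// defined as 0 when both values are exactly 0.
double relative_diff(double a, double b) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return scale == 0.0 ? 0.0 : std::fabs(a - b) / scale;
}
```

Comparing sum_forward(x) and sum_reverse(x) on a long vector of mixed magnitudes typically shows a relative difference far below 1e-6, which is the kind of discrepancy this section treats as normal.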