Skip to content

Fix backwards comparison in brunner_munzel permutation test causing inflated p-values#110

Merged
arcaldwell49 merged 5 commits intomasterfrom
copilot/fix-inflated-p-values
Jan 26, 2026
Merged

Fix backwards comparison in brunner_munzel permutation test causing inflated p-values#110
arcaldwell49 merged 5 commits intomasterfrom
copilot/fix-inflated-p-values

Conversation

Copy link
Contributor

Copilot AI commented Jan 26, 2026

Issue

Permutation test p-values in brunner_munzel(test_method = "perm") were systematically inflated ~2x for completely separated samples due to inverted comparison operators counting non-extreme permutations instead of extreme ones.

brunner_munzel(1:3, 4:6, test_method = "perm")
# Expected p-value: 2/choose(6,3) = 0.10
# Actual (buggy): 0.20

Changes

Code fix (R/brunner_munzel.R:751-752):

# Before: counted wrong direction
b_less <- sum(test_stat <= Tperm[1,])
b_greater <- sum(test_stat >= Tperm[1,])

# After: count extreme permutations correctly  
b_less <- sum(Tperm[1,] <= test_stat)
b_greater <- sum(Tperm[1,] >= test_stat)

Regression tests (tests/testthat/test-htest.R):

  • Completely separated samples (both directions)
  • Two-sided and one-sided alternatives
  • Validates p-values and effect size estimates

Impact

Affects two-sided permutation tests on small samples with extreme effects. Causes conservative bias (reduced power). Unpaired tests only—paired tests use different code path and are unaffected.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cloud.r-project.org
    • Triggering command: /usr/lib/R/bin/exec/R /usr/lib/R/bin/exec/R --slave --no-save -e ~n~install.packages(c(&#39;testthat&#39;,~&#43;~&#39;ggplot2&#39;,~&#43;~&#39;ggdist&#39;,~&#43;~&#39;distributional&#39;,~&#43;~&#39;cowplot&#39;,~&#43;~&#39;tidyr&#39;,~&#43;~&#39;R6&#39;,~&#43;~&#39;lifecycle&#39;),~&#43;~repos=&#39;REDACTED&#39;,~&#43;~Ncpus=4)~n~ (dns block)
  • cran.posit.co
    • Triggering command: /usr/lib/R/bin/exec/R /usr/lib/R/bin/exec/R --slave --no-save dpkg-split -Qao�� all.deb .0-1build2_amd64.deb eb ll.deb 1.14-1_all.deb eb rm -rf stall-haIRnB/707-r-cran-riskregression_2023.12.21&#43;ds-1_amd64.deb p.ci plit stall-haIRnB/507/usr/bin/dpkg b plit dpkg-deb (dns block)
  • cran.r-project.org
    • Triggering command: /usr/lib/R/bin/exec/R /usr/lib/R/bin/exec/R --slave --no-save -e ~n~install.packages(c(&#39;testthat&#39;,~&#43;~&#39;ggplot2&#39;,~&#43;~&#39;ggdist&#39;,~&#43;~&#39;distributional&#39;,~&#43;~&#39;cowplot&#39;,~&#43;~&#39;tidyr&#39;,~&#43;~&#39;R6&#39;,~&#43;~&#39;lifecycle&#39;),~&#43;~repos=&#39;REDACTED&#39;,~&#43;~Ncpus=4)~n~ (dns block)
  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details on the original issue you should resolve

<issue_title>Potential issue: Inflated p-values in brunner_munzel(test_method = "perm") for completely separated small samples</issue_title>
<issue_description>Hi, thank you very much for developing and maintaining the excellent TOSTER package—it's been really helpful for my work on equivalence testing and robust statistics!

I noticed a potential issue with brunner_munzel(test_method = "perm") when data are completely separated in small samples. The p-value appears systematically inflated compared to the exact permutation p-value.

Reproducible example:

TOSTER::brunner_munzel(1:3, 4:6, test_method = "perm")

Output (on my system):
Two-sample Brunner-Munzel permutation test

data: 1:3 and 4:6
t-observed = -2.1209, df = 1000, p-value = 0.2025
alternative hypothesis: true relative effect is not equal to 0.5
95 percent confidence interval:
0 1
sample estimates:
P(X>Y) + .5*P(X=Y)
1e-04

For completely separated data (all x < all y), the exact two-sided permutation p-value should be 2 / choose(6,3) = 2/20 = 0.1.
The observed p-value is much higer than expected value.
The asymptotic version works correctly, and the package is otherwise great. Just wanted to kindly bring this to your attention in case it's an unintended behavior in the Monte Carlo permutation implementation.
Thanks again for your awesome work!
TOSTER version: [0.9.0]
R version: [4.5.1]
</issue_description>

<agent_instructions>Create a branch to look into issue, give an initial report, and assess if this is truly a bug or a misinterpretation of by the user raising the issue.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: arcaldwell49 <24322628+arcaldwell49@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix inflated p-values in brunner_munzel for small samples Fix backwards comparison in brunner_munzel permutation test causing inflated p-values Jan 26, 2026
Copilot AI requested a review from arcaldwell49 January 26, 2026 14:53
- Fix permutation loop, causing error with variance calc
- Deprecate simple_htest interface with brunner_munzel
- Updated output for brunner_munzel, parameter changed to N-permutations and labeling for sample estimate more verbose/accurate.
- added file to junk folder comparing my implementation to nparcomp
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in the Brunner-Munzel permutation test where inverted comparison operators caused inflated p-values (approximately doubled) for completely separated samples. The issue specifically affected brunner_munzel(test_method = "perm").

Changes:

  • Fixed backwards comparison operators in permutation p-value calculation (lines 777-778 in R/brunner_munzel.R)
  • Changed two-sided p-value formula from doubling method to absolute value method for consistency with reference implementations
  • Improved unique allocation sampling to avoid duplicate permutations in small samples
  • Enhanced output labels to use actual group names instead of generic "X" and "Y"
  • Added comprehensive regression tests validating the fix against known reference values
  • Deprecated brunner_munzel option in simple_htest() to encourage direct use of brunner_munzel()

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
R/brunner_munzel.R Core fix for comparison operators (lines 777-778), improved permutation sampling (lines 189-229), consistent zero-variance handling, relative effect convention fix in perm_loop (line 1064), enhanced estimate labeling with actual group names, and improved parameter reporting for permutation tests
tests/testthat/test-htest.R Comprehensive regression tests for completely separated samples, one-sided and two-sided alternatives, validation against brunnermunzel R package reference values
R/simple_htest.R Deprecation warning for brunner_munzel test option with guidance to use brunner_munzel() directly, updated documentation removing brunner_munzel from main options
junk/validate_brunner_munzel_vs_nparcomp.R Development validation script comparing TOSTER implementation against nparcomp package across multiple scenarios

arcaldwell49 and others added 2 commits January 26, 2026 13:59
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@arcaldwell49 arcaldwell49 marked this pull request as ready for review January 26, 2026 20:22
Copy link
Collaborator

@arcaldwell49 arcaldwell49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes successfully have resolved issue. Unit tests added to avoid issue in the future.

@arcaldwell49
Copy link
Collaborator

Issue with R-CMD-Check appears to be a network error. Merging despite failing check.

@arcaldwell49 arcaldwell49 merged commit 5a2eaf5 into master Jan 26, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Potential issue: Inflated p-values in brunner_munzel(test_method = "perm") for completely separated small samples

3 participants