Fix backwards comparison in brunner_munzel permutation test causing inflated p-values by Copilot · Pull Request #110 · Lakens/TOSTER

Copilot · 2026-01-26T14:38:29Z

Issue

Permutation test p-values in brunner_munzel(test_method = "perm") were systematically inflated ~2x for completely separated samples due to inverted comparison operators counting non-extreme permutations instead of extreme ones.

brunner_munzel(1:3, 4:6, test_method = "perm")
# Expected p-value: 2/choose(6,3) = 0.10
# Actual (buggy): 0.20

Changes

Code fix (R/brunner_munzel.R:751-752):

# Before: counted wrong direction
b_less <- sum(test_stat <= Tperm[1,])
b_greater <- sum(test_stat >= Tperm[1,])

# After: count extreme permutations correctly  
b_less <- sum(Tperm[1,] <= test_stat)
b_greater <- sum(Tperm[1,] >= test_stat)

Regression tests (tests/testthat/test-htest.R):

Completely separated samples (both directions)
Two-sided and one-sided alternatives
Validates p-values and effect size estimates

Impact

Affects two-sided permutation tests on small samples with extreme effects. Causes conservative bias (reduced power). Unpaired tests only—paired tests use different code path and are unaffected.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

cloud.r-project.org
- Triggering command: /usr/lib/R/bin/exec/R /usr/lib/R/bin/exec/R --slave --no-save -e ~n~install.packages(c('testthat',~+~'ggplot2',~+~'ggdist',~+~'distributional',~+~'cowplot',~+~'tidyr',~+~'R6',~+~'lifecycle'),~+~repos='REDACTED',~+~Ncpus=4)~n~ (dns block)
cran.posit.co
- Triggering command: /usr/lib/R/bin/exec/R /usr/lib/R/bin/exec/R --slave --no-save dpkg-split -Qao�� all.deb .0-1build2_amd64.deb eb ll.deb 1.14-1_all.deb eb rm -rf stall-haIRnB/707-r-cran-riskregression_2023.12.21+ds-1_amd64.deb p.ci plit stall-haIRnB/507/usr/bin/dpkg b plit dpkg-deb (dns block)
cran.r-project.org
- Triggering command: /usr/lib/R/bin/exec/R /usr/lib/R/bin/exec/R --slave --no-save -e ~n~install.packages(c('testthat',~+~'ggplot2',~+~'ggdist',~+~'distributional',~+~'cowplot',~+~'tidyr',~+~'R6',~+~'lifecycle'),~+~repos='REDACTED',~+~Ncpus=4)~n~ (dns block)
esm.ubuntu.com
- Triggering command: /usr/lib/apt/methods/https /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

This section details on the original issue you should resolve

<issue_title>Potential issue: Inflated p-values in brunner_munzel(test_method = "perm") for completely separated small samples</issue_title>
<issue_description>Hi, thank you very much for developing and maintaining the excellent TOSTER package—it's been really helpful for my work on equivalence testing and robust statistics!

I noticed a potential issue with brunner_munzel(test_method = "perm") when data are completely separated in small samples. The p-value appears systematically inflated compared to the exact permutation p-value.

Reproducible example:
TOSTER::brunner_munzel(1:3, 4:6, test_method = "perm")
Output (on my system):
Two-sample Brunner-Munzel permutation test

data: 1:3 and 4:6
t-observed = -2.1209, df = 1000, p-value = 0.2025
alternative hypothesis: true relative effect is not equal to 0.5
95 percent confidence interval:
0 1
sample estimates:
P(X>Y) + .5*P(X=Y)
1e-04

For completely separated data (all x < all y), the exact two-sided permutation p-value should be 2 / choose(6,3) = 2/20 = 0.1.
The observed p-value is much higer than expected value.
The asymptotic version works correctly, and the package is otherwise great. Just wanted to kindly bring this to your attention in case it's an unintended behavior in the Monte Carlo permutation implementation.
Thanks again for your awesome work!
TOSTER version: [0.9.0]
R version: [4.5.1]
</issue_description>

<agent_instructions>Create a branch to look into issue, give an initial report, and assess if this is truly a bug or a misinterpretation of by the user raising the issue.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes Potential issue: Inflated p-values in brunner_munzel(test_method = "perm") for completely separated small samples #109

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: arcaldwell49 <24322628+arcaldwell49@users.noreply.github.com>

- Fix permutation loop, causing error with variance calc - Deprecate simple_htest interface with brunner_munzel - Updated output for brunner_munzel, parameter changed to N-permutations and labeling for sample estimate more verbose/accurate. - added file to junk folder comparing my implementation to nparcomp

Copilot

Pull request overview

This PR fixes a critical bug in the Brunner-Munzel permutation test where inverted comparison operators caused inflated p-values (approximately doubled) for completely separated samples. The issue specifically affected brunner_munzel(test_method = "perm").

Changes:

Fixed backwards comparison operators in permutation p-value calculation (lines 777-778 in R/brunner_munzel.R)
Changed two-sided p-value formula from doubling method to absolute value method for consistency with reference implementations
Improved unique allocation sampling to avoid duplicate permutations in small samples
Enhanced output labels to use actual group names instead of generic "X" and "Y"
Added comprehensive regression tests validating the fix against known reference values
Deprecated brunner_munzel option in simple_htest() to encourage direct use of brunner_munzel()

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
R/brunner_munzel.R	Core fix for comparison operators (lines 777-778), improved permutation sampling (lines 189-229), consistent zero-variance handling, relative effect convention fix in perm_loop (line 1064), enhanced estimate labeling with actual group names, and improved parameter reporting for permutation tests
tests/testthat/test-htest.R	Comprehensive regression tests for completely separated samples, one-sided and two-sided alternatives, validation against brunnermunzel R package reference values
R/simple_htest.R	Deprecation warning for brunner_munzel test option with guidance to use brunner_munzel() directly, updated documentation removing brunner_munzel from main options
junk/validate_brunner_munzel_vs_nparcomp.R	Development validation script comparing TOSTER implementation against nparcomp package across multiple scenarios

R/brunner_munzel.R

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

arcaldwell49

Changes successfully have resolved issue. Unit tests added to avoid issue in the future.

arcaldwell49 · 2026-01-26T20:26:00Z

Issue with R-CMD-Check appears to be a network error. Merging despite failing check.

Initial plan

c39b27a

Copilot AI assigned Copilot and arcaldwell49 Jan 26, 2026

Copilot started work on behalf of arcaldwell49 January 26, 2026 14:38 View session

Fix permutation test comparison operators and add regression test

5f06237

Co-authored-by: arcaldwell49 <24322628+arcaldwell49@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix inflated p-values in brunner_munzel for small samples~~ Fix backwards comparison in brunner_munzel permutation test causing inflated p-values Jan 26, 2026

Copilot AI requested a review from arcaldwell49 January 26, 2026 14:53

Copilot finished work on behalf of arcaldwell49 January 26, 2026 14:53

arcaldwell49 requested a review from Copilot January 26, 2026 19:08

Copilot started reviewing on behalf of arcaldwell49 January 26, 2026 19:08 View session

Copilot AI reviewed Jan 26, 2026

View reviewed changes

R/brunner_munzel.R Outdated Show resolved Hide resolved

arcaldwell49 and others added 2 commits January 26, 2026 13:59

Update namespace

6d87620

Update R/brunner_munzel.R

a9b5b8b

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

arcaldwell49 marked this pull request as ready for review January 26, 2026 20:22

arcaldwell49 approved these changes Jan 26, 2026

View reviewed changes

arcaldwell49 merged commit 5a2eaf5 into master Jan 26, 2026
3 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix backwards comparison in brunner_munzel permutation test causing inflated p-values#110

Fix backwards comparison in brunner_munzel permutation test causing inflated p-values#110
arcaldwell49 merged 5 commits intomasterfrom
copilot/fix-inflated-p-values

Copilot AI commented Jan 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

arcaldwell49 left a comment

Uh oh!

arcaldwell49 commented Jan 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Changes

Impact

I tried to connect to the following addresses, but was blocked by firewall rules:

Comments on the Issue (you are @copilot in this section)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

arcaldwell49 left a comment

Choose a reason for hiding this comment

Uh oh!

arcaldwell49 commented Jan 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jan 26, 2026 •

edited

Loading