Add Hypothesis property-based tests (#1038) by etal · Pull Request #1042 · etal/cnvkit

etal · 2026-03-11T08:47:10Z

Summary

Adds 42 Hypothesis property-based tests covering mathematical invariants (equivariance, boundedness, round-trips) for descriptives.py, smoothing.py, and call.py
Fixes a bug in weighted_median where a signed comparison always evaluated True, causing incorrect averaging of adjacent values
Adds hypothesis[numpy] to test dependencies and configures pytest markers/settings

Test plan

All 198 tests pass (42 new + 156 existing)
mypy clean on cnvlib/descriptives.py
Pre-commit hooks pass (ruff, ruff-format, bandit)
CI passes on Python 3.10–3.14

🤖 Generated with Claude Code

42 property tests covering mathematical invariants (equivariance, boundedness, round-trips) for descriptives.py, smoothing.py, and call.py. Also fixes a bug in weighted_median where a signed comparison (cumulative_weight[idx-1] - midpoint < epsilon) always evaluated True, causing incorrect averaging of adjacent values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

etal · 2026-03-11T08:48:15Z

@claude Please review.

claude · 2026-03-11T08:48:27Z

Claude finished @etal's task —— View job

PR Review Complete ✅

Reviewing PR #1042: Add Hypothesis property-based tests

Todo List:

Read and analyze all changed files
Review the new property-based tests in test/test_properties.py
Examine the bug fix in cnvlib/descriptives.py
Check dependency and configuration changes
Provide comprehensive review feedback

Summary

This is an excellent PR that adds robust property-based testing using Hypothesis and fixes a critical bug. The implementation demonstrates strong understanding of mathematical invariants and thorough testing practices.

✅ Strengths

1. Critical Bug Fix (`cnvlib/descriptives.py:156-163`)

The fix in weighted_median addresses a serious logic error:

Before: cumulative_weight[midpoint_idx - 1] - midpoint < epsilon (signed comparison)
After: abs(cumulative_weight[midpoint_idx] - midpoint) < epsilon (absolute difference)

The original code could incorrectly trigger averaging when the cumulative weight was less than the midpoint, leading to wrong median calculations. The fix correctly checks if we're exactly at the midpoint.

2. Comprehensive Property-Based Tests (`test/test_properties.py`)

42 well-designed tests covering mathematical invariants:

Robust Statistical Functions:

biweight_location: shift/scale equivariance, boundedness
biweight_midvariance: non-negativity, scale equivariance
median_absolute_deviation: shift invariance, scale equivariance, scaling constant verification
interquartile_range: basic statistical properties
weighted_median: uniform weights equivalence, bounds checking
gapper_scale, q_n: scale equivariance testing

Smoothing Functions:

rolling_median, kaiser, savgol: output length preservation, boundedness, NaN handling
_width2wing: boundary checking for window size calculations

Copy Number Arithmetic:

_log2_ratio_to_absolute_pure: round-trips, monotonicity, positive outputs
_log2_ratio_to_absolute: purity correction validation
rescale_baf: B-allele frequency mixture model round-trips

3. Smart Test Design

Appropriate tolerances: Accounts for iterative convergence in biweight estimators
Edge case coverage: Constant arrays, boundary conditions, insufficient data
Realistic data ranges: CNV-appropriate log2 ratios (-8 to 8)
Performance tuning: Different Hypothesis profiles for CI vs. local development

4. Clean Configuration

Proper hypothesis[numpy] dependency addition
Well-configured pytest markers for selective test execution
Appropriate .hypothesis/ cache directory in .gitignore

🔍 Minor Observations

Type Annotations

The test file uses modern Python type syntax consistently with the project (e.g., list[int] not List[int]).

Test Organization

The class-based organization by function is clean and makes the 509-line file very navigable.

Strategy Design

The Hypothesis strategies are well-thought-out:

_non_constant_array() prevents degenerate cases
_weighted_pair() ensures equal-length arrays
Appropriate floating-point ranges for genomic data

🎯 Recommendations

Consider adding a regression test specifically for the weighted_median bug to ensure it doesn't reoccur
The biweight functions use min_size=20 for stable convergence - consider documenting this constraint in the function docstrings
Property-based testing could be expanded to other modules like cnvlib/segmentation/ in future work

🚀 Impact

This PR significantly improves CNVkit's reliability:

Fixes potential silent errors in weighted median calculations
Establishes property-based testing patterns for the project
Validates mathematical correctness of core statistical functions
Enables continuous verification of numerical invariants

Recommendation: APPROVE ✅

The bug fix alone justifies merging, and the comprehensive test suite adds substantial long-term value. The implementation follows project conventions and demonstrates excellent engineering practices.

codecov · 2026-03-11T08:53:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.10%. Comparing base (4225b17) to head (0af0073).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1042      +/-   ##
==========================================
+ Coverage   65.68%   66.10%   +0.42%     
==========================================
  Files          72       72              
  Lines        7375     7375              
  Branches     1294     1294              
==========================================
+ Hits         4844     4875      +31     
+ Misses       2094     2061      -33     
- Partials      437      439       +2

Flag	Coverage Δ
unittests	`66.10% <100.00%> (+0.42%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add hypothesis to conda dev environment

0af0073

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

etal merged commit 71f02c3 into master Mar 11, 2026
15 checks passed

etal deleted the i1038 branch March 11, 2026 15:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Hypothesis property-based tests (#1038)#1042

Add Hypothesis property-based tests (#1038)#1042
etal merged 2 commits intomasterfrom
i1038

etal commented Mar 11, 2026

Uh oh!

etal commented Mar 11, 2026

Uh oh!

claude bot commented Mar 11, 2026 •

edited

Loading

Uh oh!

codecov bot commented Mar 11, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

etal commented Mar 11, 2026

Summary

Test plan

Uh oh!

etal commented Mar 11, 2026

Uh oh!

claude bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Summary

✅ Strengths

1. Critical Bug Fix (cnvlib/descriptives.py:156-163)

2. Comprehensive Property-Based Tests (test/test_properties.py)

3. Smart Test Design

4. Clean Configuration

🔍 Minor Observations

Type Annotations

Test Organization

Strategy Design

🎯 Recommendations

🚀 Impact

Uh oh!

codecov bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 11, 2026 •

edited

Loading

1. Critical Bug Fix (`cnvlib/descriptives.py:156-163`)

2. Comprehensive Property-Based Tests (`test/test_properties.py`)

codecov bot commented Mar 11, 2026 •

edited

Loading