Skip to content

fix: rebin with groups — MultiCell support, flow handling, and group_mapping protocol#1148

Merged
henryiii merged 1 commit into
developfrom
fix/rebin-groups
Jun 11, 2026
Merged

fix: rebin with groups — MultiCell support, flow handling, and group_mapping protocol#1148
henryiii merged 1 commit into
developfrom
fix/rebin-groups

Conversation

@henryiii

Copy link
Copy Markdown
Member

🤖 AI text below 🤖

Part of #1143.

This fixes three verified findings in rebinning with groups (B6, B7):

B6a — MultiCell crash

h[bh.rebin(groups=[2, 2])] on a MultiCell histogram raised IndexError: index 0 is out of bounds for axis 0 with size 0. _rebin_with_groups built the new C++ histogram with reduced.__class__(axes), which does not preserve the cell count, and _combine_group_contents indexed the wrong view dimension because MultiCell views carry the cell index as the leading dimension. Both are now handled the same way as the existing pick_each path: reset_nelem() is called on the new histogram, and the per-axis position within the view is offset by one for MultiCell storage.

B6b — flow mishandling when rebinning onto an axis without flow bins

When the old axis has flow bins and the user-supplied new_axis does not (e.g. h[bh.rebin(groups=[2, 2], axis=bh.axis.IntCategory([0, 1]))]):

  • the old underflow bin was folded into the first group,
  • the underflow→overflow combine ran once per group instead of once, and
  • the trailing overflow group consumed a shifted regular bin instead of the old overflow bin.

For Regular(4) with flow view [100, 1, 2, 3, 4, 200] and groups=[2, 2] onto IntCategory([0, 1]), the result was [101, 5, 304]; it is now [3, 7, 300].

Flow-handling semantics implemented

For each side (underflow and overflow), all combinations of old/new axis traits are now handled:

old flow bin new flow bin behavior
yes yes carried over unchanged (1-bin group)
yes no (ordered axis) dropped, matching reduce/crop semantics
yes (underflow) no, but new axis is unordered/categorical with overflow folded into the new overflow bin exactly once, since every out-of-range entry on a categorical axis lands in overflow
no yes new flow bin stays empty

The pre-existing test_rebin_change_axis_cat expectation encoded the buggy once-per-group behavior ([1, 3, 4]) and was updated to the correct value ([1, 3, 2]).

B7 — rebin.group_mapping protocol violation

The factor form returned [factor] * len(axis) — group sizes summing to factor * len(axis) — violating the sum(groups) == len(axis) invariant the same method enforces for explicit groups. This is part of the public RebinProtocol surface consumed cross-library. It now returns len(axis) // factor groups of size factor; when the factor does not divide the bin count evenly, the remainder bins are not part of any group, matching the C++ factor rebinning (and the UHI rebin spec), which merges leftover bins into the overflow bin. The choice is documented in the docstring. (Internal code is unaffected: the factor path is matched before group_mapping in _handle_slice and goes through the C++ slice_and_rebin.)

Additionally, rebin(2, axis=...) silently ignored the axis; the constructor now raises ValueError since a factor combined with a replacement axis has no defined meaning (use groups=/edges= with axis= instead). groups/edges with axis= keep working as before.

Tests

  • MultiCell groups rebin (1D with flow bins, and 2D rebinning a non-leading axis)
  • the exact flow-to-noflow repro from the issue, plus all constructible (old flow) × (new flow) combinations for both underflow and overflow
  • group_mapping invariant (sum(groups) == len(axis)), factor-with-remainder behavior, and consistency with the C++ factor rebin
  • rebin(factor, axis=...) raising ValueError

🤖 Generated with Claude Code

…mapping protocol

Fixes three verified findings from #1143:

B6a: _rebin_with_groups crashed with IndexError on MultiCell storage
because the new histogram's cell count was never set (reset_nelem) and
the group-combining positional index did not account for the leading
cell dimension of MultiCell views.

B6b: when the old axis has flow bins and the user-supplied new axis does
not, the old underflow bin was folded into the first group, the
underflow-to-overflow combine ran once per group instead of once, and
the old overflow bin consumed a shifted regular bin. The loop now skips
the old underflow bin when the new axis lacks one, folds it into the new
overflow exactly once (only for unordered/categorical axes, where every
out-of-range entry lands in overflow), and drops flow contents when the
new axis has no matching flow bin.

B7: rebin.group_mapping returned [factor] * len(axis) for the factor
case, violating the sum(groups) == len(axis) invariant it enforces for
explicit groups. It now returns len(axis) // factor groups of size
factor; remainder bins are not part of any group, matching the C++
factor rebinning which merges them into overflow. rebin(factor, axis=...)
now raises ValueError instead of silently ignoring the axis.

Assisted-by: ClaudeCode:claude-fable-5
@github-actions github-actions Bot added the needs changelog Might need a changelog entry label Jun 11, 2026
@henryiii henryiii marked this pull request as ready for review June 11, 2026 18:53
@henryiii henryiii merged commit 16025e9 into develop Jun 11, 2026
37 of 39 checks passed
@henryiii henryiii deleted the fix/rebin-groups branch June 11, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs changelog Might need a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant