fix: rebin with groups — MultiCell support, flow handling, and group_mapping protocol#1148
Merged
Conversation
…mapping protocol Fixes three verified findings from #1143: B6a: _rebin_with_groups crashed with IndexError on MultiCell storage because the new histogram's cell count was never set (reset_nelem) and the group-combining positional index did not account for the leading cell dimension of MultiCell views. B6b: when the old axis has flow bins and the user-supplied new axis does not, the old underflow bin was folded into the first group, the underflow-to-overflow combine ran once per group instead of once, and the old overflow bin consumed a shifted regular bin. The loop now skips the old underflow bin when the new axis lacks one, folds it into the new overflow exactly once (only for unordered/categorical axes, where every out-of-range entry lands in overflow), and drops flow contents when the new axis has no matching flow bin. B7: rebin.group_mapping returned [factor] * len(axis) for the factor case, violating the sum(groups) == len(axis) invariant it enforces for explicit groups. It now returns len(axis) // factor groups of size factor; remainder bins are not part of any group, matching the C++ factor rebinning which merges them into overflow. rebin(factor, axis=...) now raises ValueError instead of silently ignoring the axis. Assisted-by: ClaudeCode:claude-fable-5
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 AI text below 🤖
Part of #1143.
This fixes three verified findings in rebinning with groups (B6, B7):
B6a — MultiCell crash
h[bh.rebin(groups=[2, 2])]on aMultiCellhistogram raisedIndexError: index 0 is out of bounds for axis 0 with size 0._rebin_with_groupsbuilt the new C++ histogram withreduced.__class__(axes), which does not preserve the cell count, and_combine_group_contentsindexed the wrong view dimension because MultiCell views carry the cell index as the leading dimension. Both are now handled the same way as the existingpick_eachpath:reset_nelem()is called on the new histogram, and the per-axis position within the view is offset by one for MultiCell storage.B6b — flow mishandling when rebinning onto an axis without flow bins
When the old axis has flow bins and the user-supplied
new_axisdoes not (e.g.h[bh.rebin(groups=[2, 2], axis=bh.axis.IntCategory([0, 1]))]):For
Regular(4)with flow view[100, 1, 2, 3, 4, 200]andgroups=[2, 2]ontoIntCategory([0, 1]), the result was[101, 5, 304]; it is now[3, 7, 300].Flow-handling semantics implemented
For each side (underflow and overflow), all combinations of old/new axis traits are now handled:
The pre-existing
test_rebin_change_axis_catexpectation encoded the buggy once-per-group behavior ([1, 3, 4]) and was updated to the correct value ([1, 3, 2]).B7 —
rebin.group_mappingprotocol violationThe factor form returned
[factor] * len(axis)— group sizes summing tofactor * len(axis)— violating thesum(groups) == len(axis)invariant the same method enforces for explicit groups. This is part of the publicRebinProtocolsurface consumed cross-library. It now returnslen(axis) // factorgroups of sizefactor; when the factor does not divide the bin count evenly, the remainder bins are not part of any group, matching the C++ factor rebinning (and the UHIrebinspec), which merges leftover bins into the overflow bin. The choice is documented in the docstring. (Internal code is unaffected: the factor path is matched beforegroup_mappingin_handle_sliceand goes through the C++slice_and_rebin.)Additionally,
rebin(2, axis=...)silently ignored the axis; the constructor now raisesValueErrorsince a factor combined with a replacement axis has no defined meaning (usegroups=/edges=withaxis=instead).groups/edgeswithaxis=keep working as before.Tests
group_mappinginvariant (sum(groups) == len(axis)), factor-with-remainder behavior, and consistency with the C++ factor rebinrebin(factor, axis=...)raisingValueError🤖 Generated with Claude Code