feat: add Collector storage that keeps original per-bin values #1133
Open
henryiii wants to merge 3 commits into
Adds bh.storage.Collector, which stores the original sample values that fall into each bin (a variable-length list per bin) instead of aggregating them. It is built on Boost's vendored accumulators::collector<std::vector<double>> in a dense_storage and is filled via h.fill(x, sample=values).

Because the per-bin data is ragged, view() returns a NumPy object-dtype array of per-bin float64 copies (no buffer protocol). Operations that go through the C++ backend work and concatenate (indexing, project, slicing/factor-rebin reduce, +/+=, sum, pickle, copy). Operations that would need to write back through the copy-only view raise NotImplementedError (item assignment, array/scalar arithmetic, group rebinning, pick-on-subset, list selection); weighted and threaded fills also raise.

Pickle serializes a per-bin counts array plus a flat values array, which also seeds a future Awkward conversion (offsets + content).

Assisted-by: ClaudeCode:claude-opus-4.8
Mark the locals in make_object_view and the collector sum() result const, as required by clang-tidy --warnings-as-errors.

Assisted-by: ClaudeCode:claude-opus-4.8
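The pickle layout described in the first commit message (a per-bin counts array plus a flat values array) maps onto an offsets + content representation. The sketch below is a NumPy-only illustration of that mapping, not the code in this PR; the helper names `flatten_bins` and `unflatten_bins` are hypothetical.

```python
import numpy as np


def flatten_bins(bins):
    """Pack per-bin 1-D arrays into (counts, values). Hypothetical helper."""
    counts = np.array([len(b) for b in bins], dtype=np.int64)
    if len(bins) == 0:
        return counts, np.empty(0, dtype=np.float64)
    values = np.concatenate([np.asarray(b, dtype=np.float64) for b in bins])
    return counts, values


def unflatten_bins(counts, values):
    """Rebuild per-bin arrays; also return the offsets (length nbins + 1)."""
    offsets = np.concatenate(([0], np.cumsum(counts)))
    bins = [values[a:b].copy() for a, b in zip(offsets[:-1], offsets[1:])]
    return bins, offsets


# Three bins, the middle one empty.
original = [np.array([1.0, 2.0]), np.array([]), np.array([3.0])]
counts, values = flatten_bins(original)
restored, offsets = unflatten_bins(counts, values)
```

Here `offsets` is the cumulative sum of `counts` with a leading zero, so bin `i` is `values[offsets[i]:offsets[i + 1]]`; empty bins simply produce two equal consecutive offsets.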
🤖 AI text below 🤖
Summary
Adds `bh.storage.Collector`, a storage that keeps the original sample values that fall into each bin (a variable-length list per bin) instead of aggregating them. This finishes the idea started in #378: the C++ `accumulators::collector<>` it builds on has since been upstreamed into Boost.Histogram and is already vendored in `extern/`, so this PR is the Python binding + integration.

Each value in `sample=` is appended to the bin chosen by the corresponding coordinate.

Design
- `dense_storage<accumulators::collector<std::vector<double>>>` (double only), filled through a new `fill_impl` overload for the collector's `accumulator_traits_holder<false, const double&>` (none of the existing overloads matched).
- `view()` returns a NumPy object-dtype array (shape = axes), each element a 1-D `float64` copy of that bin's values. A dedicated `register_histogram` specialization replaces `def_buffer` with a `bh::indexed`-based object-array builder.
- Works through the C++ backend, concatenating per-bin values: `__getitem__`, `project`, slicing/factor-rebin `reduce`, `h1 + h2` / `+=`, `sum`, pickle, copy/deepcopy.
- Raises `NotImplementedError`: the object view returns copies, so anything that writes back through it is unsupported. This covers item assignment, array/scalar arithmetic (`h * 2`), group rebinning, integer picking on a subset of axes, and list-based selection. Weighted and threaded fills raise too.
- Pickle serializes a per-bin counts array plus a flat values array, which is the layout a future `to_awkward()` would need.

Out of scope / follow-ups
- Non-`double` element types.
- Not UHI-serializable, consistent with `MultiCell`.

Tests
- `tests/test_collector.py` (23 tests): fill, object-array view incl. flow, scalar-arg broadcast, 2D, indexing, `sum`/`project`/`reduce`/factor-rebin concatenation, addition, pickle round-trip, copy/deepcopy, reset, equality, structural match, and all the unsupported-op guards.
- `prek -a` clean (ruff, clang-format, mypy).
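To make the fill and view semantics in the Summary and Design sections concrete, here is a rough pure-NumPy emulation for a single 1-D axis with flow bins. It does not use `bh.storage.Collector` or any code from this PR; `collect_per_bin` and `object_view` are hypothetical helpers, and the bin layout (underflow first, overflow last) is an assumption made for the illustration.

```python
import numpy as np


def collect_per_bin(edges, x, sample):
    """Append each sample to the bin selected by the matching coordinate."""
    edges = np.asarray(edges, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    sample = np.asarray(sample, dtype=np.float64)
    # Index 0 is underflow, 1..n are the regular bins, n + 1 is overflow.
    idx = np.searchsorted(edges, x, side="right")
    bins = [[] for _ in range(len(edges) + 1)]
    for i, value in zip(idx, sample):
        bins[i].append(float(value))
    return bins


def object_view(bins):
    """Build an object-dtype array holding a float64 copy of every bin."""
    view = np.empty(len(bins), dtype=object)
    for i, b in enumerate(bins):
        view[i] = np.array(b, dtype=np.float64)
    return view


bins = collect_per_bin(
    edges=[0.0, 1.0, 2.0],
    x=[0.5, 1.5, 0.7, 5.0],
    sample=[10.0, 20.0, 30.0, 40.0],
)
view = object_view(bins)

# Writing into the view only touches the copy, not the stored values.
view[1][0] = -1.0
```

Because every element of the view is a fresh copy, the write on the last line leaves `bins` unchanged, which illustrates why operations that would write back through such a view cannot work.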