perf: avoid deep copies in comparisons and hot paths#1150
Merged
Conversation
* Cast `other` by const reference in the C++ __eq__/__ne__ lambdas for histograms, axes, storages, and accumulators; py::cast<T> casts by value, deep-copying the whole object on every comparison. py::cast<const T&> raises py::cast_error for foreign types just the same, so the False/True fallback is unchanged. * Compute Histogram.to_numpy edges in Python instead of calling the C++ to_numpy helper, which built a full copy of the bin contents that was immediately thrown away. The NumPy upper-edge convention (nextafter nudge, flow edges, category 0..size edges) is replicated exactly and locked in by a new test comparing against the C++ helper. * Hoist tuple(_histograms) into a module-level constant instead of rebuilding the tuple on every isinstance check; the set is fixed at import time. * Return early in _handle_slice for full slices before calling _process_loc, whose results are unused in that case. * Look up storage_type once in serialization.to_uhi and reuse one storage instance; every property access walks subclasses, and the code instantiated the class four times. Error messages for unsupported MultiCell storages now report the real nelem. Part of #1143 Assisted-by: ClaudeCode:claude-fable-5
38 tasks
28aebd1 to
873fc30
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 AI text below 🤖
Part of #1143
Removes hidden deep copies on hot paths. No behavior changes intended; the full test suite, mypy strict, and pre-commit all pass.
Changes
__eq__/__ne__lambdas inregister_histogram.hpp(both the generic and multi_cell registrations),register_axis.hpp,register_storage.hpp(all three specializations), andregister_accumulator.hppusedpy::cast<T>(other), which casts by value and deep-copied the entire histogram/axis/storage/accumulator on every comparison. Nowpy::cast<const T&>(other); reference casts raisepy::cast_errorfor foreign types exactly the same, so theFalse/Truefallback behavior is unchanged.Histogram.to_numpycalled the C++to_numpyhelper, which builds a full copy of the bin contents that the Python code immediately discarded (it only kept the edges). The edges are now computed in Python from the per-axis C++edgesproperty plus the same flow values andnextafterupper-edge nudge. The convention is replicated exactly (including category axes'0..sizeedges and the regular_none/regular_uflow/regular_numpy no-nudge exceptions) and locked in by a new parametrized test that compares against the still-present C++ helper for every axis flavor, withflow=Falseandflow=True.tuple(_histograms)was rebuilt on every isinstance check in__init__,_clone,__itruediv__, and_compute_inplace_op. The set is fixed at import time (only@registerreads it), so it is now hoisted into a module-level_histogram_typestuple. Also,_handle_slicenow returns early for full slices[:]before calling_process_loc, whose results were unused in that case.serialization.to_uhiaccessed thestorage_typeproperty four times (each access does a subclass-walking cast) and instantiated the class four times. It now looks the type up once and reuses a single instance (h.storage); as a side effect, the unsupported-storage error for MultiCell now reports the realneleminstead of a default-constructed instance.Testing:
pytest -n auto --benchmark-disable(1055 passed, 1 xfailed),mypystrict (no issues in 32 files),prek -aclean. A new regression test (test_to_numpy_edges_match_cpp_helper) pins the to_numpy edge convention against the C++ implementation across all axis types, including thebh.numpyregular_numpyaxis.🤖 Generated with Claude Code