perf: avoid deep copies in comparisons and hot paths #1150

Merged
henryiii merged 1 commit into develop from perf/avoid-copies-dead-code
Jun 11, 2026

@henryiii (Member) commented Jun 11, 2026

🤖 AI text below 🤖

Part of #1143

Removes hidden deep copies on hot paths. No behavior changes intended; the full test suite, mypy strict, and pre-commit all pass.

Changes

  • C1: The __eq__/__ne__ lambdas in register_histogram.hpp (both the generic and multi_cell registrations), register_axis.hpp, register_storage.hpp (all three specializations), and register_accumulator.hpp used py::cast<T>(other), which casts by value and therefore deep-copied the entire histogram/axis/storage/accumulator on every comparison. They now use py::cast<const T&>(other). Reference casts raise py::cast_error for foreign types exactly as value casts do, so the False/True fallback behavior is unchanged.
  • C3: Histogram.to_numpy called the C++ to_numpy helper, which builds a full copy of the bin contents that the Python code immediately discarded (it only kept the edges). The edges are now computed in Python from the per-axis C++ edges property, with the same flow values and nextafter upper-edge nudge. The convention is replicated exactly (including category axes' 0..size edges and the regular_none/regular_uflow/regular_numpy no-nudge exceptions) and is locked in by a new parametrized test that compares against the still-present C++ helper for every axis flavor, with flow=False and flow=True.
  • C4: tuple(_histograms) was rebuilt on every isinstance check in __init__, _clone, __itruediv__, and _compute_inplace_op. The set is fixed at import time (only @register reads it), so it is now hoisted into a module-level _histogram_types tuple. In addition, _handle_slice now returns early for full slices [:] before calling _process_loc, whose results were unused in that case.
  • C5: serialization.to_uhi accessed the storage_type property four times (each access does a subclass-walking cast) and instantiated the class four times. It now looks the type up once and reuses a single instance (h.storage). As a side effect, the unsupported-storage error for MultiCell now reports the real nelem instead of the nelem of a default-constructed instance.

Testing: pytest -n auto --benchmark-disable (1055 passed, 1 xfailed), mypy strict (no issues in 32 files), prek -a clean. A new regression test (test_to_numpy_edges_match_cpp_helper) pins the to_numpy edge convention against the C++ implementation across all axis types, including the bh.numpy regular_numpy axis.
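
As a rough illustration of the edge convention that the regression test pins, here is a pure-Python sketch. The function name, its parameters, and the choice to nudge the last finite edge one representable step downward are assumptions made for this example; the C++ helper remains the authoritative definition, including which axis types skip the nudge.

```python
import math


def numpy_style_edges(edges, *, underflow=False, overflow=False, nudge=True):
    """Return NumPy-style edges for one axis (hypothetical helper).

    edges: ascending finite bin edges, length nbins + 1.
    underflow / overflow: add a -inf / +inf flow edge (the flow=True case).
    nudge: apply the upper-edge nudge; pass False for axis types that
           are exempt from it.
    """
    out = list(edges)
    if nudge:
        # Assumption: move the last finite edge one ULP downward.
        out[-1] = math.nextafter(out[-1], -math.inf)
    if underflow:
        out.insert(0, -math.inf)
    if overflow:
        out.append(math.inf)
    return out


finite = [0.0, 1.0, 2.0, 3.0]
with_flow = numpy_style_edges(finite, underflow=True, overflow=True)
no_nudge = numpy_style_edges(finite, nudge=False)
```

Here `with_flow` has six entries, with infinite edges at both ends and a last finite edge just below 3.0, while `no_nudge` is identical to the input. Requires Python 3.9 or later for `math.nextafter`.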

🤖 Generated with Claude Code

* Cast `other` by const reference in the C++ __eq__/__ne__ lambdas for
  histograms, axes, storages, and accumulators; py::cast<T> casts by
  value, deep-copying the whole object on every comparison.
  py::cast<const T&> raises py::cast_error for foreign types just the
  same, so the False/True fallback is unchanged.
* Compute Histogram.to_numpy edges in Python instead of calling the C++
  to_numpy helper, which built a full copy of the bin contents that was
  immediately thrown away. The NumPy upper-edge convention (nextafter
  nudge, flow edges, category 0..size edges) is replicated exactly and
  locked in by a new test comparing against the C++ helper.
* Hoist tuple(_histograms) into a module-level constant instead of
  rebuilding the tuple on every isinstance check; the set is fixed at
  import time.
* Return early in _handle_slice for full slices before calling
  _process_loc, whose results are unused in that case.
* Look up storage_type once in serialization.to_uhi and reuse one
  storage instance; every property access walks subclasses, and the
  code instantiated the class four times. Error messages for
  unsupported MultiCell storages now report the real nelem.
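
The last item can be sketched as follows. Everything here is a hypothetical stand-in (the classes, the lookup counter, and both functions), not the actual serialization.to_uhi code; it only shows why reading a costly property once and reusing the existing storage instance is cheaper than re-deriving both in every branch.

```python
class IntStorage:
    """Hypothetical storage class."""


class DoubleStorage:
    """Hypothetical storage class."""


class Hist:
    """Hypothetical histogram whose storage_type lookup is costly."""

    def __init__(self, storage):
        self.storage = storage
        self.lookups = 0  # counts property accesses, for illustration

    @property
    def storage_type(self):
        self.lookups += 1  # stands in for the subclass-walking cast
        return type(self.storage)


def kind_repeated(h):
    # Before: every branch re-reads the property and builds a new instance.
    if isinstance(h.storage_type(), IntStorage):
        return "int"
    if isinstance(h.storage_type(), DoubleStorage):
        return "double"
    raise TypeError(f"unsupported storage: {h.storage_type()!r}")


def kind_once(h):
    # After: reuse the instance the histogram already holds.
    storage = h.storage
    if isinstance(storage, IntStorage):
        return "int"
    if isinstance(storage, DoubleStorage):
        return "double"
    # The error describes the real instance, not a default-constructed one.
    raise TypeError(f"unsupported storage: {storage!r}")
```

Because the error path in `kind_once` formats the actual instance, any per-instance state (such as an element count) shows up correctly in the message.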

Part of #1143

Assisted-by: ClaudeCode:claude-fable-5

@github-actions (bot) added the "needs changelog" label (Might need a changelog entry) Jun 11, 2026
@henryiii force-pushed the perf/avoid-copies-dead-code branch from 28aebd1 to 873fc30 June 11, 2026 17:37
@henryiii changed the title from "perf: avoid deep copies in comparisons and hot paths; remove dead code" to "perf: avoid deep copies in comparisons and hot paths" Jun 11, 2026
@henryiii marked this pull request as ready for review June 11, 2026 18:32
@henryiii merged commit ca4aac7 into develop Jun 11, 2026
21 checks passed
@henryiii deleted the perf/avoid-copies-dead-code branch June 11, 2026 18:32