[DataOriented] Fastcache, perf, pruning by hughperkins · Pull Request #705 · Genesis-Embodied-AI/quadrants

hughperkins · 2026-05-18T08:47:16Z

Summary

Stacks onto #704. Follow-up fixes uncovered while migrating Genesis RigidSolver to @qd.data_oriented and benchmarking against real workloads.

Core refactor (latest)

Fastcache opaque-member silencing is now the default, not an opt-in via stable_members=True.

The previous design used @qd.data_oriented(stable_members=True) (or _qd_stable_members = True) to tell the fastcache hasher to silently skip opaque-typed members (UID identifiers, Pydantic BaseModel, back-pointers up the object graph). Without the opt-in, any unrecognised member type disabled fastcache for the whole call — adding a UUID to self silently killed fastcache.

That contract was brittle. The actual invariant: opaque Python types cannot affect kernel codegen because the kernel cannot read non-recognised Python types. Only ndarrays, primitives, enums, dataclasses, and nested @qd.data_oriented objects are readable by kernel code. So all container walkers (dataclass_to_repr and the data_oriented branch in stringify_obj_type) can safely skip opaque members from the hash, no opt-in needed.

Recognised-but-unsupported types (qd.field / qd.Matrix.field) are distinct — their shape/dtype affect kernel codegen but fastcache doesn't yet know how to hash them. These still disable fastcache for the whole call (unchanged).

Top-level opaque args still emit [FASTCACHE][PARAM_INVALID] (unchanged).

Implementation: stringify_obj_type now takes nested: bool that suppresses the warning for nested opaque types; container walkers pass nested=True. Removed global _skip_unknown_warn_depth counter and _hit_recognised_unsupported flag — replaced with a clean _FAIL_FASTCACHE sentinel distinct from None (opaque/silent-skip).

The @qd.data_oriented(stable_members=True) flag and _qd_stable_members attribute remain but their scope is narrowed: they only gate the launch-time perf optimization in _template_mapper and Kernel.launch_kernel. They no longer affect fastcache.

Other fixes

Walker robustness: cycle-safe _build_struct_nd_paths and _walk_obj (seen: set[id]); is_data_oriented walks type(obj).__mro__ via __dict__ so Pydantic's ModelMetaclass.__getattr__ doesn't blow the stack; mirror MRO-safe walk in is_dataclass_instance and use it everywhere the kernel pipeline tests user values for dataclass-ness.
Fastcache hasher: skip QuadrantsCallable / BoundQuadrantsCallable entries cached on instance.__dict__ so they don't poison hash keys.
@qd.func dataclass-arg from data_oriented self: structural fix so methods on @qd.data_oriented classes can call @qd.funcs passing dataclass-instance args.

Perf

TemplateMapper.lookup: only walk template-slot args, with per-class cached classification, recovering the ~15% CPU bs=0 regression in bench_cluster_wandb.py.
Pruning: leverage existing 2-pass pruning machinery (pruning.used_struct_ndarray_ids, pass_0_ran) to prune unused @qd.data_oriented ndarrays.

Tests

Nested dataclasses + chained @qd.func calls from data_oriented self.
Walker regression tests: Pydantic-like metaclass, cyclic attribute graph, polymorphic attr across instances.
New: opaque-member-silenced-by-default regression tests (test_args_hasher_data_oriented_with_opaque_member_silently_skipped, test_args_hasher_data_oriented_nested_field_still_fails, test_args_hasher_dataclass_with_opaque_field_silently_skipped).

Docs

compound_types.md: new ### Fastcache section with the three-bucket type classification (recognised+valid / opaque / recognised+unsupported); ### stable_members=True subsection narrowed to launch-perf scope.
fastcache.md: compound-type-cache-keying rules updated with the opaque-skip bullet and the opaque-vs-recognised-unsupported note.

Test plan

Quadrants tests/python fastcache or data_oriented or py_dataclass or pure passes locally on x64 (135 passed, 1 skip, 3 xfail).
test_args_hasher.py passes locally on x64 (30 passed, including 3 new).
pre-commit run -a clean.
pyright clean for changed files.
Quadrants CI green (this PR's checks).
Genesis cluster tests green with this Quadrants.


Baseline state of branch is the perf-mitigations work (cache bound callable, opt-in stable_members short-circuit for spec-key and args-hash walks, skip per-call _BoundedDifferentiableMethod alloc) plus a new test file pinning down the failure mode when calling a @qd.func taking a typed-dataclass arg from inside a @qd.data_oriented method that passes self.dataclass_member. The baseline (typed-dataclass kernel arg + @qd.func) passes. The four data_oriented variants all fail.

… data_oriented self

@qd.func helpers with typed-dataclass parameters were unreachable from @qd.data_oriented kernel methods that wanted to pass self.dataclass_member: the caller-side AST expansion in _expand_Call_dataclass_args / _expand_Call_dataclass_kwargs only fired for dataclass *types* attached to bare ast.Name nodes (typed kernel args), not for dataclass *instances* attached to ast.Attribute nodes (self.X access).

Extend both expansion paths to recognise the instance-of-dataclass case and emit per-leaf ast.Attribute children. The positional path additionally threads the callee parameter name and callee_needed set through, so callee-side pruning of unused dataclass fields stays consistent with caller-side emission.

Tests in tests/python/test_data_oriented_qd_func_dataclass.py:

- baseline typed-arg + qd.func call (passes today)
- data_oriented method + qd.func with positional dataclass member
- ... with keyword dataclass member
- ... with stable_members=True
- ... with two dataclass members (Genesis-shaped)

All 5 pass.

Design doc: perso_hugh/doc/data_oriented_qd_func_dataclass.md (Option A chosen).
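The per-leaf expansion described in this commit can be sketched with the standard `ast` module. This is only an illustration of the idea, not the code in `_expand_Call_dataclass_args`: the helper `expand_call`, the `State` and `Solver` classes, and the choice to resolve members by looking them up on a live `owner` object are all assumptions. It handles positional arguments only and needs Python 3.9+ for `ast.unparse`.

```python
import ast
import dataclasses


@dataclasses.dataclass
class State:
    x: float
    y: float


class Solver:
    """Hypothetical data_oriented-style owner holding a dataclass member."""

    def __init__(self):
        self.state = State(1.0, 2.0)
        self.n = 3


def expand_call(src: str, owner: object) -> str:
    """Rewrite f(self.dc) into f(self.dc.x, self.dc.y, ...) for dataclass members."""
    call = ast.parse(src, mode="eval").body
    assert isinstance(call, ast.Call)
    new_args = []
    for arg in call.args:
        is_self_attr = (
            isinstance(arg, ast.Attribute)
            and isinstance(arg.value, ast.Name)
            and arg.value.id == "self"
        )
        value = getattr(owner, arg.attr, None) if is_self_attr else None
        if dataclasses.is_dataclass(value) and not isinstance(value, type):
            # Dataclass *instance* reached via self.X: emit one leaf per field.
            for f in dataclasses.fields(value):
                new_args.append(ast.Attribute(value=arg, attr=f.name, ctx=ast.Load()))
        else:
            new_args.append(arg)
    call.args = new_args
    return ast.unparse(call)


expanded = expand_call("helper(self.state, self.n)", Solver())
```

Non-dataclass members such as `self.n` pass through unchanged, which mirrors the distinction the commit draws between dataclass instances and everything else.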

… self

Adds 4 tests:

- nested dataclass (Outer{Inner{ndarray}}) passed via self.outer, positional
- nested dataclass passed via self.outer, kwarg (stable_members=True)
- two-step @qd.func chain (outer_write -> inner_write) with self.state
- combined: nested dataclass threaded through a 2-step @qd.func chain

All pass. The outermost data_oriented call site uses the new instance-of-dataclass branch (with recursion threading callee_param); inner qd.func -> qd.func calls use the original typed-arg expansion path unchanged.

…achinery

When a @qd.data_oriented `self` is passed as a `qd.template()` kernel arg, `_predeclare_struct_ndarrays` walks the entire object graph and registers every reachable ndarray as a kernel parameter. For real-world classes (e.g. Genesis's RigidSolver) that's hundreds of ndarrays per kernel, even when the kernel only touches a few; every extra arg slows down each launch's launch-context population.

Hook into the same 2-pass compile machinery that prunes typed-dataclass arg flat-names:

- Pass 0 (non-enforcing): `_predeclare_struct_ndarrays` registers every reachable ndarray as today. `_promote_ndarray_if_declared` now records `id(ndarray)` in `pruning.used_struct_ndarray_ids` whenever an attribute chain like `self.x.y` resolves to one of these pre-declared ndarrays, both for direct accesses in the kernel body and for accesses inside inlined `@qd.func` bodies.
- Pass 1 (enforcing): `_predeclare_struct_ndarrays` only registers ndarrays whose id was observed in pass 0. Unused ndarrays are dropped from the kernel's parameter list and from `struct_ndarray_launch_info`, so neither compile nor each launch pays for them.

On a Genesis non-batched single-Franka CPU rigid step with `RigidSolver` migrated to `@qd.data_oriented(stable_members=True)`:

- step_1 ndarray-args: 326 -> 217 (-109)
- step_2 ndarray-args: 326 -> 145 (-181)
- steady-state step time: 493 us -> 403 us (FPS 2030 -> 2482)

Fastcache hit (pass-0 skipped) is gated via `pruning.pass_0_ran`: the set is unreliable in that case so we fall back to registering every reachable ndarray, matching historical behavior.
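The two-pass flow above can be modelled in a few lines of plain Python. This is a toy sketch under stated assumptions: `PruningSketch`, `predeclare`, and `compile_kernel` are hypothetical names, ndarrays are modelled as ordinary objects, and "the kernel body accessed X" is passed in as a list instead of being discovered from an AST. Only the pass-0 / pass-1 split and the `pass_0_ran` fallback are taken from the commit message.

```python
class PruningSketch:
    """Hypothetical stand-in for the pruning state shared by both passes."""

    def __init__(self):
        self.used_struct_ndarray_ids = set()
        self.pass_0_ran = False


def predeclare(reachable, pruning, enforcing):
    """Return the names registered as kernel parameters for this pass."""
    if enforcing and pruning.pass_0_ran:
        return [
            name
            for name, arr in reachable.items()
            if id(arr) in pruning.used_struct_ndarray_ids
        ]
    # Non-enforcing pass, or pass 0 was skipped: register everything.
    return list(reachable)


def compile_kernel(reachable, accessed_names, pruning, run_pass_0=True):
    if run_pass_0:
        predeclare(reachable, pruning, enforcing=False)
        for name in accessed_names:  # stands in for attribute-chain resolution
            pruning.used_struct_ndarray_ids.add(id(reachable[name]))
        pruning.pass_0_ran = True
    return predeclare(reachable, pruning, enforcing=True)


arrays = {"pos": [0.0], "vel": [0.0], "debug_buf": [0.0]}
pruned = compile_kernel(arrays, ["pos", "vel"], PruningSketch())
unpruned = compile_kernel(arrays, ["pos", "vel"], PruningSketch(), run_pass_0=False)
```

When pass 0 is skipped, the used-id set is empty and cannot be trusted, so the sketch registers every reachable array, matching the fallback the commit describes.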

… in data_oriented walk Mitigation 1 (perf branch) stashes a per-instance BoundQuadrantsCallable in instance.__dict__ on first instance.method access so subsequent lookups skip __get__ allocation. The fastcache args-hasher's @qd.data_oriented walk iterates over obj.__dict__ and previously fell through to the [FASTCACHE][PARAM_INVALID] warning when it encountered that cached entry, disabling fastcache for the whole call (reproduced by test_fastcache_kernel_parameter). These descriptor-cache entries are not data; skip them in the walk so the fastcache key only reflects real members.

Mitigation 5's first cut over-conservatively marked every ndarray reachable from a wholesale-passed dataclass: Option A in call_transformer expands func(self.dc) to per-leaf children func(self.dc.x, self.dc.y, ...), build_stmt runs on each, and _promote_ndarray_if_declared was marking the id as used regardless of whether the callee actually touches it. This left ~205 unused ndarray args still registered per step in the Genesis rigid_solver migration.

Two coordinated fixes:

1. Mirror build_Name's expanding_dataclass_call_parameters gate in _promote_ndarray_if_declared. The leaf accesses synthesized by Option A don't represent the kernel body actually touching the ndarray; only the callee body's own accesses (which run with the flag = False) should count.
2. Tag each pre-declared ndarray's AnyArray proxy with _qd_source_ndarray_id. After Option A's expansion, the callee's typed-arg flat-name locals are bound to already-promoted AnyArrays, so when the inlined callee body accesses them, the value reaching _promote_ndarray_if_declared isn't an Ndarray anymore. Tagging lets us mark via the AnyArray too.

On Genesis non-batched single-Franka CPU with rigid_solver migrated to @qd.data_oriented(stable_members=True):

- step_1 ndarray-args: 217 -> 120 (matches baseline exactly)
- step_2 ndarray-args: 145 -> 37 (matches baseline exactly)
- total ndarray-args/step: 644 -> 439 (matches baseline exactly)
- steady-state step time: 403 us -> 337 us (vs baseline 338-345 us)

The migration is now performance-neutral (was -33% FPS, then -22%, now ~0%).

1173 tests pass; the same 8 quadrants-main pre-existing failures remain (4x test_ad_global_data_access_rule_checker, etc.).

…-class

The args_hash data_oriented walker added in a0db648 ([Fix] args_hash invalidates when data_oriented ndarray member is reassigned) ran unconditionally for every arg of every kernel call. Even after 93893e5 cached the per-class attribute paths, the per-call ``is_data_oriented(arg)`` + ``type(arg).__dict__.get`` chain still cost ~15% FPS on small-step CPU benches (anymal_zero CPU bs=0: 7231 -> 5955 FPS = -17.6% vs the pre-branch reference).

Two coordinated optimisations:

1. Only iterate ``self.template_slot_locations`` instead of all args. Typed-dataclass args carry a specific dataclass type by construction and a data_oriented class is never a dataclass, so the only positions where a data_oriented container can appear are the ``qd.template()`` annotated ones, already tracked by the kernel decorator. Genesis main ``kernel_step_1`` has 4 template positions of 16 args; reduces the per-call work proportionally.
2. Per-``type(arg)`` precomputed dispatch: ``_arg_nd_paths_or_none`` maps each seen type to either the cached path list to walk, or ``None`` (skip: covers primitive templates, non-data_oriented composites, ``_qd_stable_members`` data_oriented, and data_oriented with zero ndarrays). One ``dict.get`` per candidate per call after warmup, replacing the previous ``is_data_oriented`` + ``__dict__.get`` + ``_struct_nd_paths_for`` chain.

Measured on cluster ``rtx-mid`` single process, ``test_speed[anymal_zero-None-None-0-cpu]``, 3-run median, Genesis main + Quadrants branch:

- pre-fix tip (02e5660): 5955 FPS (-17.6% vs a22cc2d reference 7231)
- after this commit: 6935 FPS (-4.1% vs reference)

Recovery: +16.5% FPS on Genesis main; +11.2% on Genesis ``hp/data-oriented-rigid-solver`` (6315 -> 7020). Brings CPU bs=0 within ~3-4% of the pre-branch baseline.

Other Quadrants tests (test_data_oriented_ndarray, test_data_oriented_qd_func_dataclass, test_callable_template_mapper, test_kernel_templates, test_template_typing) still pass.
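The two optimisations combine into a loop that visits only template slots and does one dictionary lookup per slot once the type has been seen. The sketch below is a rough model, not the `TemplateMapper` code: `TemplateMapperSketch`, `FakeNdarray`, `Solver`, and the `slow_classifications` counter are hypothetical, and the hash is just a tuple of `id()` values.

```python
_MISSING = object()


class FakeNdarray:
    """Hypothetical stand-in for an ndarray member."""


class Solver:
    _data_oriented = True  # assumed marker, set by the decorator in the real code

    def __init__(self):
        self.pos = FakeNdarray()
        self.vel = FakeNdarray()
        self.dt = 0.01


class TemplateMapperSketch:
    def __init__(self, template_slots):
        self.template_slots = template_slots
        self._paths_by_type = {}  # type -> list of attr names, or None (skip)
        self.slow_classifications = 0

    def _classify(self, arg):
        """Slow path: runs once per type, not once per call."""
        self.slow_classifications += 1
        if not any("_data_oriented" in k.__dict__ for k in type(arg).__mro__):
            return None
        paths = [n for n, v in vars(arg).items() if isinstance(v, FakeNdarray)]
        return paths or None

    def args_hash(self, args):
        ids = []
        for slot in self.template_slots:  # only template positions
            arg = args[slot]
            paths = self._paths_by_type.get(type(arg), _MISSING)
            if paths is _MISSING:
                paths = self._paths_by_type[type(arg)] = self._classify(arg)
            if paths is None:
                continue
            ids.extend(id(getattr(arg, p)) for p in paths)
        return tuple(ids)


solver = Solver()
mapper = TemplateMapperSketch(template_slots=(0, 2))
args = (solver, FakeNdarray(), 7)  # slot 1 is not a template slot
first = mapper.args_hash(args)
second = mapper.args_hash(args)
```

Reassigning `solver.pos` to a new array changes the tuple on the next call, while the per-type classification is not repeated.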

…_oriented

Genesis unit tests on cluster hit RecursionError (118 instances across test_rigid_physics, test_fem, test_hybrid, test_render, ...). Two independent root causes, both in the recursive ndarray-graph walkers used to discover ndarray members of ``@qd.data_oriented`` / ``dataclass`` kernel args:

1. ``is_data_oriented(obj)`` did ``getattr(type(obj), "_data_oriented", False)``. For Genesis containers like ``RigidOptions`` (a ``pydantic.BaseModel`` subclass), the metaclass ``ModelMetaclass.__getattr__`` recurses infinitely on missing class attribute names, blowing the stack on every call. Fix: walk MRO and look up ``_data_oriented`` directly in each class's ``__dict__`` (never goes through ``getattr`` / ``__getattr__``, so it's immune to pathological metaclasses). ``@qd.data_oriented`` sets the flag directly on the decorated class so the MRO walk still finds it.
2. ``_build_struct_nd_paths`` (in ``_template_mapper_hotpath.py``) and ``_walk_obj`` (in ``function_def_transformer.py``) had no cycle detection. Genesis object graphs have cross-references (e.g. ``solver <-> scene <-> sim <-> solver``) so the walkers recurse forever on real workloads. Fix: track ``id(obj)`` in a per-traversal ``seen`` set and skip re-entering a node we've already expanded.

Adds ``test_is_data_oriented_safe_on_pydantic_like_metaclass``, ``test_data_oriented_with_pydantic_like_child``, and ``test_data_oriented_with_cyclic_attr_graph`` to pin both fixes.
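The second fix, cycle detection with a per-traversal `seen` set, can be sketched as follows. The function name `collect_ndarray_paths`, the `FakeNdarray` class, and the `Node` containers are hypothetical; the real walkers are `_build_struct_nd_paths` and `_walk_obj`, whose signatures are not shown in this PR text.

```python
class FakeNdarray:
    """Hypothetical stand-in for an ndarray leaf."""


class Node:
    """Plain attribute container, like a solver or scene object."""


def collect_ndarray_paths(obj, prefix=(), seen=None):
    """Return attribute paths to every reachable FakeNdarray, cycle-safe."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return []  # already expanded in this traversal: do not re-enter
    seen.add(id(obj))
    paths = []
    for name, value in vars(obj).items():
        if isinstance(value, FakeNdarray):
            paths.append(prefix + (name,))
        elif hasattr(value, "__dict__"):
            paths.extend(collect_ndarray_paths(value, prefix + (name,), seen))
    return paths


# solver <-> scene cross-reference, as in the commit message.
solver, scene = Node(), Node()
solver.x = FakeNdarray()
solver.scene = scene
scene.y = FakeNdarray()
scene.solver = solver

found = sorted(collect_ndarray_paths(solver))
```

Without the `seen` check, the `scene.solver` back-pointer would send the walk around the loop until the interpreter's recursion limit is hit.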

…nditionals

…ache-stale leaves

Two related robustness fixes surfaced by the Genesis ``hp/data-oriented-rigid-solver`` migration on cluster unit tests.

## Problem 1: ``_uid: UID`` disables fastcache on stable_members classes

After Genesis migrated ``kernel_step_1`` / ``kernel_step_2`` to methods on ``@qd.data_oriented(stable_members=True) class RigidSolver``, the fastcache args-hasher walks ``RigidSolver.__dict__``, encounters ``_uid`` of type ``genesis.utils.uid.UID``, can't recognise it, and disables fastcache for the whole call:

    [FASTCACHE][PARAM_INVALID] Parameter with path ('0', '_uid') and type <class 'genesis.utils.uid.UID'> not allowed by fast cache.
    [FASTCACHE][INVALID_FUNC] The pure function step_1 could not be fast cached, because one or more parameter types were invalid

Causes 5 ``test_quadrants.py`` failures (``test_num_envs``, ``test_ndarray_no_compile`` on both backends) that all assert fastcache fires for ndarray-backend ``RigidSolver`` invocations.

``stable_members=True`` already promises the class's member set / types don't change after construction. Under that contract, opaque metadata (``UID``, etc.) is inert from fastcache's perspective: it doesn't affect kernel codegen. Treat ``stable_members=True`` containers as tolerant: skip unrecognised members silently and continue, instead of returning None and killing fastcache. Also silence the per-member ``[FASTCACHE][PARAM_INVALID]`` log inside a stable_members walk via a depth counter, so the user doesn't see warnings for members they explicitly opted out of caring about.

## Problem 2: cached ndarray-path leaves can be stale across instances

``_struct_nd_paths_cache`` is keyed on ``type(arg)`` and assumes the set of ndarray-reachable attribute chains is stable across instances. That's the common case but breaks on polymorphic Genesis solvers: ``FEMSolver`` / ``MPMSolver`` / ``SPHSolver`` can hold a ``qd.Tensor`` whose underlying impl swaps between an ``Ndarray`` and a ``MatrixField`` between instances. ``_collect_struct_nd_descriptors`` then walks a cached path to a ``MatrixField`` and crashes with::

    AttributeError: 'MatrixField' object has no attribute 'element_type'

Fix: defensively check ``isinstance(v, Ndarray)`` after the tensor-wrapper unwrap and skip stale entries silently. ``element_type`` / ``shape`` / ``_qd_layout`` are Ndarray-only; non-Ndarray leaves can't contribute a meaningful descriptor anyway, and the per-instance ``weakref(arg)`` part of the spec key still ensures cache discrimination.

Adds ``test_data_oriented_polymorphic_attr_across_instances`` to pin the cache-stale-leaf behaviour.

…ail on Field/MatrixField

My previous commit ``5add57b6a`` was too loose: it silently skipped *any* member that ``stringify_obj_type`` returned ``None`` for, including ``Field`` / ``MatrixField``. That broke ``test_quadrants.test_num_envs[False-*]`` (field backend), which pins the contract that fastcache must fail when an arg's subtree contains a recognised-but-unsupported tensor-like type (whose value affects kernel codegen).

Differentiate two reasons ``stringify_obj_type`` returns ``None``:

(a) RECOGNISED-BUT-UNSUPPORTED: ``ScalarField`` / ``MatrixField`` (and any future type explicitly hitting ``_mark_warn_if_not_tensor_annotation``). These now also call ``_mark_hit_recognised_unsupported()`` to flip a module-level flag. The flag bubbles up naturally through nested dataclass / data_oriented walkers since they propagate ``None``.

(b) TRULY-OPAQUE: unknown types falling through to the ``[FASTCACHE][PARAM_INVALID]`` branch (``RigidSolver._uid: UID``, etc.). These don't set the flag.

The ``stable_members=True`` data_oriented walker snapshots the flag around each child's recursive call. If a child returned ``None`` AND the flag was set, fastcache fails (any tensor-like leaf in the subtree invalidates the hash). If the flag was clear, the child is truly opaque metadata; skip it silently under the user's stability contract.

``_hit_recognised_unsupported`` is reset at the top of ``hash_args`` and before each child probe so the snapshot reflects only the just-completed recursion.

`dataclasses.is_dataclass(obj)` calls `hasattr(type(obj), '__dataclass_fields__')`, which delegates to the metaclass `__getattr__` for missing names. Pydantic's `ModelMetaclass` (and our `RecursingMeta` regression fixture) recurse infinitely on arbitrary lookups and blow the stack: the same class of failure as the previously-fixed `is_data_oriented(obj)` path.

Add `is_dataclass_instance` in `lang/util.py` that walks `type(obj).__mro__` probing `klass.__dict__` directly (never via `getattr`), and use it everywhere the kernel pipeline tests user values for dataclass-ness:

- `_template_mapper_hotpath._build_struct_nd_paths`
- `function_def_transformer._walk_obj` (both branches)
- `function_def_transformer` dataclass-vs-`__dict__` walker dispatch
- `args_hasher.stringify_obj_type`

Annotations/types are untouched (`call_transformer`, `_signature`, `_kernel_impl_dataclass`): those check user-declared dataclass types, not runtime values that can carry pathological metaclasses.

Fixes `test_data_oriented_with_pydantic_like_child` (added in b3457a6 to pin this exact regression but caught only the `is_data_oriented` half of it).
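A minimal sketch of the MRO probe follows. The body of `is_dataclass_instance` here is an assumed reimplementation based on the description, not a copy of `lang/util.py`, and this `RecursingMeta` is a hand-written stand-in for the regression fixture of the same name (its real definition is not shown in the PR text).

```python
import dataclasses


class RecursingMeta(type):
    """Assumed pathological metaclass: recurses on any missing class attribute."""

    def __getattr__(cls, name):
        return getattr(cls, name)


class PydanticLike(metaclass=RecursingMeta):
    pass


def is_dataclass_instance(obj) -> bool:
    """True for dataclass instances; never triggers a metaclass __getattr__."""
    if isinstance(obj, type):
        return False
    # Probe each class's own __dict__ directly instead of using hasattr/getattr.
    return any(
        "__dataclass_fields__" in klass.__dict__ for klass in type(obj).__mro__
    )


@dataclasses.dataclass
class Point:
    x: int


class Point3(Point):
    """Subclass of a dataclass: the MRO walk still finds the marker."""


checks = (
    is_dataclass_instance(Point(1)),
    is_dataclass_instance(Point3(1)),
    is_dataclass_instance(Point),
    is_dataclass_instance(PydanticLike()),
)
```

Both `__mro__` and `__dict__` resolve through normal lookup on `type`, so the metaclass `__getattr__` hook is never consulted; `dataclasses.is_dataclass(PydanticLike())` on the same object goes through `hasattr` and recurses.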

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55ecf95d55


chatgpt-codex-connector · 2026-05-18T08:51:41Z



-def data_oriented(cls):
+def data_oriented(cls=None, *, stable_members: bool = False):


Document the new data_oriented stable_members API

This changes the public API by allowing @qd.data_oriented(stable_members=True) and by documenting _qd_stable_members as an equivalent class-level knob in the runtime docstring, but the commit has no corresponding docs/ update. The repository's AGENTS.md requires user-facing docs to stay in sync with public API or usage changes; this is especially important here because the existing docs still describe ndarray-member reassignment for @qd.data_oriented containers as supported, while this option makes reassignment undefined behavior.


Black/ruff reformatted multi-import statements onto multiple lines.

github-actions · 2026-05-18T09:25:19Z

Total: 13 file(s) changed, +531 -54 code lines.

The previous design used a ``stable_members=True`` opt-in flag (or per-class ``_qd_stable_members`` attribute) to tell the fastcache hasher to silently skip opaque-typed members of ``@qd.data_oriented`` containers. Without the opt-in, any unrecognised member type disabled fastcache for the whole call, which made adding a UUID, Pydantic config object, or back-pointer to ``self`` silently kill fastcache.

That contract was brittle: adding any new metadata member to a long-lived ``@qd.data_oriented`` class could disable fastcache without warning, and the opt-in was an "I promise the layout doesn't change" contract that has nothing to do with the actual fastcache invariant.

The actual invariant: opaque Python types cannot affect kernel codegen because the kernel cannot read them. Only recognised types (ndarrays, primitives, enums, dataclasses, nested ``@qd.data_oriented`` objects) can be read by kernel code. So *all* container walkers (``dataclass_to_repr`` and the ``data_oriented`` branch in ``stringify_obj_type``) can safely skip opaque members from the hash, no opt-in needed.

Recognised-but-unsupported types (``qd.field`` / ``qd.Matrix.field``) are distinct: their shape/dtype affect kernel codegen but fastcache doesn't yet know how to hash them. These still disable fastcache for the whole call; behaviour is unchanged.

Top-level kernel-arg opaqueness is also distinct: an opaque top-level arg is a user error (the kernel's argument is uninterpretable to fastcache) and still emits the ``[FASTCACHE][PARAM_INVALID]`` warning.

Implementation: ``stringify_obj_type`` now takes a ``nested: bool`` parameter that suppresses the warning for nested opaque types. Container walkers pass ``nested=True``. Removed the global ``_skip_unknown_warn_depth`` counter and ``_hit_recognised_unsupported`` flag, replaced with a clean ``_FAIL_FASTCACHE`` sentinel distinct from ``None`` (opaque/silent-skip).

The ``@qd.data_oriented(stable_members=True)`` flag and ``_qd_stable_members`` attribute remain; they still gate the launch-context per-call walker optimization in ``_template_mapper`` and ``Kernel.launch_kernel``. Removed from the fastcache hasher's logic only.

Added 3 regression tests pinning the new defaults:

- data_oriented with opaque member: silently hashable.
- data_oriented with nested field: still FastcacheSkip.
- dataclass with opaque field: silently hashable.

All 130 fastcache + data_oriented tests pass on x64.

…le_members scope

Documents the new behaviour committed in 49ffb3b:

- ``compound_types.md`` ``### Fastcache``: explain the three-bucket type-based classification (recognised+valid / opaque / recognised+unsupported) that applies to every ``@qd.data_oriented`` argument by default. Add a separate ``### stable_members=True`` subsection clarifying that the flag is a per-call launch performance hint (template-mapper + launch-context cache), not a fastcache contract.
- ``fastcache.md`` compound-type rules: add the opaque-skip bullet and the opaque-vs-recognised-unsupported note.
- ``kernel_impl.data_oriented`` docstring: narrow ``stable_members`` to its actual scope (per-call walker skip) and explicitly note that fastcache silences opaque members regardless of the flag.

…ify stable_members scope" This reverts commit 7757907.

This reverts commit 49ffb3b.

…llback

Unrecognised types in fastcache argument hashing previously had two failure modes, both bad:

- Top-level: ``[FASTCACHE][PARAM_INVALID]`` warn + return None, disabling fastcache for the whole call. Any solver-like object carrying a single opaque metadata field (Genesis ``UID``, Pydantic config, back-pointer) silently killed the cache.
- Nested under ``@qd.data_oriented(stable_members=True)``: silent skip. Worked for the Genesis case but is dangerous: if someone later adds a new tensor-like type (e.g. ``BFloat16Tensor``) whose value affects kernel codegen but forgets to register it in args_hasher's recognised set, the silent skip serves stale cache results without any indication.

Both paths are replaced with a single ``type(v).__qualname__``-based fallback (``opaque-<module>.<qualname>``) that emits a one-shot ``[FASTCACHE][UNKNOWN_TYPE]`` warning per type. Properties:

- Cache key stable across instances of the same opaque class (Genesis UID #1 and UID #2 produce the same key). Kernels cannot read non-recognised Python types so opaque metadata cannot affect codegen, making type-identity-only hashing correct for genuinely opaque members.
- Loud diagnostic for the dangerous case: any unrecognised type that ever gets hashed prints a warning pointing at args_hasher.stringify_obj_type so a missed tensor-like registration is impossible to miss.
- ``ScalarField`` / ``MatrixField`` (recognised-but-unsupported tensor-like) still disable fastcache via a new ``_FAIL_FASTCACHE`` sentinel: their shape/dtype affect codegen but fastcache doesn't yet handle them. Distinct from the qualname fallback so the field path remains correct.

Also adds ``pruning_paths`` and ``parent_flat`` plumbing through ``stringify_obj_type`` / ``dataclass_to_repr`` / ``hash_args`` for the upcoming pruning-driven narrow walk (L1 cache lookup of kernel-accessed flat names); the new parameters default to None so this commit alone is the qualname-fallback baseline.

``test_src_ll_cache_arg_warnings`` updated to assert the new ``[UNKNOWN_TYPE]`` warning (instead of the old ``[PARAM_INVALID]`` + ``[INVALID_FUNC]`` dead-end). The ``_qd_stable_members`` flag is no longer read by args_hasher; its launch-context role (``_mutable_nd_cached_val`` short-circuit) is unchanged in this commit and will be addressed separately.

github-actions · 2026-05-18T10:17:47Z

Total: 14 file(s) changed, +580 -62 code lines.

Replaces the pre-refactor single-level cache (one key derived from source + config + a *wide* args walk) with a two-level pruning-driven scheme:

- L1 key (``src_hasher.make_source_config_key``): source + config + version, no args dependence. Stores ``PruningInfo``: the set of kernel-accessed flat names produced during compile (``Pruning``'s ``used_vars_by_func_id[KERNEL_FUNC_ID]``, folded with data_oriented ndarray attribute chains from ``struct_ndarray_launch_info``). Also persists ``graph_do_while_arg`` (source-deterministic).
- L2 key (``src_hasher.make_full_cache_key``): L1 key + ``narrow_args_hash``. The narrow hash walks only paths in the L1 pruning set, so unrelated metadata changes on the same kernel-accessed surface no longer invalidate the cache.

Lookup flow (warm call): L1 lookup → narrow args walk using L1 pruning info → L2 lookup → load artifact.

Cold compile: L1 miss → full compile (pass 0 + pass 1) → store L1 → compute narrow args hash → store L2.

Crucially, "L1 hit but L2 miss" still triggers full pass 0+1 (not just pass 1): pass 0 is what populates per-callee-func pruning info, and L1 only stores the kernel-level set, so skipping pass 0 is only safe when the C++ artifact is already loaded (``only_parse_function_def=True``).

Pruning narrowing rules in ``args_hasher.stringify_obj_type``:

- Dataclass children: flat-name pruning is *complete* (every dataclass field is flattened by ``FlattenAttributeNameTransformer``), so narrow walking by ``child_flat in pruning_paths`` is safe.
- Data_oriented children: pruning is only complete for ndarray members (via ``struct_ndarray_launch_info``). Primitive members (template-position values baked into the kernel) are NOT tracked by pruning. To stay correct, the data_oriented branch only narrows *ndarray* children; non-ndarray children are always walked (the recursive call still narrows nested dataclasses). This is why ``test_template_raise_on_data_oriented_floats`` and the dtype-distinct cache-key test both still pass: primitives keep contributing to the hash, only kernel-unused ndarrays get pruned.

Behavior change: ``test_src_ll_cache_arg_warnings`` and ``test_fastcache_field_warnings_warn_struct_template_field`` updated to reflect that fastcache no longer fires ``[PARAM_INVALID]`` or ``[INVALID_FUNC]`` for unrecognised types at the *top level* (qualname fallback from previous commit handles them) or for Field-bearing *unused* dataclass members (narrowing skips them). Tests now exercise the genuinely-live cases.

``test_src_hasher_*`` updated to use the new ``make_source_config_key`` / ``make_full_cache_key`` API.

stable_members is no longer read by the args hasher (handled by previous commit); its launch-context role in ``Kernel.launch_kernel`` still uses the legacy flag and will be addressed in a follow-up.
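The L1/L2 lookup flow above can be modelled with an in-memory cache. This is a sketch under loud assumptions: the function names `make_source_config_key` and `make_full_cache_key` appear in the commit message, but their signatures here are invented, as are `TwoLevelCacheSketch`, the `_digest` helper, the flat-name dictionary used to represent arguments, and the `compile_fn` callback that returns both an artifact and the set of kernel-accessed flat names.

```python
import hashlib


def _digest(*parts):
    return hashlib.sha256("\x00".join(parts).encode()).hexdigest()


def make_source_config_key(source, config, version):
    """L1 key: no dependence on the call's arguments (assumed signature)."""
    return _digest(source, config, version)


def narrow_args_hash(flat_args, used_paths):
    """Hash only the flat names the kernel was observed to access."""
    return _digest(*(f"{p}={flat_args[p]}" for p in sorted(used_paths) if p in flat_args))


def make_full_cache_key(l1_key, args_hash):
    """L2 key: L1 key plus the narrow args hash (assumed signature)."""
    return _digest(l1_key, args_hash)


class TwoLevelCacheSketch:
    def __init__(self):
        self.l1 = {}  # L1 key -> set of kernel-accessed flat names
        self.l2 = {}  # L2 key -> compiled artifact
        self.compiles = 0

    def get(self, source, config, version, flat_args, compile_fn):
        l1_key = make_source_config_key(source, config, version)
        used = self.l1.get(l1_key)
        if used is not None:
            l2_key = make_full_cache_key(l1_key, narrow_args_hash(flat_args, used))
            if l2_key in self.l2:
                return self.l2[l2_key]  # warm path
        # L1 miss, or L1 hit with L2 miss: full compile in both cases.
        artifact, used = compile_fn(source, flat_args)
        self.compiles += 1
        self.l1[l1_key] = used
        self.l2[make_full_cache_key(l1_key, narrow_args_hash(flat_args, used))] = artifact
        return artifact


def fake_compile(source, flat_args):
    # Pretend the kernel only reads a.x; a.uid is never accessed.
    return f"artifact[{flat_args['a.x']}]", frozenset({"a.x"})


cache = TwoLevelCacheSketch()
a1 = cache.get("src", "cfg", "v1", {"a.x": "f32", "a.uid": "UID#1"}, fake_compile)
a2 = cache.get("src", "cfg", "v1", {"a.x": "f32", "a.uid": "UID#2"}, fake_compile)
a3 = cache.get("src", "cfg", "v1", {"a.x": "f64", "a.uid": "UID#2"}, fake_compile)
```

Changing only `a.uid` reuses the first artifact because that flat name is outside the stored pruning set, while changing the dtype of `a.x` produces a different L2 key and a second compile.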

github-actions · 2026-05-18T10:36:32Z

Diff coverage: 98% · 512 lines, 12 missing

…erf-only After the previous two commits, fastcache is no longer brittle wrt opaque members: the cache key is derived from kernel pruning info, and unrecognised types at kernel-read paths fall back to a deterministic type(v).__qualname__ hash with a one-shot [UNKNOWN_TYPE] warning. This commit aligns the user-visible docs (fastcache.md, compound_types.md) and the data_oriented(stable_members=...) docstring with that semantic. stable_members is documented as *purely* a launch-time perf hint with no fastcache role; the launch-context comments in kernel.py and _template_mapper.py are updated to call this out explicitly. Also fixes a pylint no-else-return warning introduced by the refactor.

github-actions · 2026-05-18T11:16:29Z

Total: 17 file(s) changed, +747 -116 code lines.

…name fallback

Three rules now strictly enforced by the args hasher:

1. The cache key may only include contributions from kernel-pruned paths. Never a qualname-based hash for unrecognised types; that captures type identity without type parameters (dtype/shape) and would silently mask value-affecting changes.
2. Unrecognised types at kernel-read paths must not be silently dropped. Fastcache is disabled loudly with a one-shot [UNKNOWN_TYPE] warning plus [INVALID_FUNC] log line.
3. Fastcache works for data_oriented containers: pruning info now covers every attribute chain rooted at a kernel arg, not just ndarrays.

Compiler-side: ASTTransformer.build_Name annotates non-flattened kernel-arg Names with ``_qd_arg_chain``; build_Attribute propagates the annotation through ``self.dofs.x`` chains and records them via the new ``Pruning.mark_kernel_arg_chain_used`` (separate set so they don't poison ``struct_locals`` and break codegen). ``Pruning.record_after_call`` was extended to propagate chain-path entries across @qd.func calls including Attribute args (``f(self.dofs)``). After both compile passes, ``Kernel._fold_kernel_arg_chain_paths_into_pruning`` merges the kernel's chain-paths into ``used_vars_by_func_id[KERNEL_FUNC_ID]`` (same set as ``used_py_dataclass_parameters_by_key_enforcing[key]`` by reference) so the fastcache args-hash narrow walk picks them up.

Args-hasher side: removed the data_oriented ndarray-only carveout: ``_is_path_used(pruning_paths, child_flat)`` now applies to every member. Removed ``_qualname_fallback``; replaced with ``_fail_unknown_type`` which returns _FAIL_FASTCACHE and emits the [UNKNOWN_TYPE] warning.

Tests + docs updated to match. Full x64 suite: 4063 passed.
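The compiler-side idea, recording every attribute chain rooted at a kernel argument, can be approximated with a plain `ast.NodeVisitor`. This is a simplified illustration, not the `ASTTransformer` logic: `attribute_chain` and `used_arg_chains` are hypothetical helpers, only the longest chain at each access is recorded, and propagation across `@qd.func` calls is not modelled.

```python
import ast


def attribute_chain(node):
    """Return ('self', 'dofs', 'x') for self.dofs.x, or None if not Name-rooted."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return tuple(reversed(parts))
    return None


def used_arg_chains(src):
    """Collect attribute chains rooted at the function's own parameters."""
    fn = ast.parse(src).body[0]
    arg_names = {a.arg for a in fn.args.args}
    used = set()

    class ChainVisitor(ast.NodeVisitor):
        def visit_Attribute(self, node):
            chain = attribute_chain(node)
            if chain is not None and chain[0] in arg_names:
                used.add(chain)  # record the full chain, do not descend further
            else:
                self.generic_visit(node)

    ChainVisitor().visit(fn)
    return used


kernel_src = """
def step(self, i):
    self.dofs.x[i] = self.n * 2.0
"""

chains = used_arg_chains(kernel_src)
```

A narrow args walk driven by this set would hash `self.dofs.x` and `self.n` but never look at, say, `self.uid`, which is how an unread opaque member stays out of the key.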

github-actions · 2026-05-18T12:13:31Z

Diff coverage: 96% · 599 lines, 25 missing

Five new tests in test_data_oriented_ndarray.py covering the three rules the args hasher now enforces (see fastcache.md "Pruning-driven argument hashing"):

- test_data_oriented_kernel_unused_opaque_member_does_not_affect_cache (rule 1): two State instances differing only in a uuid member that the kernel never reads share the same compiled artifact across processes.
- test_data_oriented_kernel_read_opaque_member_fails_fastcache (rule 2): when the kernel actually reads an unrecognised-type member, fastcache fails loudly with [UNKNOWN_TYPE]+[INVALID_FUNC]. Kernel still runs.
- test_data_oriented_kernel_read_primitive_distinguishes_cache_key (rule 3): kernel reading a primitive member (s.n baked in) cold-compiles per value and both values load distinct artifacts on warm start.
- test_data_oriented_kernel_unread_primitive_does_not_affect_cache (rule 1 mirror for primitives): unused_n differences don't perturb the cache key.
- test_data_oriented_qd_func_chain_propagation_distinguishes_cache_key: Pruning.record_after_call propagation through @qd.func(s.dofs); the inner dofs.x dtype must reach the kernel's pruning set so changes invalidate the cache.

…e-mode note

The earlier docstring mentioned a qualname fallback for unrecognised types, which was true at the time but was subsequently removed in the strict-rules refactor. Update the note to match the actual current behaviour: unrecognised types at kernel-read paths fail fastcache loudly with [UNKNOWN_TYPE] + [INVALID_FUNC].

The TemplateMapper's args_hash walk used a per-class cache of attribute paths populated from the first instance ever seen of each class. That cache was wrong for @qd.data_oriented classes whose attribute structure varies across instances (motivating case: Genesis ``DataManager``, which only allocates ``*_adjoint_cache`` members when ``requires_grad=True``).

Two failure modes existed:

- Forward direction (first instance has the attr, second misses it): the walk crashed with ``AttributeError: 'DataManager' object has no attribute 'dofs_state_adjoint_cache'`` when launching kernels on the second instance. Observed on Genesis ``test_rigid_mpm_legacy_coupling`` (macos-15 GPU job in PR genesis-world#2799).
- Inverse direction (first instance lacks the attr, second has it): silently miscached. The new ndarray's id never made it into args_hash, so a later reassignment of that attribute wouldn't trigger spec re-derivation.

Fix: stash the walked path list on the *instance* (``arg._qd_nd_paths``) via ``object.__setattr__`` (compatible with frozen dataclasses, mirroring the existing ``_qd_dc_repr`` pattern in ``args_hasher.dataclass_to_repr``). Each instance is walked once on first kernel call; subsequent calls fetch the cached list via instance ``__dict__`` lookup (~30 ns, same order as the previous class-level ``dict.get``).

Steady-state perf: unchanged on franka cpu single env (one solver instance, walked once at scene build, fetched per-call thereafter). Startup pays one walk per instance lifetime (~10us per scene build for Genesis-shaped workloads).

``__slots__`` classes that can't accept the instance stash fall back to per-class caching and retain the legacy polymorphic-instance limitation; Genesis data_oriented containers don't use ``__slots__``.

``_classify_for_args_hash`` is split into a per-class disposition (``_SKIP`` / ``_PER_INSTANCE``) plus a per-instance ``_struct_nd_paths_for`` call. The ``_qd_stable_members`` flag still short-circuits the entire walk for users who opt into the "no ndarray reassignment, ever" promise.

Test ``test_data_oriented_polymorphic_attribute_set_across_instances`` covers both forward and inverse directions on a ``DataManager``-shaped class.

github-actions · 2026-05-19T13:42:37Z

Total: 21 file(s) changed, +1088 -124 code lines.

- ``test_data_oriented_polymorphic_attribute_set_across_instances``: the inverse-direction case now uses a kernel that *reads* ``s.extra`` (the conditional attribute); without the per-instance walk this would silently miss ``('extra',)`` from the kernel-used path list. Adds a reassignment step that verifies same-shape ndarray swaps go through the per-call ``id(v)`` folding cleanly.
- ``test_src_ll_cache_hit_predeclare_struct_ndarrays_pruned``: pins ``710ee4705``. A data_oriented arg with three ndarrays (``a``/``b``/``c``) but a kernel that only writes ``b``. Cold compile populates the fastcache with the flat-name pruning set; ``qd.reset()`` + ``qd.init()`` reloads it; the cache-hit branch in ``_predeclare_struct_ndarrays`` must reproduce the same single-ndarray registration set, otherwise insertion-order registration would scramble slots and the write would land in ``state.a`` instead of ``state.b``.

github-actions · 2026-05-19T14:30:14Z

Total: 21 file(s) changed, +1131 -124 code lines.

github-actions · 2026-05-19T16:06:56Z

Diff coverage: 97% · 976 lines, 29 missing

…hash

Pins the L2 collision between needs_grad=False (cold) and needs_grad=True (hot) scenes that differ only on the .grad-present flag. ``args_hasher.stringify_obj_type`` stringifies ndarray leaves by (dtype, ndim) only, so the narrow args_hash is the same and the second scene loads the without-grad artifact. The kernel's compiled parameter slot has needs_grad=False baked in, but the launch routes the with-grad ndarray through the _QD_ARRAY_WITH_GRAD bucket, mis-aligning the parameter struct (silent wrong results or runtime OOB depending on slot layout).

Test FAILS on this commit (asserts cache_loaded is False after the with-grad launch; observed True with the unfixed args_hasher). Fix to follow in next commit.

``ScalarNdarray``/``VectorNdarray``/``MatrixNdarray`` instances now stringify with an extra ``-g`` tag when their grad buffer is present. needs_grad is part of the compiled parameter-struct layout (``insert_ndarray_param`` bakes the grad pointer into the slot iff needs_grad=True), and the launch path picks between ``_QD_ARRAY`` and ``_QD_ARRAY_WITH_GRAD`` buckets off ``v.grad is not None`` — so two scenes that differ only by needs_grad MUST hash distinctly, otherwise L2 returns an artifact whose slots are mismatched at launch (silent miscomputation or runtime OOB depending on slot offset alignment). This is the root cause of the Genesis ``test_diff_*`` autodiff failures: the non-grad ``kernel_init_link_fields`` artifact landed in L2 first; the ``requires_grad=True`` run loaded that artifact and routed ``links_state.quat`` through ``_QD_ARRAY_WITH_GRAD`` against a slot declared without grad, producing the "Out of bound access to ndarray at arg 44 with indices [0,0,0]" assertion. Reproducer test was added in the previous commit; it now passes on x64, vulkan and cuda. Full fast_caching + test_data_oriented_ndarray + test_ad_dataclass suite: 257 passed, 6 skipped.
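The effect of the ``-g`` tag can be illustrated with a toy leaf stringifier. Only the ``-g`` suffix and the (dtype, ndim) basis come from the commit message; `FakeNdarray`, `leaf_key` and the exact key format are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNdarray:
    """Hypothetical stand-in for an ndarray: dtype, rank, optional grad."""

    dtype: str
    ndim: int
    grad: Optional["FakeNdarray"] = None


def leaf_key(arr: FakeNdarray) -> str:
    # Before the fix the key depended on (dtype, ndim) only, so a with-grad
    # and a without-grad array collided. The suffix separates them.
    key = f"nd-{arr.dtype}-{arr.ndim}"
    if arr.grad is not None:
        key += "-g"
    return key


plain = FakeNdarray("f32", 2)
with_grad = FakeNdarray("f32", 2, grad=FakeNdarray("f32", 2))
print(leaf_key(plain), leaf_key(with_grad))
```

Keying on grad presence rather than on a user-facing flag matches the launch-side test quoted above (``v.grad is not None``), so in this sketch the hash and the bucket choice cannot disagree.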

github-actions · 2026-05-19T16:48:38Z

Total: 21 file(s) changed, +1172 -127 code lines.

github-actions · 2026-05-19T18:21:43Z

Diff coverage: 97% · 1016 lines, 28 missing

github-actions · 2026-05-19T19:10:34Z

Total: 21 file(s) changed, +1172 -127 code lines.

github-actions · 2026-05-19T20:40:51Z

Diff coverage: 97% · 1016 lines, 28 missing

hugh and others added 12 commits May 18, 2026 01:46

[Style] Apply pre-commit (black + ruff): import order, single-line conditionals

cc1e380

chatgpt-codex-connector Bot reviewed May 18, 2026

[Style] pre-commit: import formatting

6d9c307

Black/ruff reformatted multi-import statements onto multiple lines.

hughperkins added 2 commits May 18, 2026 02:33

hughperkins changed the title ~~[DataOriented] Fastcache + walker fixes from real-world Genesis migration~~ [DataOriented] Fastcache: opaque-member silencing by default + walker fixes May 18, 2026

hughperkins added 3 commits May 18, 2026 02:50

Revert "[Doc] Fastcache: opaque-member silencing is the default; clarify stable_members scope"

fb38fec

This reverts commit 7757907.

Revert "[Fix] Fastcache: skip opaque-typed members silently by default"

7cabaa0

This reverts commit 49ffb3b.

hughperkins added 2 commits May 18, 2026 05:28

hughperkins temporarily deployed to publish_pypi May 19, 2026 10:34 — with GitHub Actions Inactive

hughperkins added 2 commits May 19, 2026 09:06

[Lint] Reorder imports in needs_grad reproducer test

8a7ead4

hughperkins temporarily deployed to publish_pypi May 19, 2026 19:29 — with GitHub Actions Inactive

hughperkins temporarily deployed to publish_pypi May 19, 2026 20:05 — with GitHub Actions Inactive

hughperkins temporarily deployed to publish_pypi May 19, 2026 20:20 — with GitHub Actions Inactive

hughperkins deployed to publish_pypi May 19, 2026 20:20 — with GitHub Actions Active



		def data_oriented(cls):
		def data_oriented(cls=None, *, stable_members: bool = False):
