fix(codegen): hash-seed determinism + gate verdict-teeth + gate-blocked release (Wave A) #11
Merged
…ent output)

Three collections in CodeGen.__init__ were iterated in Python set order (SipHash-randomized by PYTHONHASHSEED), and that order flowed into emitted C++ member declarations via orig_names -> func_var_originals -> _func_cs_var_remap (section 8c cloned var/series members):

- all_func_scoped_series (set built by .update over func_series_vars values)
- all_func_scoped_vars (set built over func_var_members values)
- for fname in set(ctx.func_call_site_counts.keys())

Identical input therefore transpiled to byte-different C++ across hash seeds.

Make all three deterministic, ordered + de-duplicated:

- all_func_scoped_series / all_func_scoped_vars -> ordered lists.
- Iterate func_call_site_counts (a dict, insertion-ordered) directly.

The deeper source: ctx.func_series_vars is dict[str, set] (the analyzer stores per-function series vars as sets), so iterating each *value* is also seed-randomized. Wrap those value iterations in sorted() at the two base.py consumption points. Other consumers use func_series_vars for membership or already sort, so no analyzer change is needed.

_func_var_members_set stays a set (membership-only, never iterated into output).

Proof: a 4-call-site function with 6 history-accessed local series vars now produces identical SHA-256 across PYTHONHASHSEED 0/1/7/12345/999999/31337/8675309 (7 distinct hashes before, 1 after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
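The ordering fix described in this commit can be illustrated with a minimal sketch. The helper names `ordered_unique` and `collect_scoped_series` are hypothetical and do not come from base.py; only the technique (sort each set-valued entry, then de-duplicate while keeping first-seen order) follows the commit message.

```python
from typing import Dict, Iterable, List, Set


def ordered_unique(items: Iterable[str]) -> List[str]:
    """De-duplicate while keeping first-seen order.

    dict keys preserve insertion order, so the result does not depend
    on PYTHONHASHSEED as long as the input order does not.
    """
    return list(dict.fromkeys(items))


def collect_scoped_series(func_series_vars: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical stand-in for building all_func_scoped_series.

    The outer dict is insertion-ordered, but each value is a set, so the
    value is wrapped in sorted() before its order can reach any output.
    """
    collected: List[str] = []
    for fname in func_series_vars:
        collected.extend(sorted(func_series_vars[fname]))
    return ordered_unique(collected)


example = {"f": {"b", "a", "c"}, "g": {"c", "d"}}
print(collect_scoped_series(example))
```

A separate set can still be kept alongside the list for fast membership checks, which matches the commit's note that _func_var_members_set stays a set because it is never iterated into output.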
The gate was purely differential: it only checked that native CPython and wasm Pyodide produced identical results. Two identical crashes (or an ok/ fixture that erroneously errors on both sides) passed silently. The gate verified "they agree", never "they're right".

Add checkExpectedVerdict (pure, in compare.mjs) enforcing the verdict implied by the corpus directory:

- tests/gate-corpus/ok/* must yield result.ok === true
- tests/gate-corpus/err/* must yield result.ok === false

The wrong verdict, a malformed/unparseable payload, or an unexpected (non-CompileError) exception on EITHER side now fails the gate with a clear, per-side message, even when the two sides match.

run-gate.mjs runs this alongside the existing differential compareResults and reports parity mismatches and verdict failures separately.

selftest.mjs gains 10 verdict cases covering: ok/ and err/ happy paths, an ok/ that errors on both sides, an err/ that succeeds on both sides, shared unexpected exceptions on each branch, one-sided wrong verdicts, a missing side, and a malformed payload.

Verified end-to-end: injecting a passing fixture into err/ makes gate:full exit 1 with "verdict ok=true but corpus dir expects ok=false"; gate:selftest and gate:full (275 fixtures) are otherwise green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
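The verdict rule in this commit can be modelled in a few lines. The real checkExpectedVerdict lives in compare.mjs and is JavaScript; the Python below is only a sketch of the logic, and the function signature, the `error_type` field, and all message wording except the quoted verdict message are assumptions for illustration.

```python
from typing import Any, List, Optional


def check_expected_verdict(
    corpus_dir: str, native: Optional[Any], wasm: Optional[Any]
) -> List[str]:
    """Return per-side failure messages; an empty list means the verdict holds.

    corpus_dir is "ok" or "err". Each result is assumed to be a dict with a
    boolean "ok" and an optional, hypothetical "error_type" string.
    """
    expected_ok = {"ok": True, "err": False}[corpus_dir]
    failures: List[str] = []
    for side, result in (("native", native), ("wasm", wasm)):
        if result is None:
            failures.append(f"{side}: missing result")
        elif not isinstance(result, dict) or not isinstance(result.get("ok"), bool):
            failures.append(f"{side}: malformed payload")
        elif result.get("error_type") not in (None, "CompileError"):
            failures.append(f"{side}: unexpected exception {result['error_type']}")
        elif result["ok"] is not expected_ok:
            got = str(result["ok"]).lower()
            want = str(expected_ok).lower()
            failures.append(
                f"{side}: verdict ok={got} but corpus dir expects ok={want}"
            )
    return failures


# Two identical failures agree with each other, yet both violate ok/.
print(check_expected_verdict("ok", {"ok": False}, {"ok": False}))
```

Each side is checked independently of the other, which is what lets this catch cases a purely differential comparison would pass.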
…ession test)

Spawns 5 python3 children with distinct PYTHONHASHSEED (0/1/2/7/12345), each transpiling a fixture that drives the call-site-cloned member path (a 4-call-site function with its own series vars that also pulls series from a sub-function), and asserts all stdout is byte-identical.

PYTHONHASHSEED is read once at interpreter startup, so the running pytest process cannot vary it -- hence subprocesses. PYTHONPATH is set to the repo root so the children import the in-tree pineforge_codegen.

Locks fix(codegen): deterministic member-clone ordering.

Verified: passes on the fixed tree (1 distinct output) and fails on pre-fix origin/main (2 distinct outputs across these seeds).

Self-contained: pure transpile, no engine/network, runs in default CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
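The subprocess approach can be sketched as follows. This is not the content of tests/test_codegen_determinism.py: the child here is a hypothetical one-liner that prints a sorted set of names rather than invoking the transpiler, and `run_with_seed` / `distinct_outputs` are illustrative helper names.

```python
import os
import subprocess
import sys

SEEDS = ["0", "1", "2", "7", "12345"]

# Hypothetical stand-in for "transpile a fixture": print a set of names.
# sorted() makes the output independent of the per-process hash seed.
CHILD_SORTED = (
    "names = {'alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta'}\n"
    "print(','.join(sorted(names)))\n"
)


def run_with_seed(code: str, seed: str) -> bytes:
    """Run code in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        check=True,
    )
    return proc.stdout


def distinct_outputs(code: str, seeds=SEEDS) -> int:
    """Count byte-distinct stdout payloads across the given seeds."""
    return len({run_with_seed(code, seed) for seed in seeds})


if __name__ == "__main__":
    print(distinct_outputs(CHILD_SORTED))
```

Dropping the sorted() call in the child makes the count depend on the seeds chosen, which is the kind of variation the regression test is meant to catch.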
…I→tag→npm)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Deep-research P0/P1 fixes for codegen-oss:
- codegen/base.py iterated str-sets whose SipHash order reached emitted C++ member declarations → the browser worker emitted different C++ across boots and the gate's seed0-native↔random-wasm compare was one fixture from flaking. Fixed by ordered-list/sorted() (incl. the deeper func_series_vars set-valued case). Locked by a multi-seed subprocess regression test (tests/test_codegen_determinism.py); confirmed it fails on pre-fix main.
- gate-corpus/ok/* → ok:true, err/* → ok:false; wrong verdict or unexpected exception on either side fails even when native↔wasm agree. Verified end-to-end.
- release.yml now runs the gate (selftest + full) BEFORE anything publishes, and publishes PyPI BEFORE pushing the tag (tag→npm). Gate-blocks both registries; no more ungated PyPI or PyPI/npm divergence.

Verify
- Determinism: identical SHA-256 across seeds [0,1,7,12345,999999] (2+ distinct pre-fix).
- Parity: 968 passed + golden 10 passed.
- Gate: PARITY OK over 275 with verdicts asserted; teeth proven (ok-fixture in err/ → fails).
- yaml ok.
Next: release 0.7.2 → app bumps (Wave B).
🤖 Generated with Claude Code