fix(codegen): hash-seed determinism + gate verdict-teeth + gate-blocked release (Wave A) #11

Merged
luisleo526 merged 4 commits into main from feat/codegen-determinism on Jun 17, 2026

@luisleo526 (Contributor)

Summary

Deep-research P0/P1 fixes for codegen-oss:

  • Determinism (P0, proven): codegen/base.py iterated sets of strings whose iteration order (SipHash-randomized per process, controlled by PYTHONHASHSEED) leaked into the emitted C++ member declarations. As a result, the browser worker emitted different C++ across boots, and the gate's seed-0 native ↔ random-seed wasm comparison was one fixture away from flaking. Fixed by switching to ordered lists and sorted() (including the deeper case where func_series_vars is set-valued). Locked in by a multi-seed subprocess regression test (tests/test_codegen_determinism.py), confirmed to fail on pre-fix main.
  • Gate verdict-teeth (P1): the gate was purely differential, so two identical crashes passed. It now also asserts expected verdicts: gate-corpus/ok/* must yield ok:true and err/* must yield ok:false. A wrong verdict or an unexpected exception on either side fails the gate even when native and wasm agree. Verified end-to-end.
  • Release hardening (P1): release.yml now runs the gate (selftest + full) BEFORE anything publishes, and publishes to PyPI BEFORE pushing the tag (the tag push drives the npm publish). The gate therefore blocks both registries: no more ungated PyPI releases and no PyPI/npm divergence.
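The ordering in the release-hardening item can be modeled as a fail-fast pipeline. The sketch below is a hypothetical Python model, not release.yml itself: the step names are invented for illustration, and it assumes npm publishing is triggered by the tag push.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_release(steps: list[Step]) -> list[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that actually ran (including a failing one).
    """
    ran: list[str] = []
    for name, step in steps:
        ran.append(name)
        if not step():
            break  # a failed step blocks every later step
    return ran


def make_steps(gate_ok: bool, pypi_ok: bool = True) -> list[Step]:
    # Hypothetical step names; the real workflow lives in release.yml.
    return [
        ("gate_selftest", lambda: gate_ok),
        ("gate_full", lambda: gate_ok),
        ("publish_pypi", lambda: pypi_ok),
        # In this model, pushing the tag is what triggers the npm publish.
        ("push_tag", lambda: True),
    ]


if __name__ == "__main__":
    print(run_release(make_steps(gate_ok=False)))
    print(run_release(make_steps(gate_ok=True, pypi_ok=False)))
    print(run_release(make_steps(gate_ok=True)))
```

With this ordering, a gate failure stops before either registry is touched, and a PyPI failure stops before the tag exists, so npm cannot get ahead of PyPI.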

Verify

  • Determinism: identical SHA-256 across PYTHONHASHSEED values [0,1,7,12345,999999] (2+ distinct hashes pre-fix).
  • Parity: 968 passed; golden: 10 passed.
  • Gate: PARITY OK over 275 fixtures with verdicts asserted; teeth proven (an ok-fixture placed in err/ fails the gate).
  • YAML: ok.

Next: release 0.7.2 → app bumps (Wave B).

🤖 Generated with Claude Code

luisleo526 and others added 4 commits June 17, 2026 23:26
…ent output)

Three collections in CodeGen.__init__ were iterated in Python set order
(SipHash-randomized by PYTHONHASHSEED), and that order flowed into emitted
C++ member declarations via orig_names -> func_var_originals ->
_func_cs_var_remap (section 8c cloned var/series members):

  - all_func_scoped_series (set built by .update over func_series_vars values)
  - all_func_scoped_vars (set built over func_var_members values)
  - for fname in set(ctx.func_call_site_counts.keys())

Identical input therefore transpiled to byte-different C++ across hash
seeds. Make all three deterministic (ordered and de-duplicated):

  - all_func_scoped_series / all_func_scoped_vars -> ordered lists.
  - Iterate func_call_site_counts (a dict, insertion-ordered) directly.
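A minimal sketch of the ordered, de-duplicated pattern. The data below is a hypothetical stand-in for the collections named above (func_var_members is assumed to map each function to an ordered list of names), and ordered_union is a hypothetical helper; it relies on dicts preserving insertion order, which is one common way to de-duplicate while keeping first-seen order.

```python
# Hypothetical stand-ins for the analyzer output named above.
func_var_members: dict[str, list[str]] = {
    "f": ["acc", "prev"],
    "g": ["prev", "tmp"],
}
func_call_site_counts: dict[str, int] = {"f": 4, "g": 1}


def ordered_union(groups: list[list[str]]) -> list[str]:
    """Flatten groups into a de-duplicated list that keeps first-seen order."""
    seen: dict[str, None] = {}
    for group in groups:
        for name in group:
            seen.setdefault(name, None)  # dicts preserve insertion order
    return list(seen)


# Before: a set filled via .update(), iterated in hash-seed order.
# After: an ordered, de-duplicated list.
all_func_scoped_vars = ordered_union(list(func_var_members.values()))

# Before: `for fname in set(func_call_site_counts.keys())`.
# After: iterate the dict directly, in insertion order.
fnames = [fname for fname in func_call_site_counts]

print(all_func_scoped_vars)  # ['acc', 'prev', 'tmp']
print(fnames)                # ['f', 'g']
```

First-seen order is only as stable as the inputs: if the grouped values are themselves sets, their iteration order also has to be fixed, which is the func_series_vars case described next.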

The deeper source of nondeterminism: ctx.func_series_vars is dict[str, set] (the analyzer
stores per-function series vars as sets), so iterating each *value* is also
seed-randomized. Wrap those value iterations in sorted() at the two base.py
consumption points. Other consumers use func_series_vars for membership or
already sort, so no analyzer change is needed. _func_var_members_set stays a
set (membership-only, never iterated into output).
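The sorted() wrap can be sketched as follows, assuming a dict[str, set[str]] shaped like ctx.func_series_vars. emit_series_members and the Series<double> declaration format are hypothetical illustrations, not the actual base.py emitter; the point is that the emitted text depends only on the set contents, never on hash order, while membership tests keep using the set unchanged.

```python
# Hypothetical stand-in for ctx.func_series_vars: per-function sets of series vars.
func_series_vars: dict[str, set[str]] = {
    "f": {"src", "ema", "hi"},
    "g": {"lo"},
}


def emit_series_members(series_vars: dict[str, set[str]], fname: str) -> str:
    """Emit one (hypothetical) C++ member declaration per series var, sorted by name."""
    lines = []
    # Iterating the set directly would follow its hash-seed-dependent order;
    # sorted() turns it into a stable, seed-independent order.
    for var in sorted(series_vars.get(fname, set())):
        lines.append(f"Series<double> {fname}_{var};")
    return "\n".join(lines)


# Membership checks can stay on the set: order is irrelevant for `in`.
assert "ema" in func_series_vars["f"]

print(emit_series_members(func_series_vars, "f"))
```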

Proof: a 4-call-site function with 6 history-accessed local series vars now
produces an identical SHA-256 across PYTHONHASHSEED 0/1/7/12345/999999/31337/
8675309 (7 distinct hashes before, 1 after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The gate was purely differential: it only checked that native CPython and
wasm Pyodide produced identical results. Two identical crashes — or an ok/
fixture that erroneously errors on both sides — passed silently. The gate
verified "they agree", never "they're right".

Add checkExpectedVerdict (a pure function in compare.mjs), which enforces the verdict implied
by the corpus directory: tests/gate-corpus/ok/* must yield result.ok === true,
tests/gate-corpus/err/* must yield result.ok === false. The wrong verdict, a
malformed/unparseable payload, or an unexpected (non-CompileError) exception on
EITHER side now fails the gate with a clear, per-side message — even when the
two sides match. run-gate.mjs runs this alongside the existing differential
compareResults and reports parity mismatches and verdict failures separately.

selftest.mjs gains 10 verdict cases covering: ok/ and err/ happy paths, an ok/
that errors on both sides, an err/ that succeeds on both sides, shared
unexpected exceptions on each branch, one-sided wrong verdicts, a missing
side, and a malformed payload.
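For illustration, the verdict check and a table-driven selftest can be re-expressed in Python. The real checkExpectedVerdict lives in compare.mjs and its payload shape is not shown here, so the ok/error_type fields, the failure messages (apart from the one quoted further below), and the exact split of the ten cases are assumptions.

```python
from typing import Any

# Hypothetical payload shapes (assumption, not the gate's real JSON):
OK = {"ok": True}
CE = {"ok": False, "error_type": "CompileError"}      # expected compile error
BOOM = {"ok": False, "error_type": "RecursionError"}  # unexpected exception


def check_expected_verdict(corpus_dir: str, native: Any, wasm: Any) -> list[str]:
    """Check both sides against the verdict implied by the corpus dir.

    Returns per-side failure messages; an empty list means the check passed.
    Runs in addition to (not instead of) the native-vs-wasm differential compare.
    """
    if corpus_dir not in ("ok", "err"):
        raise ValueError(f"unknown corpus dir: {corpus_dir!r}")
    expected_ok = corpus_dir == "ok"
    failures: list[str] = []
    for side, result in (("native", native), ("wasm", wasm)):
        if result is None:
            failures.append(f"{side}: missing result")
        elif not isinstance(result, dict) or not isinstance(result.get("ok"), bool):
            failures.append(f"{side}: malformed payload")
        elif not result["ok"] and result.get("error_type") != "CompileError":
            failures.append(f"{side}: unexpected exception {result.get('error_type')}")
        elif result["ok"] != expected_ok:
            got, want = str(result["ok"]).lower(), str(expected_ok).lower()
            failures.append(f"{side}: verdict ok={got} but corpus dir expects ok={want}")
    return failures


# (case, corpus dir, native, wasm, expected number of failing sides)
CASES = [
    ("ok/ happy path", "ok", OK, OK, 0),
    ("err/ happy path", "err", CE, CE, 0),
    ("ok/ errors on both sides", "ok", CE, CE, 2),
    ("err/ succeeds on both sides", "err", OK, OK, 2),
    ("shared unexpected exception, ok/", "ok", BOOM, BOOM, 2),
    ("shared unexpected exception, err/", "err", BOOM, BOOM, 2),
    ("one-sided wrong verdict, ok/", "ok", OK, CE, 1),
    ("one-sided wrong verdict, err/", "err", CE, OK, 1),
    ("missing side", "ok", OK, None, 1),
    ("malformed payload", "ok", OK, "not json", 1),
]

for name, corpus_dir, native, wasm, expected in CASES:
    got = check_expected_verdict(corpus_dir, native, wasm)
    assert len(got) == expected, (name, got)
print(f"{len(CASES)} verdict cases passed")
```

The "errors on both sides" and "succeeds on both sides" cases are exactly what a purely differential check misses: native and wasm agree, but both are wrong.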

Verified end-to-end: injecting a passing fixture into err/ makes gate:full
exit 1 with "verdict ok=true but corpus dir expects ok=false"; gate:selftest
and gate:full (275 fixtures) are otherwise green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ession test)

Spawns 5 python3 children with distinct PYTHONHASHSEED (0/1/2/7/12345),
each transpiling a fixture that drives the call-site-cloned member path
(a 4-call-site function with its own series vars that also pulls series
from a sub-function), and asserts all stdout is byte-identical.

PYTHONHASHSEED is read once at interpreter startup, so the running pytest
process cannot vary it -- hence subprocesses. PYTHONPATH is set to the repo
root so the children import the in-tree pineforge_codegen.
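A sketch of the subprocess pattern, with the real transpile call replaced by hypothetical stand-in child programs (the actual test transpiles a fixture with pineforge_codegen and sets PYTHONPATH to the repo root; that part is only mirrored in a comment here). It uses sys.executable rather than a literal python3 so the children match the running interpreter.

```python
import hashlib
import os
import subprocess
import sys

SEEDS: tuple[str, ...] = ("0", "1", "2", "7", "12345")

# Hypothetical stand-ins for "transpile the fixture and print the C++".
CHILD_SORTED = "names = {'ema', 'hi', 'lo', 'src', 'vol'}; print(sorted(names))"
CHILD_UNSORTED = "names = {'ema', 'hi', 'lo', 'src', 'vol'}; print(list(names))"


def distinct_outputs(child_code: str, seeds: tuple[str, ...] = SEEDS) -> int:
    """Run child_code once per PYTHONHASHSEED and count distinct stdout digests."""
    digests: set[str] = set()
    for seed in seeds:
        # PYTHONHASHSEED is read at interpreter startup, so each seed needs its
        # own child process. The real test also sets PYTHONPATH to the repo root.
        env = {**os.environ, "PYTHONHASHSEED": seed}
        out = subprocess.run(
            [sys.executable, "-c", child_code],
            env=env,
            capture_output=True,
            check=True,
        ).stdout
        digests.add(hashlib.sha256(out).hexdigest())
    return len(digests)


def test_output_is_hash_seed_independent() -> None:
    assert distinct_outputs(CHILD_SORTED) == 1


if __name__ == "__main__":
    print("sorted:  ", distinct_outputs(CHILD_SORTED))
    print("unsorted:", distinct_outputs(CHILD_UNSORTED))  # usually > 1
```

Hashing stdout keeps assertion failures short and matches the SHA-256 comparison used in the determinism checks.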

Locks in fix(codegen): deterministic member-clone ordering. Verified: passes on
the fixed tree (1 distinct output) and fails on pre-fix origin/main (2 distinct
outputs across these seeds). Self-contained: pure transpile, no engine/network,
runs in default CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…I→tag→npm)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
luisleo526 merged commit e88c342 into main on Jun 17, 2026
9 checks passed
luisleo526 deleted the feat/codegen-determinism branch on June 17, 2026 15:44