Skip to content

Add manhattanhenge fixture; fix bugs and 60x perf regression it exposed#57

Merged
ivarvong merged 1 commit into
mainfrom
manhattanhenge-fixture
May 4, 2026
Merged

Add manhattanhenge fixture; fix bugs and 60x perf regression it exposed#57
ivarvong merged 1 commit into
mainfrom
manhattanhenge-fixture

Conversation

@ivarvong
Copy link
Copy Markdown
Owner

@ivarvong ivarvong commented May 3, 2026

Summary

A real-world fixture exercising the math + datetime + zoneinfo
intersection: compute the four annual Manhattanhenge sunset dates from
first principles (Meeus solar position, Bennett refraction, bisection
on apparent altitude). ~200 lines of pure Python, no external
libraries — exactly the kind of compute we expect LLMs to write inside
the sandbox.

The fixture's purpose is twofold:

  1. Conformance: prove Pyex matches CPython byte-for-byte on a
    non-trivial real program, recorded via mix pyex.fixture record.
  2. Pressure-test: writing a real Python program the way a human
    would write it surfaces bugs that synthetic tests miss.

Per AGENTS.md: "Never work around Pyex limitations in fixture code.
Fixtures exist to expose bugs. If a fixture fails, fix Pyex." This PR
landed three Pyex fixes the fixture exposed, in addition to the
fixture itself.

What the fixture exposed

Correctness

  • Starred targets in for-loop unpacking (for *rest, last in items).
    Standard Python since 3.0; the fixture didn't actually need it but a
    natural rewrite of henge_dates did. Adds a parser branch in
    ControlFlow.parse_for, extends collect_for_vars to recognize
    starred names, and unpacks via a new unpack_for_list_starred helper
    that matches CPython's "at least N" error wording.

  • datetime.fromtimestamp(ts, tz). Was registered as :builtin
    (positional only), so passing tz=... as a keyword raised
    TypeError. Switched to :builtin_kw with explicit two-arg clauses
    for None and a tzinfo instance.

Performance — 60x regression

Helpers.update_closure_env/2 was rebuilding every module-level
function's closure after each call, then rebind_var wrote the new
function back into module scope. The new function's closure pointed
at the post-call module map, which contained the previous rebound
function, whose closure pointed at the previous module map, and so on.
A self-referential structure that grew by one level per call.

For a 3-deep call chain (a → b → c) invoked in a hot loop, repeated
top-level invocations slowed quadratically. Minimal repro:

def a(x): return math.sin(x) + 1.0
def b(x): return a(x) * 2.0
def c(x): return b(x) - 0.5

def find(n):
    s = 0.0
    for i in range(n):
        s += c(i * 0.001)
    return s

Repeating for j in range(10): find(100) N times at the top level:

Top-level repeats Before After
1 65 ms 29 ms
2 177 ms 6 ms
3 398 ms 10 ms
4 688 ms 13 ms
5 1061 ms 17 ms
8 3306 ms 27 ms

Linear vs quadratic. Manhattanhenge full main() (which calls
henge_dates six times across the sensitivity sweep): 120s+ → 1.9s
(60x)
. One isolated henge_dates(2026) call: 2.7s → 0.35s (8x).

Fix: short-circuit update_closure_env when the captured scope is
a single map — i.e. the function was defined at module level. There
are no enclosing locals to refresh, the module is shared across
callers via propagate_scopes, and rebinding the function back into
module scope is redundant. Closures defined inside other functions
(real lexical closures with mutable state, like make_counter) still
go through the full merge path.

def update_closure_env(
      {:function, _name, _params, _body, %Env{scopes: [_single]}, _is_generator} = func,
      _post_call_env
    ) do
  func
end

Test infrastructure

  • Per-fixture options.json with a :timeout key (default 5_000 ms).
    Read by Pyex.Test.Fixture.load!/1 and threaded into
    Pyex.run/2's :limits.
  • fixture_test.exs ExUnit timeout now scales as max(30s, 2 * fixture.timeout), so slow fixtures don't have to share the default
    60s ceiling.
  • AGENTS.md: codify the Elixir option-key convention (:timeout, not
    :timeout_ms). Existed in spirit; making it explicit so the new
    options.json schema doesn't drift later.

Verification

  • mix test: 5292 tests, 0 failures, 2 skipped
  • mix test test/pyex/fixture_test.exs: 17/17 in 1.9s (was timing out
    at 60s)
  • mix format --check-formatted, mix compile --warnings-as-errors,
    mix dialyzer: clean

Note on Manhattanhenge accuracy

The fixture matches AMNH's published 2026 dates within ±2 days.
Closing that gap requires terrain data (Palisades elevation profile)
and full IAU-2000 refraction; the goal here is a pure-Python compute
exercise, not an astronomy oracle. The recorded expected.json
captures CPython's output, which Pyex matches exactly.

Adds an end-to-end fixture that computes the four annual Manhattanhenge
sunset dates from first principles (Meeus low-precision solar position,
Bennett refraction, bisection on apparent altitude).  The program is
the kind of compute an LLM might write in our sandbox: pure Python,
math + datetime + zoneinfo, ~200 lines, no external libraries.

Writing it as a real Python programmer would surfaced four issues in
Pyex.  Per AGENTS.md, fix Pyex rather than work around in the fixture.

Correctness:

- Parser/interpreter: support starred targets in for-loop unpacking
  (`for *rest, last in items: ...`).  Lands a parser case in
  ControlFlow.parse_for, extends collect_for_vars with starred
  branches, and unpacks via a new unpack_for_list_starred helper that
  matches CPython's "at least N" error message.
- datetime.fromtimestamp(ts, tz): accept tz both positionally and as a
  keyword.  Was registered as :builtin (positional only); now :builtin_kw
  with explicit two-arg clauses for None and a tzinfo instance.

Performance:

- Helpers.update_closure_env/2 was rebuilding every module-level
  function's closure after each call, then rebind_var wrote the new
  function back into module scope.  The new function's closure pointed
  at the post-call module map, which contained the *previous* rebound
  function, whose closure pointed at the previous module map, etc.
  Self-referential structure that grew one level per call.  For a
  3-deep call chain (a -> b -> c) invoked in a hot loop, repeated
  top-level invocations slowed quadratically: a 30-line repro went 65
  -> 177 -> 398 -> 688 -> 1061 ms across 1..5 outer iterations.
  Manhattanhenge sweep ballooned from an expected ~16s to over two
  minutes.

  Fix: short-circuit update_closure_env when the captured scope is a
  single map (i.e. the function was defined at module level).  There
  are no enclosing locals to refresh, the module is shared across
  callers via propagate_scopes, and rebinding the function back into
  module scope is redundant.  After the fix the same sweep is linear
  at ~3.4 ms/iteration; full manhattanhenge main() goes from 120s+ to
  1.9 s (60x).

Test infrastructure:

- Per-fixture options.json with a :timeout key, defaulting to 5_000 ms.
  ExUnit per-test timeout for fixture_test now scales as max(30s,
  2 * fixture.timeout).  Manhattanhenge sets timeout=30_000 to give
  generous headroom on slow CI.
- AGENTS.md: codify the Elixir option-key convention (:timeout, not
  :timeout_ms).  This rule existed in spirit; making it explicit so
  fixture options.json doesn't drift.

All 5292 tests + 17 fixtures pass.  mix format, warnings-as-errors,
and dialyzer clean.
@ivarvong ivarvong merged commit c0dca6f into main May 4, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant