Add manhattanhenge fixture; fix bugs and 60x perf regression it exposed#57
Merged
Conversation
Adds an end-to-end fixture that computes the four annual Manhattanhenge sunset dates from first principles (Meeus low-precision solar position, Bennett refraction, bisection on apparent altitude). The program is the kind of compute an LLM might write in our sandbox: pure Python, math + datetime + zoneinfo, ~200 lines, no external libraries. Writing it as a real Python programmer would surfaced four issues in Pyex. Per AGENTS.md, fix Pyex rather than work around in the fixture. Correctness: - Parser/interpreter: support starred targets in for-loop unpacking (`for *rest, last in items: ...`). Lands a parser case in ControlFlow.parse_for, extends collect_for_vars with starred branches, and unpacks via a new unpack_for_list_starred helper that matches CPython's "at least N" error message. - datetime.fromtimestamp(ts, tz): accept tz both positionally and as a keyword. Was registered as :builtin (positional only); now :builtin_kw with explicit two-arg clauses for None and a tzinfo instance. Performance: - Helpers.update_closure_env/2 was rebuilding every module-level function's closure after each call, then rebind_var wrote the new function back into module scope. The new function's closure pointed at the post-call module map, which contained the *previous* rebound function, whose closure pointed at the previous module map, etc. Self-referential structure that grew one level per call. For a 3-deep call chain (a -> b -> c) invoked in a hot loop, repeated top-level invocations slowed quadratically: a 30-line repro went 65 -> 177 -> 398 -> 688 -> 1061 ms across 1..5 outer iterations. Manhattanhenge sweep ballooned from an expected ~16s to over two minutes. Fix: short-circuit update_closure_env when the captured scope is a single map (i.e. the function was defined at module level). There are no enclosing locals to refresh, the module is shared across callers via propagate_scopes, and rebinding the function back into module scope is redundant. After the fix the same sweep is linear at ~3.4 ms/iteration; full manhattanhenge main() goes from 120s+ to 1.9 s (60x). Test infrastructure: - Per-fixture options.json with a :timeout key, defaulting to 5_000 ms. ExUnit per-test timeout for fixture_test now scales as max(30s, 2 * fixture.timeout). Manhattanhenge sets timeout=30_000 to give generous headroom on slow CI. - AGENTS.md: codify the Elixir option-key convention (:timeout, not :timeout_ms). This rule existed in spirit; making it explicit so fixture options.json doesn't drift. All 5292 tests + 17 fixtures pass. mix format, warnings-as-errors, and dialyzer clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A real-world fixture exercising the
math + datetime + zoneinfointersection: compute the four annual Manhattanhenge sunset dates from
first principles (Meeus solar position, Bennett refraction, bisection
on apparent altitude). ~200 lines of pure Python, no external
libraries — exactly the kind of compute we expect LLMs to write inside
the sandbox.
The fixture's purpose is twofold:
non-trivial real program, recorded via
mix pyex.fixture record.would write it surfaces bugs that synthetic tests miss.
Per AGENTS.md: "Never work around Pyex limitations in fixture code.
Fixtures exist to expose bugs. If a fixture fails, fix Pyex." This PR
landed three Pyex fixes the fixture exposed, in addition to the
fixture itself.
What the fixture exposed
Correctness
Starred targets in for-loop unpacking (
for *rest, last in items).Standard Python since 3.0; the fixture didn't actually need it but a
natural rewrite of
henge_datesdid. Adds a parser branch inControlFlow.parse_for, extendscollect_for_varsto recognizestarred names, and unpacks via a new
unpack_for_list_starredhelperthat matches CPython's
"at least N"error wording.datetime.fromtimestamp(ts, tz). Was registered as:builtin(positional only), so passing
tz=...as a keyword raisedTypeError. Switched to:builtin_kwwith explicit two-arg clausesfor
Noneand atzinfoinstance.Performance — 60x regression
Helpers.update_closure_env/2was rebuilding every module-levelfunction's closure after each call, then
rebind_varwrote the newfunction back into module scope. The new function's closure pointed
at the post-call module map, which contained the previous rebound
function, whose closure pointed at the previous module map, and so on.
A self-referential structure that grew by one level per call.
For a 3-deep call chain (
a → b → c) invoked in a hot loop, repeatedtop-level invocations slowed quadratically. Minimal repro:
Repeating
for j in range(10): find(100)N times at the top level:Linear vs quadratic. Manhattanhenge full
main()(which callshenge_datessix times across the sensitivity sweep): 120s+ → 1.9s(60x). One isolated
henge_dates(2026)call: 2.7s → 0.35s (8x).Fix: short-circuit
update_closure_envwhen the captured scope isa single map — i.e. the function was defined at module level. There
are no enclosing locals to refresh, the module is shared across
callers via
propagate_scopes, and rebinding the function back intomodule scope is redundant. Closures defined inside other functions
(real lexical closures with mutable state, like
make_counter) stillgo through the full merge path.
Test infrastructure
options.jsonwith a:timeoutkey (default 5_000 ms).Read by
Pyex.Test.Fixture.load!/1and threaded intoPyex.run/2's:limits.fixture_test.exsExUnit timeout now scales asmax(30s, 2 * fixture.timeout), so slow fixtures don't have to share the default60s ceiling.
:timeout, not:timeout_ms). Existed in spirit; making it explicit so the newoptions.jsonschema doesn't drift later.Verification
mix test: 5292 tests, 0 failures, 2 skippedmix test test/pyex/fixture_test.exs: 17/17 in 1.9s (was timing outat 60s)
mix format --check-formatted,mix compile --warnings-as-errors,mix dialyzer: cleanNote on Manhattanhenge accuracy
The fixture matches AMNH's published 2026 dates within ±2 days.
Closing that gap requires terrain data (Palisades elevation profile)
and full IAU-2000 refraction; the goal here is a pure-Python compute
exercise, not an astronomy oracle. The recorded
expected.jsoncaptures CPython's output, which Pyex matches exactly.