Tests

feat(eval): defer runtime-unknown eval/new Function to a throw-on-reach error by default (#5206) #5838

Workflow file for this run

	name: Tests

	on:
	# Run on version tags so `release-packages.yml` can gate publish on a green
	# Tests workflow for the exact commit being released. Direct pushes to main
	# do NOT trigger tests — the gates that matter are PRs (pre-merge) and tags
	# (pre-release).
	push:
	tags: ['v*']
	pull_request:
	branches: [main]
	# `labeled` is here so the optional opt-in jobs (parity, compile-smoke,
	# doc-tests) re-fire when a maintainer or the PR author applies the
	# `run-extended-tests` label. Without it, applying the label on an
	# existing PR wouldn't re-trigger the workflow.
	types: [opened, synchronize, reopened, labeled]
	paths-ignore:
	- 'docs/src/**'
	- '*.md'
	- '!CLAUDE.md'
	- '!CHANGELOG.md'
	# Manual escape hatch for the opt-in jobs. Maintainers (write access)
	# can dispatch the workflow against any ref with `run_extended_tests=true`
	# to run parity / compile-smoke / package smokes / doc-tests on demand
	# without tagging a release.
	workflow_dispatch:
	inputs:
	run_extended_tests:
	description: 'Run extended tests (parity, compile-smoke, package smokes, doc-tests)'
	type: boolean
	default: false
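	# Triggering the opt-in jobs, as a sketch (assumes the GitHub CLI `gh`
	# is installed and authenticated; `<pr-number>` and `<ref>` are
	# placeholders, not values from this repository):
	#   gh pr edit <pr-number> --add-label run-extended-tests
	#   gh workflow run Tests --ref <ref> -f run_extended_tests=true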

	concurrency:
	group: test-${{ github.ref }}
	cancel-in-progress: true

	env:
	CARGO_TERM_COLOR: always
	MACOSX_DEPLOYMENT_TARGET: "13.0"

	jobs:
	# ---------------------------------------------------------------------------
	# Lint: cargo fmt --check (formatting gate for every PR)
	# ---------------------------------------------------------------------------
	lint:
	# Was macos-14 — moved to ubuntu-latest in v0.5.428 since `cargo fmt
	# --check` is portable. The 6 multiplier-min cut is small in absolute
	# terms (lint runs in ~30s) but it's free.
	runs-on: ubuntu-latest
	timeout-minutes: 20
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable
	with:
	components: rustfmt, clippy

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	# PRs read from cache; only main writes new entries.
	# Avoids cache thrash from short-lived branches.
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Check formatting
	run: cargo fmt --all -- --check

	# File-size gate (v0.5.1019): fails the PR if any tracked source
	# file exceeds the LOC threshold (5000 initially; eventual target
	# is 2000). Big single-file modules are hard to read, slow IDE +
	# cargo-check incrementality, and hide regressions in code review.
	# Allowlist + exclusions live in the script.
	- name: File size limit
	run: ./scripts/check_file_size.sh

	# GC write-barrier store-site inventory: every raw heap-slot store in
	# perry-codegen / perry-runtime / perry-stdlib must be barriered or
	# carry a justified GC_STORE_AUDIT(...) marker (or a justified entry
	# in scripts/gc_store_site_allowlist.txt). Catches new unbarriered
	# old->young store paths before they become nondeterministic segfaults.
	- name: GC store-site inventory
	run: |
	python3 scripts/gc_store_site_inventory.py --self-test
	python3 scripts/gc_store_site_inventory.py

	# Handle-vs-pointer address classification audit: POINTER_TAG payloads
	# can be heap pointers OR small registry handles (fetch/zlib/proxy/...),
	# and code must classify by magnitude through value/addr_class.rs before
	# dereferencing. Catches new hand-typed band literals (0x100000 etc.)
	# and new `as *const GcHeader` casts outside addr_class.rs / gc/ before
	# they become Linux-only segfaults (#1843, #4004, #4665, #4800 class).
	# Allowlist: scripts/addr_class_allowlist.txt.
	- name: Address-classification audit
	run: |
	python3 scripts/addr_class_inventory.py --self-test
	python3 scripts/addr_class_inventory.py

	# ---------------------------------------------------------------------------
	# API docs drift gate (#465)
	#
	# Regenerates `docs/src/api/reference.md` and `docs/api/perry.d.ts` from
	# the compile-time manifest in `crates/perry-api-manifest/src/entries.rs`,
	# then `git diff --exit-code`s the result. Fails when a code change updated
	# the manifest without re-committing the artifacts. Closes the "Release
	# workflow regenerates docs automatically (no drift from code)" criterion
	# — committing the diff is the easiest way to keep the docs in sync,
	# since the diff is reviewable in the PR that introduces it.
	# ---------------------------------------------------------------------------
	api-docs-drift:
	runs-on: ubuntu-latest
	timeout-minutes: 30
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Regenerate API docs
	run: ./scripts/regen_api_docs.sh

	- name: Check for drift
	run: |
	if ! git diff --quiet -- docs/src/api/reference.md docs/api/perry.d.ts; then
	echo ""
	echo "::error::API docs drift detected. The compile-time manifest in"
	echo "::error::crates/perry-api-manifest/src/entries.rs changed but the"
	echo "::error::generated artifacts under docs/ weren't regenerated."
	echo ""
	echo "Fix by running:"
	echo " ./scripts/regen_api_docs.sh"
	echo " git add docs/src/api/reference.md docs/api/perry.d.ts"
	echo " git commit -m 'docs: regenerate API reference + .d.ts'"
	echo ""
	echo "Diff:"
	git --no-pager diff --stat -- docs/src/api/reference.md docs/api/perry.d.ts
	echo ""
	git --no-pager diff -- docs/src/api/reference.md docs/api/perry.d.ts | head -200
	exit 1
	fi
	echo "✅ API docs match the manifest."

	# ---------------------------------------------------------------------------
	# Rust unit tests (266+ tests across all crates)
	# ---------------------------------------------------------------------------
	# Note: a separate `build` job that produced runtime/stdlib/compiler
	# artifacts USED to live here. It only fed `binary-size` (which now does
	# its own quick build) — every other job did `cargo build --release`
	# itself anyway, so `needs: build` was a serializing barrier with no
	# cache benefit. Removed in v0.5.387 along with the `actions/cache@v4`
	# blocks (replaced by `Swatinem/rust-cache@v2`, which handles target/
	# pruning intelligently and avoids the disk-pressure issue that
	# required the manual simulator-runtime wipe + cache=registry-only
	# workaround). Each downstream job below builds in parallel directly.
	cargo-test:
	# Was macos-14 — moved to ubuntu-latest in v0.5.392 to drop the 10×
	# billing weight. cargo-test's exclusion set already filters out
	# every platform-specific UI crate (perry-ui-{macos,ios,visionos,
	# tvos,watchos,gtk4,android,windows,windows-winui} per the comment
	# block below), so the platform-independent test set runs
	# identically on Linux. The macOS-host coverage we lose here is
	# negligible: these tests don't exercise any platform behavior;
	# they're pure logic + codegen.
	runs-on: ubuntu-latest
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Run cargo test
	# Exclude UI backends that don't build on the ubuntu-latest CI image:
	# - perry-ui-macos / perry-ui-ios / perry-ui-tvos / perry-ui-watchos
	# / perry-ui-visionos: depend on `objc2` which only compiles on
	# Apple platforms (`compile_error!` in objc2/src/lib.rs:219).
	# - perry-ui-gtk4: needs system pango/gtk via pkg-config; runner
	# image doesn't have libgtk-4-dev installed by default.
	# - perry-ui-android: needs Android NDK.
	# - perry-ui-windows: needs win32 headers.
	# - perry-ui-windows-winui: re-exports perry-ui-windows, so it inherits
	# the same win32 / webview2-com dependency that won't build on Linux.
	env:
	# Rust's Ubuntu target can drive `cc` with `-fuse-ld=lld`; on the
	# shared runner this has repeatedly terminated large test links with
	# SIGBUS. Use the system linker for the cargo-test gate.
	CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS: "-C linker-features=-lld"
	# Keep test artifacts small enough for the shared runner disk. The
	# cargo-test job does not need line tables, and debug info was enough
	# to make later package archives hit ENOSPC after several package
	# test builds accumulated in target/debug.
	CARGO_PROFILE_TEST_DEBUG: "0"
	CARGO_PROFILE_DEV_DEBUG: "0"
	run: |
	(
	while sleep 60; do
	echo "cargo-test still running at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
	done
	) &
	cargo_test_heartbeat_pid=$!
	trap 'kill "$cargo_test_heartbeat_pid" 2>/dev/null || true' EXIT

	# #1444: perry-runtime's tests share process-global state — the
	# per-thread arena/GC, the timer queues, and the `NOTIFIED` flag are
	# process singletons. Running them across the default test-harness
	# thread pool lets one test's `js_notify_main_thread` / timer
	# scheduling perturb another's wait budget (the event_pump timing
	# flakes) and races the GC/threading tests into intermittent SIGSEGV.
	# Run perry-runtime single-threaded so the tests can't interfere.
	RUST_TEST_THREADS=1 cargo test -p perry-runtime
	find target/debug/deps -maxdepth 1 -type f -perm -111 ! -name '*.so' -delete
	# The remaining workspace includes large `perry` / `perry-stdlib`
	# test binaries. Keep Cargo build jobs serialized so the runner
	# does not link several of those large test binaries at once, then
	# run packages one at a time and prune linked test executables so
	# target/debug/deps does not exhaust the runner disk mid-job.
	export CARGO_BUILD_JOBS=1
	workspace_packages="$(
	cargo metadata --no-deps --format-version 1 | python3 -c '
	import json
	import sys

	excluded = {
	    "perry-runtime",
	    "perry-ui-macos",
	    "perry-ui-ios",
	    "perry-ui-visionos",
	    "perry-ui-tvos",
	    "perry-ui-watchos",
	    "perry-ui-gtk4",
	    "perry-ui-android",
	    "perry-ui-windows",
	    "perry-ui-windows-winui",
	    "perry-doc-fixture-my-bindings",
	}
	metadata = json.load(sys.stdin)
	workspace_members = set(metadata["workspace_members"])
	for package in metadata["packages"]:
	    if package["id"] in workspace_members and package["name"] not in excluded:
	        print(package["name"])
	'
	)"

	for package in $workspace_packages; do
	echo "::group::cargo test -p $package"
	cargo test -p "$package"
	echo "::endgroup::"
	cargo clean -p "$package" || true
	find target/debug/deps -maxdepth 1 -type f -perm -111 ! -name '*.so' -delete
	done
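	# Local reproduction of the gate above, as a sketch (assumes a Linux
	# x86_64 host with the same toolchain; `<package>` is a placeholder for
	# one workspace crate, not a name from this repository):
	#   RUST_TEST_THREADS=1 cargo test -p perry-runtime
	#   CARGO_BUILD_JOBS=1 cargo test -p <package>
	# `RUST_TEST_THREADS=1` has the same effect as passing
	# `-- --test-threads=1` to the test binary.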

	# ---------------------------------------------------------------------------
	# GC write-barrier stress (optional / non-blocking)
	#
	# `crates/perry/tests/gc_write_barrier_stress.rs` runs compiled binaries
	# under the slowest GC configuration (PERRY_GC_FORCE_EVACUATE +
	# PERRY_GC_VERIFY_EVACUATION) to hunt a rare corruption window (#5029).
	# Those tests are ~200s each and nondeterministic by nature, so they are a
	# poor fit for the blocking per-PR `cargo-test` gate (one flake blocked
	# every unrelated PR). They are `#[ignore]`d there and run here instead.
	#
	# Opt-in + informational: `continue-on-error` so a flake never fails the
	# workflow; triggered by the `run-extended-tests` PR label, a
	# `workflow_dispatch` with `run_extended_tests=true`, or a tag push.
	# ---------------------------------------------------------------------------
	gc-stress:
	continue-on-error: true
	if: >-
	github.event_name == 'push' ||
	(github.event_name == 'workflow_dispatch' && inputs.run_extended_tests) ||
	(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-extended-tests'))
	runs-on: ubuntu-latest
	timeout-minutes: 30
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Install clang
	run: |
	sudo apt-get update
	sudo apt-get install -y clang

	- name: Run GC write-barrier stress tests
	env:
	# Match the cargo-test gate's linker workaround (lld SIGBUS on the
	# shared runner during large test links).
	CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS: "-C linker-features=-lld"
	run: cargo test -p perry --test gc_write_barrier_stress -- --ignored

	# ---------------------------------------------------------------------------
	# Compiler-output regression gate
	#
	# Retains HIR, pre/post-opt LLVM IR, assembly, benchmark output, runtime
	# counters, vectorization remarks, benchmark timing summaries, and FP
	# contraction evidence for the primary CPU benchmark plus numeric fixtures.
	# Fails when hot-loop structural contracts regress.
	# ---------------------------------------------------------------------------
	compiler-output-regression:
	runs-on: ubuntu-latest
	timeout-minutes: 45
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Install clang
	run: |
	sudo apt-get update
	sudo apt-get install -y clang

	- name: Build compiler
	run: cargo build -p perry

	- name: Run harness unit tests
	run: python3 -m unittest tests.test_compiler_output_regression

	- name: Gate native-region proof compiler output
	run: |
	python3 scripts/compiler_output_regression.py suite \
	--suite native-region-proof \
	--perry target/debug/perry \
	--benchmark-mode smoke \
	--runs 1 \
	--perf-counters off \
	--print-summary

	- name: Gate positive vectorization compiler output
	run: |
	python3 scripts/compiler_output_regression.py capture \
	--perry target/debug/perry \
	--workload vectorized_buffer_transform \
	--benchmark-mode smoke \
	--runs 1 \
	--perf-counters off \
	--gate \
	--print-summary

	- name: Gate HIR fact rewrite compiler output
	run: |
	python3 scripts/compiler_output_regression.py capture \
	--perry target/debug/perry \
	--workload hir_fact_rewrite \
	--benchmark-mode smoke \
	--runs 1 \
	--perf-counters off \
	--gate \
	--print-summary

	- name: Gate FP contraction modes
	run: |
	python3 scripts/compiler_output_regression.py capture \
	--perry target/debug/perry \
	--workload fma_contract \
	--benchmark-mode smoke \
	--runs 1 \
	--perf-counters off \
	--gate \
	--fp-contract=on \
	--clang-arg=-march=haswell \
	--expect-fma=on \
	--out-dir target/compiler-output-regression/fma_contract-fp-contract-on
	python3 scripts/compiler_output_regression.py capture \
	--perry target/debug/perry \
	--workload fma_contract \
	--benchmark-mode smoke \
	--runs 1 \
	--perf-counters off \
	--gate \
	--fast-math \
	--fp-contract=off \
	--clang-arg=-march=haswell \
	--expect-fma=off \
	--out-dir target/compiler-output-regression/fma_contract-fast-no-contract

	- name: Upload compiler-output artifacts
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: compiler-output-regression
	path: target/compiler-output-regression/

	# ---------------------------------------------------------------------------
	# Parity tests (Perry output vs Node.js)
	# ---------------------------------------------------------------------------
	parity:
	# Release-publish decoupling: aspirational extended suite. Per maintainer
	# decision it no longer BLOCKS package publishing — release-packages.yml's
	# await-tests gate keys on this workflow's run conclusion, and job-level
	# `continue-on-error: true` keeps a red result here from failing that
	# conclusion. The job still runs on every tag + shows its own pass/fail as
	# an informational signal (and core jobs — cargo-test/lint/api-docs-drift/
	# compiler-output-regression — still gate publish).
	continue-on-error: true
	# Was macos-14 — moved to ubuntu-latest in v0.5.392. Parity tests
	# just compare Perry's stdout against `node --experimental-strip-types`'s
	# stdout per test file; both run cleanly on Linux. `gtimeout` on
	# macOS is `timeout` on Linux (the run_parity_tests.sh wrapper
	# detects either). Node 22+ is installed via setup-node@v6 below.
	# 10× billing weight cut.
	#
	# v0.5.1018: removed from the default PR path. The pull_request
	# trigger above still fires the workflow on PRs for the lint /
	# cargo-test / api-docs-drift gates, but parity now runs by default
	# only on release tags (`github.event_name == 'push'`). Direct main
	# commits + the PR cycle no longer pay the ~20 min parity bill;
	# release tagging still surfaces regressions before publish, as an
	# informational signal per the decoupling note above.
	#
	# Opt-in: apply the `run-extended-tests` label to a PR, or dispatch
	# the workflow manually with `run_extended_tests=true`, to run this
	# job on demand. PR authors and maintainers can both apply labels.
	if: >-
	github.event_name == 'push' ||
	(github.event_name == 'workflow_dispatch' && inputs.run_extended_tests) ||
	(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-extended-tests'))
	runs-on: ubuntu-latest
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Setup Node.js
	uses: actions/setup-node@v6
	with:
	node-version: '22'

	- name: Build compiler
	run: cargo build --release

	- name: Run parity tests
	run: ./run_parity_tests.sh

	- name: Check parity threshold
	run: |
	set +e
	python3 scripts/parity_threshold_gate.py \
	--check \
	--output-json test-parity/reports/parity_threshold_latest.json \
	--output-md test-parity/reports/parity_threshold_latest.md
	status=$?
	set -e
	cat test-parity/reports/parity_threshold_latest.md >> "$GITHUB_STEP_SUMMARY"
	exit "$status"

	- name: Check for new failures
	run: |
	REPORT="test-parity/reports/latest.json"
	KNOWN="test-parity/known_failures.json"

	# Filter empty strings — `run_parity_tests.sh` emits `compile: [""]`
	# when there are zero compile failures (a printf+sed quirk in the
	# JSON generator), and that empty entry would propagate to a
	# spurious "NEW FAILURES: - " line and fail this gate.
	jq -r '(.failures.parity // []) + (.failures.compile // []) | .[] | select(. != "")' "$REPORT" | sort -u > /tmp/all_fails.txt
	if [[ -f "$KNOWN" ]]; then
	# Issue #797 — known_failures.json moved from flat strings to
	# structured records. Keep the audit metadata (the `_schema`
	# key at the top of the file) out of the test-name set so
	# CI doesn't try to match a real test against it.
	jq -r 'keys[] | select(. != "_schema")' "$KNOWN" | sort -u > /tmp/known.txt

	# Schema sanity check — every non-_schema entry must be an
	# object with non-empty `category` and `reason`. Fails the
	# build on malformed entries so the format doesn't silently
	# drift back to the legacy flat-string shape.
	bad="$(jq -r 'to_entries | map(select(.key != "_schema")) | .[] | select((.value | type) != "object" or (.value.category // "") == "" or (.value.reason // "") == "") | .key' "$KNOWN")"
	if [[ -n "$bad" ]]; then
	echo "Malformed known_failures.json entries (missing category/reason or not an object):"
	echo "$bad" | sed 's/^/ - /'
	exit 1
	fi
	else
	: > /tmp/known.txt
	fi

	TOTAL=$(wc -l < /tmp/all_fails.txt | tr -d ' ')
	comm -23 /tmp/all_fails.txt /tmp/known.txt > /tmp/new.txt
	if [[ -s /tmp/new.txt ]]; then
	echo "NEW FAILURES (not in known_failures.json):"
	sed 's/^/ - /' /tmp/new.txt
	exit 1
	fi
	echo "All ${TOTAL} failures are known/triaged."

	- name: Generate parity matrix trend
	run: |
	python3 scripts/parity_matrix_trend.py \
	--check \
	--output-json test-parity/reports/parity_matrix_latest.json \
	--output-md test-parity/reports/parity_matrix_latest.md
	cat test-parity/reports/parity_matrix_latest.md >> "$GITHUB_STEP_SUMMARY"

	- name: Upload parity report
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: parity-report
	path: |
	test-parity/reports/latest.json
	test-parity/reports/parity_threshold_latest.json
	test-parity/reports/parity_threshold_latest.md
	test-parity/reports/parity_matrix_latest.json
	test-parity/reports/parity_matrix_latest.md

	# Capture per-test compile stderr written by run_parity_tests.sh into
	# `test-parity/output/*.compile_error.log` so the long-tail
	# macOS-14-only compile failures (tracked as `ci-env` in
	# known_failures.json) can finally be diagnosed by reading the actual
	# error message rather than inferring from the test family.
	- name: Upload compile-error logs
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: parity-compile-errors-${{ runner.os }}
	path: test-parity/output/*.compile_error.log
	if-no-files-found: ignore

	# ---------------------------------------------------------------------------
	# Compile smoke test (all 130+ test files must compile)
	# ---------------------------------------------------------------------------
	compile-smoke:
	# Release-publish decoupling (see `parity` above): aspirational extended
	# suite, informational only — does not block package publishing.
	continue-on-error: true
	# Was macos-14 — moved to ubuntu-latest in v0.5.392. The smoke
	# compiles every `test-files/*.ts` with the bare `perry foo.ts -o
	# out` path; the auto-optimize cache + clang link steps work
	# identically on Linux. The v0.5.385 sha256 sidecar is portable
	# via `command -v sha256sum || shasum -a 256`. `xargs -P` is
	# GNU on Linux (the macos-14 BSD xargs we tuned for behaves the
	# same for our usage). Linux runners are 4-vCPU vs macos-14's 3.
	# The worker count is set by NJOBS in the compile step below
	# (default 6, overridable via PERRY_SMOKE_JOBS); the retry-once
	# stays as a safety net for the cargo auto-optimize race.
	# 10× billing weight cut.
	#
	# v0.5.1018: runs by default only on tag pushes
	# (`github.event_name == 'push'`). See the parity job comment above
	# for rationale; per the decoupling note above, the result on tag
	# events is informational and does not block publishing.
	#
	# Opt-in: `run-extended-tests` PR label or `workflow_dispatch` with
	# `run_extended_tests=true` runs this job on demand.
	if: >-
	github.event_name == 'push' ||
	(github.event_name == 'workflow_dispatch' && inputs.run_extended_tests) ||
	(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-extended-tests'))
	runs-on: ubuntu-latest
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Build compiler
	run: cargo build --release

	- name: Issue #945 scalar method IR guard
	run: |
	PERRY_BIN="$PWD/target/release/perry" \
	scripts/run_issue_945_scalar_method_ir_guard.sh

	- name: Compile all test files
	run: |
	set -uo pipefail
	export PERRY="$PWD/target/release/perry"
	export LOGS_DIR="/tmp/perry_smoke_logs"

	# Tests that are known to not compile cleanly under the bare
	# `perry foo.ts -o out` smoke path. Sources:
	# - test_ui_*: need `--target macos` / `ios-simulator` to pull
	# in libperry_ui_*; the no-target compile path doesn't link
	# the platform widgets.
	# - test_timer: hangs on the runtime event loop under the
	# no-arg compile path.
	#
	# The 11-entry Buffer/typed-array `ci-env` skip family that
	# previously lived here (test_gap_buffer_ops, test_buffer_*,
	# test_inline_uint8array_param, test_issue_167_*,
	# test_issue_227_*, test_gap_fetch_response,
	# test_issue_234_blob_methods, etc.) has been removed as of
	# PR #239: the actual root cause was a missing
	# `module.declare_function("llvm.assume", VOID, &[I1])` in
	# `crates/perry-codegen/src/runtime_decls.rs`. Apple Clang
	# ≥21 (Xcode 26 / local) auto-recognised the intrinsic;
	# Apple Clang 15 (macos-14 runner / Xcode 15.x / LLVM 17)
	# errored with `error: use of undefined value '@llvm.assume'`.
	#
	# Space-separated skip list (associative arrays require bash
	# 4+; macOS-14 ships bash 3.2). Word-boundary match keeps the
	# substring lookup safe.
	#
	# test_phase2v3_3_show_toast_set_text imports `setText` and
	# `showToast` from perry/ui. The cross-platform stubs in
	# perry-runtime/src/ui_text_registry.rs ARE meant to make the
	# bare-target compile path work (no `--target` flag → host
	# build), but the macOS doc-tests + this Linux compile-smoke
	# both still fail at link time because the runtime registers
	# the macOS-side handler in `perry-ui-macos/src/app_run.rs`,
	# which is only linked when `needs_ui = true`. Without
	# `--target macos`, perry-ui-macos isn't pulled in, and the
	# cross-platform stubs route to a NULL handler → undefined
	# symbol. The fix is a separate coordination work item with
	# the v3.3 toast/setText worker (PR #322 family); skip-listing
	# here unblocks the v9 CI smoke landing without papering over
	# the underlying gap.
	# test_ramda_user_import intentionally imports a user package
	# (`ramda`) through the V8 fallback path. The compile-smoke
	# runner does not npm-install optional package fixtures, so the
	# strict unresolved-namespace diagnostic is expected here.
	# test_take_screenshot imports perry/ui (same `--target gtk4`
	# / pre-built libperry_ui_gtk4.a coupling as the test_ui_*
	# family above — bare-target compile path doesn't link the
	# platform widgets on Linux).
	# test_issue_842_side_effect_dynamic_import compiles a barrel
	# file that dynamic-imports a sibling helper; the smoke
	# harness compiles each .ts standalone with `perry foo.ts -o
	# out`, so the helper .o never gets produced and ld fails on
	# the unresolved symbol. The test belongs in a multi-file
	# integration runner, not the per-file smoke pass.
	# test_jose_signverify_roundtrip references `jwtVerify` from
	# the jose ext (#1025 sibling work). The bare smoke compile
	# path doesn't link the jose-specific runtime symbol — same
	# ld-unresolved-reference failure observed on #1038 pre-merge.
	# Move to a jose-aware test runner once that crate's CI hook
	# is wired up.
	# test_ui_on_keydown_smoke / test_issue_1495_image_systemname /
	# test_issue_1867_audio_playback / test_issue_2022_canvas_draw_image
	# all import `perry/ui` (App/Image/Canvas/loadImage/keydown). Like the
	# rest of the test_ui_* / media family above they need `--target
	# macos` to link the platform widgets; the bare `perry foo.ts -o out`
	# smoke path doesn't pull in libperry_ui_* on Linux, so they fail with
	# ld undefined-symbol errors. ci-env, not a Perry codegen bug.
	export SKIP_TESTS=" \
	test_ui_comprehensive \
	test_ui_controls \
	test_ui_phase4 \
	test_ui_on_keydown_smoke \
	test_issue_1495_image_systemname \
	test_issue_1867_audio_playback \
	test_issue_2022_canvas_draw_image \
	test_timer \
	test_phase2v3_3_show_toast_set_text \
	test_issue_351_media_playback \
	test_issue_442_inline_button_bg \
	test_issue_538_background_tasks \
	test_issue_553_mobile_widgets \
	test_issue_556_table_array \
	test_issue_556_table_concat \
	test_issue_610_foreach \
	test_issue_610_smoke \
	test_issue_640_navstack_textfield \
	test_issue_763_reactive_textfield \
	test_issue_764_state_at_module_init \
	test_ramda_user_import \
	test_take_screenshot \
	test_issue_842_side_effect_dynamic_import \
	test_jose_signverify_roundtrip \
	test_parity_assert \
	test_parity_async_hooks \
	test_parity_buffer \
	test_parity_child_process \
	test_parity_cluster \
	test_parity_crypto \
	test_parity_dgram \
	test_parity_diagnostics_channel \
	test_parity_decimal \
	test_parity_dns \
	test_parity_dns_promises \
	test_parity_dotenv \
	test_parity_events \
	test_parity_fs \
	test_parity_fs_promises \
	test_parity_http \
	test_parity_http2 \
	test_parity_https \
	test_parity_lodash \
	test_parity_module \
	test_parity_moment \
	test_parity_net \
	test_parity_path \
	test_parity_perf_hooks \
	test_parity_process \
	test_parity_querystring \
	test_parity_readline \
	test_parity_readline_promises \
	test_parity_stream \
	test_parity_stream_consumers \
	test_parity_stream_promises \
	test_parity_stream_web \
	test_parity_sys \
	test_parity_test \
	test_parity_timers \
	test_parity_timers_promises \
	test_parity_tls \
	test_parity_url \
	test_parity_util \
	test_parity_validator \
	test_parity_worker_threads \
	test_parity_zlib "
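	# How the list is consumed, as an illustration (the values below are
	# hypothetical): compile_one wraps the test name in spaces and looks
	# for it as a substring of SKIP_TESTS, so only whole names match.
	#   SKIP_TESTS=" test_timer test_ui_controls "
	#   name=test_timer -> " test_timer " is found   -> skipped
	#   name=test_ui    -> " test_ui " is not found  -> compiled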

	rm -rf "$LOGS_DIR"
	mkdir -p "$LOGS_DIR"

	# Worker function. Each test owns a unique marker filename
	# (.pass / .fail / .skip) under $LOGS_DIR, so concurrent
	# workers never race on the same file. Counts + failure list
	# are aggregated below AFTER all workers finish — no shared
	# bash-counter state crosses subshell boundaries. Per-test
	# stderr is captured to $LOGS_DIR/<name>.compile_error.log so
	# the artifact upload step preserves the actual error
	# messages (long-tail macOS-14 codegen failures are tracked
	# as `ci-env` in test-parity/known_failures.json and are
	# otherwise diagnosed by inference, not data).
	compile_one() {
	local f="$1"
	[[ -d "$f" ]] && return 0
	local name
	name=$(basename "$f" .ts)
	if [[ "$SKIP_TESTS" == *" $name "* ]]; then
	: > "$LOGS_DIR/${name}.skip"
	return 0
	fi
	local err_log="$LOGS_DIR/${name}.compile_error.log"
	# Try once. If perry compile fails, sleep 2s then retry.
	# The xargs -P parallel pass can race when two workers both
	# need an auto-optimize rebuild of the same feature combo —
	# the loser's clang sees a momentarily-missing
	# `target/perry-auto-<hash>/release/libperry_runtime.a` and
	# bails with `errno=2`. Race window is sub-second so a
	# single retry after a brief delay is the cheapest fix
	# without serializing the workers entirely. `||` short-circuits
	# on first success, so the success path is unchanged.
	try_compile() {
	"$PERRY" "$f" -o "/tmp/perry_smoke_${name}" 2>"$err_log"
	}
	try_compile && status=0 || status=$?
	if [[ $status -ne 0 ]]; then
	# Retry once after a brief delay; see the auto-optimize race
	# described above try_compile.
	sleep 2
	try_compile && status=0 || status=$?
	fi
	if [[ $status -eq 0 ]]; then
	: > "$LOGS_DIR/${name}.pass"
	rm -f "/tmp/perry_smoke_${name}" "$err_log"
	else
	: > "$LOGS_DIR/${name}.fail"
	fi
	}
	export -f compile_one

	# ubuntu-latest runners have 4 vCPUs; perry compile is
	# CPU-bound for HIR/codegen but waits on the linker (clang)
	# for a meaningful chunk of each test. v0.5.384 dropped to
	# NJOBS=3 after NJOBS=6 hit a cargo auto-optimize file-lock
	# race: two workers rebuilding the same `target/perry-auto-
	# <hash>/lib*.a` led to one worker's clang seeing `errno=2`
	# mid-link. v0.5.429 closed that race at the source via an
	# OS file lock in `commands/compile/optimized_libs.rs::
	# build_optimized_libs` (fslock dep — flock on Unix,
	# LockFileEx on Windows; serializes per-hash, parallel across
	# different hashes). NJOBS=6 is now safe again. Sequential
	# baseline was ~26 min; NJOBS=3 → ~10-12 min; NJOBS=6 → ~6-8
	# min on a 4-vCPU runner. The retry-once in compile_one stays
	# as a belt-and-suspenders safety net for any remaining race
	# corner the lock doesn't catch.
	NJOBS="${PERRY_SMOKE_JOBS:-6}"
	printf '%s\n' test-files/*.ts \
	| xargs -P "$NJOBS" -n 1 -I{} bash -c 'compile_one "$@"' _ {}

	# Count markers via shopt nullglob + bash array length. Pre-fix
	# we used `ls -1 "$LOGS_DIR"/*.fail | wc -l` which fails when
	# no matches exist (`ls` exits 1 on missing files), and with
	# GH Actions' default `bash -eo pipefail` the failed pipe
	# propagates errexit and kills the script BEFORE printing the
	# summary line — so a clean run with zero failures still made
	# compile-smoke exit 1 (PR #285 / v0.5.379 introduced this).
	# nullglob makes an empty glob expand to nothing instead of
	# the literal pattern, so the array length is correctly 0.
	shopt -s nullglob
	pass_files=("$LOGS_DIR"/*.pass)
	fail_files=("$LOGS_DIR"/*.fail)
	skip_files=("$LOGS_DIR"/*.skip)
	PASS=${#pass_files[@]}
	FAIL=${#fail_files[@]}
	SKIP=${#skip_files[@]}
	echo "Compile smoke: $PASS passed, $FAIL failed, $SKIP skipped"

	if [[ $FAIL -gt 0 ]]; then
	echo "Compile failures:"
	for marker in "$LOGS_DIR"/*.fail; do
	[[ -e "$marker" ]] || continue
	name=$(basename "$marker" .fail)
	echo " - $name"
	# Surface the head of each failure log directly in the job
	# output so a quick scan reveals the underlying error
	# without downloading the artifact.
	err_log="$LOGS_DIR/${name}.compile_error.log"
	if [[ -s "$err_log" ]]; then
	echo " --- compile stderr (first 30 lines) ---"
	head -n 30 "$err_log" | sed 's/^/ /'
	echo " --- end ---"
	fi
	done
	exit 1
	fi

	- name: Upload compile-smoke error logs
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: compile-smoke-error-logs
	path: /tmp/perry_smoke_logs/*.compile_error.log
	if-no-files-found: ignore

	# Thread-primitive compile-error tests (#146): checks that closures
	# passed to perry/thread primitives with outer-variable writes are
	# rejected. The runtime thread_primitives.ts example is covered by
	# the doc-tests job below.
	- name: Thread-primitive compile-error tests
	run: ./scripts/run_thread_tests.sh

	# perry/ui styling-matrix CI gate (Phase A of issue #185): verifies
	# crates/perry-ui/src/styling_matrix.rs is in sync with every backend's
	# lib.rs FFI exports, regenerates docs/src/ui/styling-matrix.md, then
	# `git diff --exit-code` catches a forgotten-to-commit regeneration.
	# Drift fails CI loudly so a future FFI add/remove can't silently
	# land without a matrix update.
	- name: UI styling matrix
	run: |
	./scripts/run_ui_styling_matrix.sh
	git diff --exit-code -- docs/src/ui/styling-matrix.md \
	|| (echo "docs/src/ui/styling-matrix.md regenerated; commit the diff" && exit 1)

	# Visual styling test ↔ spec consistency (#185 follow-up).
	# `docs/examples/ui/styling/visual_test.ts` is the canonical
	# comprehensive visual test app; `visual_test.spec.md` documents
	# each cell's expected visible signature for human/LLM-aided
	# screenshot verification. The two files must stay in lockstep
	# — adding a row to the .ts without updating the spec silently
	# breaks the verification flow. The actual cross-platform
	# compile-test of visual_test.ts is handled by the existing
	# run_doc_tests.sh loop downstream.
	- name: Visual styling test ↔ spec consistency
	run: ./scripts/run_visual_test_check.sh

	# Fastify end-to-end integration (#174): launches a Perry-compiled
	# Fastify server as a background process, curls four routes covering
	# simple GET, path params, POST with JSON body + reply.code(), and
	# 404 fallback. The docs Fastify example is marked no-test because
	# app.listen() blocks forever; this script provides the coverage
	# that the no-test tag would otherwise hide.
	- name: Fastify integration tests
	run: ./scripts/run_fastify_tests.sh

	# Memory-stability regression suite. Two failure modes microbenchmarks
	# don't catch: (1) slow RSS accumulation across 100k-200k iterations
	# of allocate-and-discard (would catch a future block-pinning /
	# cache-leak / tenuring-trap regression in the gen-GC work), and
	# (2) crashes when gc() is forced aggressively during JSON parse,
	# deep recursion, or closure init. Each test runs under default,
	# PERRY_GEN_GC=1, and PERRY_GEN_GC=1 PERRY_WRITE_BARRIERS=1 so a
	# regression in any GC mode is caught. Linux/macOS only because
	# /usr/bin/time availability + RSS reporting differs on Windows runners.
	- name: Memory stability tests
	if: runner.os == 'Linux' || runner.os == 'macOS'
	env:
	PERRY_GC_EVIDENCE_DIR: ${{ runner.temp }}/gc-evidence
	run: ./scripts/run_memory_stability_tests.sh

	- name: Upload GC evidence artifacts
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: gc-evidence-${{ runner.os }}
	path: ${{ runner.temp }}/gc-evidence
	if-no-files-found: ignore

	# ---------------------------------------------------------------------------
	# HarmonyOS ArkUI codegen smoke (Phase 2 v9).
	#
	# The harmonyos compile path produces a 3-part output: the .so (LLVM
	# codegen, just like every other backend), the ArkUI Index.ets (emitted
	# by perry-codegen-arkts from the harvested perry/ui App({...}) call),
	# and the NAPI bridge declarations. End-to-end `perry compile --target
	# harmonyos` requires the OpenHarmony SDK (clang + musl sysroot, ~600
	# MB) which isn't pre-installed on ubuntu-latest runners and isn't
	# worth downloading every CI run.
	#
	# What CI CAN cover without the SDK is the codegen-side ArkUI emission
	# — the part that's most likely to regress as Phase 2 widgets evolve.
	# `crates/perry-codegen-arkts/tests/phase2_full_app_smoke.rs` is the
	# comprehensive integration test: constructs a single Module that uses
	# every Phase 2 widget shape (state<T>, Tabs, Menu, Grid, LazyVStack
	# with .map, ForEach via array.map, inline style: { } with animation/
	# shadow/textDecoration, @app.media image, Toggle/TextField/Slider with
	# multi-arg invokeCallback1 closures) and asserts the emitted Index.ets
	# contains every canonical pattern v2-v13 added.
	#
	# Discrete job (separate from cargo-test) so a regression in just one
	# widget surfaces as one red cell, not buried in the workspace test
	# output. cargo-test ALSO runs these tests as part of `cargo test
	# --workspace` — this job is the named visibility, not a duplicate run.
	#
	# Linker-side validation is covered by manual on-device runs against
	# DevEco Studio's Pura 90 Pro Max emulator (see CLAUDE.md v0.5.399+
	# entries for the workflow).
	# ---------------------------------------------------------------------------
	harmonyos-smoke:
	# Aspirational smoke (informational) — must not block package publish.
	# release-packages await-tests keys on this workflow's run conclusion;
	# continue-on-error keeps a red result here from failing it (same as
	# parity/compile-smoke/doc-tests/drizzle/effect-basic-smoke). Core jobs
	# (cargo-test/lint/api-docs-drift/compiler-output-regression) still gate.
	continue-on-error: true
	runs-on: ubuntu-latest
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Run perry-codegen-arkts unit tests
	run: cargo test -p perry-codegen-arkts --release --lib

	- name: Run Phase 2 full-app integration smoke
	run: |
	cargo test -p perry-codegen-arkts \
	--release \
	--test phase2_full_app_smoke \
	-- --nocapture

	# ---------------------------------------------------------------------------
	# drizzle-mysql smoke: runs the tier-3 release fixture
	# `tests/release/packages/drizzle-mysql/` against a real MySQL — the CI
	# counterpart of #489's local acceptance, closing #804.
	#
	# The fixture itself stays Docker-free (per the tier-3 "no Docker"
	# preference in scripts/release_sweep_tiers/tier03_real_packages.sh —
	# locally it skips when no mysqld is reachable on 127.0.0.1:3306). The
	# `services: mysql:8` block below is runner-side setup, transparent to
	# the fixture. `MYSQL_ALLOW_EMPTY_PASSWORD` matches the fixture's
	# root-no-password convention; `MYSQL_DATABASE` auto-creates the test
	# DB on init so the fixture's CREATE-IF-NOT-EXISTS is a no-op.
	#
	# Gated to tag pushes + opt-in (parity with compile-smoke / doc-tests
	# / parity). Real-DB setup + perry release build + drizzle +
	# @perryts/mysql compile is ~15 min wall — PRs shouldn't pay it by
	# default. Opt-in via the `run-extended-tests` label.
	# ---------------------------------------------------------------------------
	drizzle-mysql-smoke:
	# Release-publish decoupling (see `parity` above): aspirational extended
	# suite, informational only — does not block package publishing.
	continue-on-error: true
	if: >-
	github.event_name == 'push' ||
	(github.event_name == 'workflow_dispatch' && inputs.run_extended_tests) ||
	(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-extended-tests'))
	runs-on: ubuntu-latest
	services:
	mysql:
	image: mysql:8
	env:
	MYSQL_ALLOW_EMPTY_PASSWORD: "yes"
	MYSQL_DATABASE: perry_drizzle_test
	ports:
	- 3306:3306
	options: >-
	--health-cmd="mysqladmin ping -h 127.0.0.1 -P 3306 --silent"
	--health-interval=5s
	--health-timeout=3s
	--health-retries=20
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Setup Node.js
	uses: actions/setup-node@v6
	with:
	node-version: '22'

	- name: Install mysql client (for fixture health probe)
	run: |
	sudo apt-get update -qq
	sudo apt-get install -y -qq mysql-client

	- name: Wait for MySQL service to accept connections
	run: |
	set -e
	for i in $(seq 1 30); do
	if mysql -h 127.0.0.1 -P 3306 -u root -e "SELECT 1" >/dev/null 2>&1; then
	echo "mysql ready after $i attempts"
	break
	fi
	sleep 1
	done
	mysql -h 127.0.0.1 -P 3306 -u root -e "SELECT VERSION();"

	- name: Build perry compiler
	run: cargo build --release -p perry-runtime -p perry-stdlib -p perry

	- name: Run drizzle-mysql fixture
	run: |
	cd tests/release/packages/drizzle-mysql
	PERRY_BIN="$GITHUB_WORKSPACE/target/release/perry" bash fixture.sh

	# ---------------------------------------------------------------------------
	# ink-link-smoke: runs the tier-3 release fixture
	# `tests/release/packages/ink-link-smoke/` as the CI counterpart to #803.
	#
	# This is intentionally compile/link-only. #348 tracks broader Ink runtime
	# and rendering compatibility; this job guards the package-graph +
	# compilePackages linker contract that the fixture documents.
	#
	# Gated to tag pushes + opt-in (parity with drizzle-mysql-smoke /
	# compile-smoke / doc-tests). The fixture installs Ink + React and builds a
	# release Perry compiler, so PRs shouldn't pay it by default. Opt-in via the
	# `run-extended-tests` label or workflow dispatch.
	# ---------------------------------------------------------------------------
	ink-link-smoke:
	# Aspirational smoke (informational) — see harmonyos-smoke. Does not block publish.
	continue-on-error: true
	if: >-
	github.event_name == 'push' ||
	(github.event_name == 'workflow_dispatch' && inputs.run_extended_tests) ||
	(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-extended-tests'))
	runs-on: ubuntu-latest
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Setup Node.js
	uses: actions/setup-node@v6
	with:
	node-version: '22'

	- name: Build perry compiler
	run: cargo build --release -p perry-runtime -p perry-stdlib -p perry

	- name: Run Ink link fixture
	run: |
	cd tests/release/packages/ink-link-smoke
	PERRY_BIN="$GITHUB_WORKSPACE/target/release/perry" bash fixture.sh

	- name: Upload Ink fixture logs
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: ink-link-smoke-logs
	path: |
	tests/release/packages/ink-link-smoke/install.log
	tests/release/packages/ink-link-smoke/perry-compile.log
	if-no-files-found: ignore

	# ---------------------------------------------------------------------------
	# effect-basic-smoke: runs the tier-3 release fixture
	# `tests/release/packages/effect-basic/` as the CI counterpart to #802.
	#
	# This is intentionally advisory: #802 asks for a live Effect compile/run
	# signal even while broader Effect end-to-end compatibility remains in
	# progress. Gated to tag pushes + opt-in, matching the named package smokes.
	# ---------------------------------------------------------------------------
	effect-basic-smoke:
	continue-on-error: true
	if: >-
	github.event_name == 'push' ||
	(github.event_name == 'workflow_dispatch' && inputs.run_extended_tests) ||
	(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-extended-tests'))
	runs-on: ubuntu-latest
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Setup Node.js
	uses: actions/setup-node@v6
	with:
	node-version: '22'

	- name: Build perry compiler
	run: cargo build --release -p perry-runtime -p perry-stdlib -p perry

	- name: Run Effect fixture
	run: |
	cd tests/release/packages/effect-basic
	PERRY_EFFECT_BASIC_ADVISORY=1 PERRY_BIN="$GITHUB_WORKSPACE/target/release/perry" bash fixture.sh

	- name: Upload Effect fixture logs
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: effect-basic-smoke-logs
	path: |
	tests/release/packages/effect-basic/install.log
	tests/release/packages/effect-basic/perry-compile.log
	tests/release/packages/effect-basic/perry-run.log
	tests/release/packages/effect-basic/perry-out.txt
	tests/release/packages/effect-basic/diff.log
	if-no-files-found: ignore

	# ---------------------------------------------------------------------------
	# Doc-example tests: compile + run every .ts under docs/examples/.
	# UI examples launch with PERRY_UI_TEST_MODE=1 so they auto-exit after one
	# frame. Gallery screenshots are diffed against per-OS baselines (advisory
	# until Linux/Windows baselines stabilize).
	#
	# Gated to tag pushes + opt-in (parity with parity/compile-smoke). The
	# macOS-14 matrix entry takes ~30 min wall and dominates the PR feedback
	# loop, so PRs no longer pay the bill by default. Release tags still run
	# doc-tests as part of the release-packages.yml gate.
	#
	# Opt-in: apply the `run-extended-tests` label to a PR, or dispatch the
	# workflow manually with `run_extended_tests=true`, to run this job on
	# demand. PR authors and maintainers can both apply labels.
	# ---------------------------------------------------------------------------
	doc-tests:
	# Release-publish decoupling (see `parity` above): aspirational extended
	# suite, informational only — does not block package publishing.
	continue-on-error: true
	if: >-
	github.event_name == 'push' ||
	(github.event_name == 'workflow_dispatch' && inputs.run_extended_tests) ||
	(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-extended-tests'))
	strategy:
	fail-fast: false
	matrix:
	include:
	- os: macos-14
	ui_backend: perry-ui-macos
	shell: bash
	# stdlib/http/snippets.ts excluded since v0.5.886: it links
	# against js_axios_response_data_parsed +
	# js_node_http2_create_secure_server which live in
	# perry-ext-axios / perry-ext-http-server. v0.5.885's
	# PERRY_NO_AUTO_OPTIMIZE skips the well-known-binding probe
	# that would route those .a files into the link surface.
	# Proper fix: hoist well-known-binding lookup out of
	# build_optimized_libs into link.rs so it runs even when
	# auto-optimize is skipped. Tracked separately.
	cmd_exclude_gallery: "./scripts/run_doc_tests.sh --verbose --skip-xcompile --filter-exclude ui/gallery.ts --filter-exclude stdlib/http/snippets.ts"
	cmd_gallery: "./scripts/run_doc_tests.sh --verbose --skip-xcompile --filter ui/gallery.ts"
	# Repeat `--xcompile-only-target=…` per target rather than a
	# single comma-delimited value because PowerShell splits even
	# `--foo=a,b` at the comma when unquoted (array literal).
	# Repetition sidesteps the whole issue on every shell.
	#
	# web + wasm cross-compile dropped from the macOS blocking gate
	# in v0.5.429: both targets are portable and the ubuntu-24.04
	# matrix entry below verifies them at 1× billing weight when it
	# is enabled (it is currently commented out, see below). Keep
	# ios-simulator here because it requires the Apple SDK that only
	# macos-14 has.
	cmd_xcompile_blocking: "./scripts/run_doc_tests.sh --verbose --xcompile-only --xcompile-only-target=ios-simulator"
	cmd_xcompile_advisory: "./scripts/run_doc_tests.sh --verbose --xcompile-only"
	# Baseline captured from a clean CI run of 24723671119 (900x970).
	gallery_advisory: false
	# ubuntu-24.04 / perry-ui-gtk4 doc-tests re-disabled v0.5.873:
	# 39 of 88 tests TIMEOUT at the 15s execution budget. The gtk4
	# `glib::timeout_add_local_once` → `app.quit()` exit path
	# doesn't terminate the main loop cleanly under xvfb-run, so
	# PERRY_UI_TEST_MODE-driven self-exit never fires and the
	# harness has to SIGKILL each binary. Separate bug from the
	# webkit6 / soup / ed25519 fixes that v0.5.864→0.5.871 closed.
	# Tracked separately. Re-enable when the gtk4 testkit exit
	# path is fixed.
	# - os: ubuntu-24.04
	# ui_backend: perry-ui-gtk4
	# shell: bash
	# cmd_exclude_gallery: "xvfb-run -a ./scripts/run_doc_tests.sh --verbose --skip-xcompile --filter-exclude ui/gallery.ts"
	# cmd_gallery: "xvfb-run -a ./scripts/run_doc_tests.sh --verbose --skip-xcompile --filter ui/gallery.ts"
	# cmd_xcompile_blocking: "./scripts/run_doc_tests.sh --verbose --xcompile-only --xcompile-only-target=web --xcompile-only-target=wasm --xcompile-only-target=ios-simulator"
	# cmd_xcompile_advisory: "./scripts/run_doc_tests.sh --verbose --xcompile-only"
	# gallery_advisory: false
	# windows-2022 doc-tests for perry-ui-windows temporarily disabled:
	# 30+ doc-test snippets COMPILE_FAIL on the Windows runner with
	# perry exit 1 (mix of platform-banner gaps and #463-unimplemented
	# node:* surface). Tracked separately; re-enable once the snippet
	# set + perry-ui-windows compile paths are reconciled. macOS
	# doc-tests still gate the release.
	# - os: windows-2022
	# ui_backend: perry-ui-windows
	# shell: pwsh
	# cmd_exclude_gallery: "./scripts/run_doc_tests.ps1 --verbose --skip-xcompile --filter-exclude ui/gallery.ts"
	# cmd_gallery: "./scripts/run_doc_tests.ps1 --verbose --skip-xcompile --filter ui/gallery.ts"
	# cmd_xcompile_blocking: "./scripts/run_doc_tests.ps1 --verbose --xcompile-only --xcompile-only-target=web --xcompile-only-target=wasm --xcompile-only-target=ios-simulator"
	# cmd_xcompile_advisory: "./scripts/run_doc_tests.ps1 --verbose --xcompile-only"
	# gallery_advisory: false
	runs-on: ${{ matrix.os }}
	defaults:
	run:
	shell: ${{ matrix.shell }}
	steps:
	- uses: actions/checkout@v6

	# macos-14 ships with ~14 GB free disk after the preinstalled Xcode +
	# iOS/tvOS/watchOS simulator runtime images. Several `cargo build
	# --release` jobs in this workflow consistently ran out of disk at the cache
	# restore step (`No space left on device` from the runner's own
	# diagnostic writer, before cargo even started). Wiping the simulator
	# runtime IMAGES — not the SDKs — reclaims ~15-25 GB without
	# affecting cross-compile to `aarch64-apple-ios-sim` (that only needs
	# the SDK, which lives inside the active Xcode app).
	- name: Free up disk space (macOS)
	if: runner.os == 'macOS'
	run: |
	BEFORE=$(df -h / | tail -1 | awk '{print $4}')
	sudo rm -rf /Library/Developer/CoreSimulator/Profiles/Runtimes/Simulator || true
	sudo rm -rf ~/Library/Developer/CoreSimulator/Caches/* || true
	AFTER=$(df -h / | tail -1 | awk '{print $4}')
	echo "Disk free: ${BEFORE} -> ${AFTER}"

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- name: Set up MSVC environment (Windows)
	# Populates LIB / INCLUDE / PATH so link.exe can find user32.lib
	# etc. Without this, the runner's MSVC install exists on disk but the
	# shell session cannot locate it, and perry's link step fails with
	# LNK1181 fatal errors while looking for Windows SDK libs.
	if: matrix.os == 'windows-2022'
	uses: ilammy/msvc-dev-cmd@v1

	- name: Install GTK4 + GStreamer + Xvfb + PulseAudio headers (Linux)
	if: matrix.os == 'ubuntu-24.04'
	run: |
	sudo apt-get update
	# libgstreamer1.0-dev + libgstreamer-plugins-base1.0-dev added in
	# v0.5.442 for PR #371 (perry/media streaming playback, #351).
	# gstreamer-sys's build script needs `gstreamer-1.0.pc` findable
	# via pkg-config; without these two packages doc-tests-gtk4 fails
	# at the cargo build step with "Package gstreamer-1.0 was not
	# found in the pkg-config search path". gstreamer-base is the
	# transitive dep gstreamer-base-sys needs for the playbin element
	# that perry/media wraps.
	# libwebkitgtk-6.0-dev added for the perry/ui-gtk4 WebView
	# feature (Phases 1-5 + v2 follow-ups, #658): the `webkit6` crate
	# (0.4 series) and its transitive `javascriptcore6-sys` need
	# `webkitgtk-6.0.pc` AND `javascriptcoregtk-6.0.pc` findable via
	# pkg-config. Ubuntu 24.04 (noble) ships libwebkitgtk-6.0-dev
	# which provides BOTH .pc files (the old libwebkit2gtk-4.1-dev
	# name only ships the 4.1 .pc and that's not what webkit6 wants).
	sudo apt-get install -y \
	libgtk-4-dev libadwaita-1-dev xvfb pkg-config \
	libpulse-dev \
	libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev \
	libwebkitgtk-6.0-dev libshumate-dev

	- name: Surface Android NDK location (for cross-compile)
	if: matrix.os == 'macos-14' || matrix.os == 'ubuntu-24.04'
	run: |
	# 1. Discover NDK location on the runner.
	NDK=""
	if [ -n "$ANDROID_NDK_HOME" ]; then
	NDK="$ANDROID_NDK_HOME"
	elif [ -d "$ANDROID_HOME/ndk-bundle" ]; then
	NDK="$ANDROID_HOME/ndk-bundle"
	elif [ -d "$ANDROID_HOME/ndk" ]; then
	NDK=$(ls -1d "$ANDROID_HOME/ndk/"*/ 2>/dev/null | sort -V | tail -1)
	NDK="${NDK%/}"
	fi
	if [ -z "$NDK" ]; then
	echo "No Android NDK found on runner — android xcompile will skip"
	exit 0
	fi
	echo "ANDROID_NDK_HOME=$NDK" >> "$GITHUB_ENV"
	# 2. Point cc-rs + cargo at the NDK's clang wrapper so
	# `cargo build --target aarch64-linux-android` picks up the
	# right linker/CC/AR instead of the host `cc`.
	HOST_TAG=$(uname -s | tr '[:upper:]' '[:lower:]')-x86_64
	# macOS NDK uses darwin-x86_64 even on arm64 runners (rosetta).
	if [ "$(uname -s)" = "Darwin" ]; then HOST_TAG="darwin-x86_64"; fi
	TOOLCHAIN="$NDK/toolchains/llvm/prebuilt/$HOST_TAG/bin"
	API=24
	CLANG=$(ls "$TOOLCHAIN"/aarch64-linux-android*-clang 2>/dev/null | sort -V | tail -1)
	if [ -z "$CLANG" ]; then
	echo "Could not locate NDK clang under $TOOLCHAIN — skipping env setup"
	exit 0
	fi
	echo "CC_aarch64_linux_android=$CLANG" >> "$GITHUB_ENV"
	echo "AR_aarch64_linux_android=$TOOLCHAIN/llvm-ar" >> "$GITHUB_ENV"
	echo "CARGO_TARGET_AARCH64_LINUX_ANDROID_LINKER=$CLANG" >> "$GITHUB_ENV"
	echo "Android NDK wired: CLANG=$CLANG"

	- name: Install Apple SDK Rust targets (macOS only)
	if: matrix.os == 'macos-14'
	run: |
	rustup target add aarch64-apple-ios-sim
	# tvOS-sim is Rust Tier-3 — perry auto-rebuilds with
	# `+nightly -Zbuild-std`, which requires the rust-src
	# component on the nightly toolchain.
	rustup toolchain install nightly --component rust-src --profile minimal || true

	- name: Install Android Rust target (macOS + Ubuntu)
	if: matrix.os == 'macos-14' || matrix.os == 'ubuntu-24.04'
	run: rustup target add aarch64-linux-android

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Build compiler + UI backend + harness
	run: cargo build --release -p perry -p perry-runtime -p perry-stdlib -p ${{ matrix.ui_backend }} -p perry-doc-tests

	- name: Pre-build Apple UI libs for cross-compile (macOS only)
	if: matrix.os == 'macos-14'
	run: |
	# iOS-sim is Rust Tier-2 — builds with stable. tvOS-sim is Tier-3
	# and needs nightly + -Zbuild-std; perry's auto-optimize handles
	# that path itself, so we skip pre-building perry-ui-tvos here.
	cargo build --release -p perry-ui-ios --target aarch64-apple-ios-sim

	- name: Lint docs/src markdown fences (repo-wide)
	if: matrix.os == 'macos-14'
	run: cargo run --release --quiet -p perry-doc-tests -- --lint docs/src

	- name: Run non-gallery doc-example tests (blocking)
	run: ${{ matrix.cmd_exclude_gallery }}

	- name: Run gallery screenshot diff
	id: gallery
	continue-on-error: ${{ matrix.gallery_advisory }}
	run: ${{ matrix.cmd_gallery }}

	- name: Cross-compile blocking targets
	id: xcompile_blocking
	run: ${{ matrix.cmd_xcompile_blocking }}

	- name: Cross-compile remaining targets (advisory)
	# iOS-sim/tvOS-sim/watchos-sim/android still surface real errors
	# that the harness logs but that aren't yet tracked issues. Keep
	# advisory until each target has a green baseline run.
	id: xcompile_advisory
	continue-on-error: true
	run: ${{ matrix.cmd_xcompile_advisory }}

	- name: Upload doc-tests report
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: doc-tests-report-${{ matrix.os }}
	path: docs/examples/_reports/latest.json

	- name: Upload gallery screenshot + diff artifacts
	if: always()
	uses: actions/upload-artifact@v7
	with:
	name: gallery-screenshots-${{ matrix.os }}
	path: |
	target/perry-doc-tests/gallery_*.png
	docs/examples/_baselines/**/gallery.png

	# ---------------------------------------------------------------------------
	# Binary size tracking (main branch only)
	# ---------------------------------------------------------------------------
	binary-size:
	if: github.ref == 'refs/heads/main'
	runs-on: macos-14
	steps:
	- uses: actions/checkout@v6

	- name: Install Rust toolchain
	uses: dtolnay/rust-toolchain@stable

	- uses: Swatinem/rust-cache@v2
	with:
	shared-key: "${{ runner.os }}-perry"
	save-if: ${{ github.ref == 'refs/heads/main' }}

	- name: Build release binaries
	run: cargo build --release -p perry -p perry-runtime -p perry-stdlib

	- name: Report binary sizes
	run: |
	echo "## Binary Sizes" > /tmp/sizes.md
	echo '```' >> /tmp/sizes.md
	ls -lh target/release/perry target/release/libperry_runtime.a target/release/libperry_stdlib.a 2>/dev/null | awk '{print $5, $9}' >> /tmp/sizes.md
	echo '```' >> /tmp/sizes.md
	cat /tmp/sizes.md

	- name: Upload size report
	uses: actions/upload-artifact@v7
	with:
	name: binary-sizes
	path: /tmp/sizes.md
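
The compile-smoke step above counts `.pass` / `.fail` / `.skip` marker files with `shopt -s nullglob` and a bash array length instead of `ls | wc -l`. Below is a minimal standalone sketch of that technique, so the zero-match case can be tried outside CI. The helper name `count_markers` and the temporary directory are illustrative assumptions, not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): print how many files with the
# given extension exist in a directory, printing 0 when nothing matches.
count_markers() (
  # The function body is a subshell, so nullglob does not leak to the caller.
  shopt -s nullglob
  files=("$1"/*."$2")   # with nullglob, an unmatched glob expands to nothing
  echo "${#files[@]}"
)

# Demo in a throwaway directory: two pass markers, no fail markers.
demo_dir=$(mktemp -d)
: > "$demo_dir/a.pass"
: > "$demo_dir/b.pass"
echo "pass=$(count_markers "$demo_dir" pass) fail=$(count_markers "$demo_dir" fail)"
rm -rf "$demo_dir"
```

Because no external command runs on the empty glob, nothing returns a non-zero status in the zero-failure case, so the count is safe under `bash -eo pipefail`.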
