Commit f2ff19d

ci: add macOS to test + e2e matrix
Two new layers of macOS CI coverage. macOS GH Actions runners are free for public repos and run in parallel with Linux/Windows, so wall-clock release time is unchanged.

Phase A: macOS unit + plugin tests (Rust + bun)

  tests.yml: new `rust-macos` job
  release.yml: new `test-macos` job (gates the e2e suite)

Until now, the build-darwin-* jobs only ran `cargo build --release`, never `cargo test`, so macOS-specific code paths (FSEvents, /var canonicalization, broken-symlink fallback, bash_background SIGTERM behavior, Apple Silicon codegen) had zero CI coverage. Tag-time release builds were the only place the macOS binary ever got produced, with no test signal at all.

Phase B: macOS native e2e

  _e2e-suite.yml: new `e2e-macos` job
  tests/macos-e2e/run.sh: new host-setup script (mirrors Dockerfile.linux-x64)
  tests/docker/test-e2e.sh: refactored to be platform-aware via AFT_E2E_PLATFORM=linux|macos. The shared scenario logic now reads the broken-ONNX path (libonnxruntime.so vs libonnxruntime.dylib), mock-server location, and platform display from env so we don't fork the harness.

Single source of truth: macOS uses the same scenario script as Linux Docker, just on a real macOS-arm64 host with native install of OpenCode + Bun + aimock. Catches FSEvents watcher behavior, /var vs /private/var canonicalization, broken-symlink-chain fallback, and .dylib loading paths for ONNX (probe order /opt/homebrew/lib then /usr/local/lib differs from Linux /usr/local/lib).

Verified locally:
- yaml.safe_load on all 3 workflow files: clean
- actionlint .github/workflows/*.yml: clean
- bash -n on tests/{docker,macos-e2e}/test-e2e.sh + run.sh: clean
- cargo fmt --check: clean
- cargo clippy --workspace --all-targets -- -D warnings: clean
- bun test (opencode-plugin): 679 pass / 2 skip / 0 fail
- bun test (pi-plugin): 380 pass / 0 fail
1 parent 0059789 commit f2ff19d

6 files changed

Lines changed: 444 additions & 31 deletions

.github/workflows/_e2e-suite.yml

Lines changed: 49 additions & 5 deletions
@@ -1,8 +1,9 @@
 name: E2E Suite
 
 # Reusable workflow that runs the full E2E test matrix:
-# - Linux Docker e2e — OpenCode + plugin + binary stack against aimock
-# - Windows native e2e — same stack on real Windows runner (no containers)
+# - Linux Docker e2e — OpenCode + plugin + binary stack against aimock
+# - macOS native e2e — same stack on a real macOS-arm64 runner
+# - Windows native e2e — same stack on a real Windows runner (no containers)
 #
 # Called from:
 # - tests.yml (PR + main push) — catches regressions at PR time
@@ -13,9 +14,10 @@ name: E2E Suite
 # IMPORTANT: any change to bridge transport, bash spawning, ONNX install,
 # locking, or platform-conditional code paths SHOULD touch the matching
 # integration test or e2e scenario here. The Linux harness has caught real
-# regressions before; the Windows e2e is here to extend that coverage to
-# issue-#26-class Windows-specific bugs (bash timeouts, lock recovery, path
-# separators).
+# regressions before; the Windows e2e covers issue-#26-class Windows-specific
+# bugs (bash timeouts, lock recovery, path separators); the macOS e2e covers
+# FSEvents watcher behavior, /var vs /private/var symlink canonicalization,
+# .dylib loading paths for ONNX, and Apple Silicon native codegen for aft.
 
 on:
   workflow_call:
@@ -114,3 +116,45 @@ jobs:
         env:
           AFT_BINARY_PATH: ${{ github.workspace }}\target\x86_64-pc-windows-msvc\release\aft.exe
           AFT_PLUGIN_DIST: ${{ github.workspace }}\packages\opencode-plugin\dist
+
+  # ---------------------------------------------------------------------------
+  # macOS native e2e — exercises the same OpenCode + plugin + binary stack on
+  # a real macOS-arm64 runner. No containers (GH Actions macOS runners can't
+  # run Docker reliably). Uses the same shared bash harness as Linux Docker
+  # via tests/macos-e2e/run.sh, which sets up host deps natively and then
+  # invokes tests/docker/test-e2e.sh with AFT_E2E_PLATFORM=macos.
+  #
+  # Catches: FSEvents watcher behavior, /var vs /private/var canonicalization,
+  # broken-symlink-chain fallback, .dylib loading, and Apple Silicon codegen.
+  # ---------------------------------------------------------------------------
+  e2e-macos:
+    name: E2E (macOS native)
+    runs-on: macos-latest
+    timeout-minutes: 30
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+
+      - uses: oven-sh/setup-bun@v2
+        with:
+          bun-version: latest
+
+      - name: Build AFT binary (macOS native)
+        run: cargo build --release -p agent-file-tools
+
+      - name: Install workspace deps + build aft-bridge
+        run: |
+          bun install --frozen-lockfile
+          bun run --cwd packages/aft-bridge build
+
+      - name: Build OpenCode plugin dist
+        run: bun run --cwd packages/opencode-plugin build
+
+      - name: Run macOS E2E suite
+        run: bash tests/macos-e2e/run.sh
+        env:
+          AFT_BINARY_PATH: ${{ github.workspace }}/target/release/aft
+          AFT_PLUGIN_DIST: ${{ github.workspace }}/packages/opencode-plugin/dist
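
Of the macOS behaviors this job targets, the `/var` vs `/private/var` case is the easiest to reproduce by hand: on macOS, `/var` is a symlink into `/private`, so one directory can surface under two spellings. The sketch below resolves a directory to its physical path in shell. `canonical_dir` is a hypothetical helper written for illustration; it is not code from `context.rs` or from this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the physical,
# symlink-free path of a directory, or fail if it cannot be entered.
canonical_dir() {
  # The subshell leaves the caller's working directory unchanged;
  # `pwd -P` prints the path with every symlink resolved.
  (cd "$1" > /dev/null 2>&1 && pwd -P)
}

# Demo: a symlinked directory and its target resolve to the same path.
scratch="$(mktemp -d)"
mkdir "$scratch/real"
ln -s "$scratch/real" "$scratch/link"
canonical_dir "$scratch/link"
canonical_dir "$scratch/real"
rm -rf "$scratch"
```

On a macOS host, `canonical_dir /var` should print `/private/var`, which is the kind of mismatch a path comparison has to tolerate.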

.github/workflows/release.yml

Lines changed: 47 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -65,13 +65,55 @@ jobs:
6565
- name: Build all JS packages (gate publish)
6666
run: bun run --filter '*' build
6767

68-
# Run the full E2E matrix (Linux Docker + Windows native) at release time.
69-
# Reuses the same workflow tests.yml runs at PR time — single source of
70-
# truth, so PR-time and release-time e2e can never drift. Build/publish
71-
# jobs below `needs:` this job, so an e2e regression blocks the release.
68+
# macOS unit-test gate — mirrors the Linux `test` job's cargo + plugin test
69+
# coverage on a real macOS runner. The build-darwin-* jobs below only run
70+
# `cargo build --release`, not `cargo test`, so this is the only place where
71+
# macOS-specific code paths are actually exercised at release time:
72+
# - FSEvents watcher behavior (different coalescing latency from inotify)
73+
# - /var vs /private/var symlink canonicalization
74+
# - Broken-symlink-chain fallback in context.rs
75+
# - bash_background SIGTERM behavior on Apple Silicon
76+
# Lint and `bun run --filter '*' build` only run in the Linux `test` job —
77+
# they are platform-independent so duplicating them on macOS adds no signal.
78+
test-macos:
79+
name: Test (macOS)
80+
runs-on: macos-latest
81+
steps:
82+
- uses: actions/checkout@v4
83+
84+
- uses: dtolnay/rust-toolchain@stable
85+
86+
- uses: Swatinem/rust-cache@v2
87+
88+
- uses: oven-sh/setup-bun@v2
89+
with:
90+
bun-version: latest
91+
92+
- name: Rust tests
93+
run: cargo test --workspace
94+
95+
- name: Install JS dependencies
96+
run: bun install
97+
98+
- name: Build aft-bridge (consumers need its dist/ for typecheck)
99+
run: bun run --cwd packages/aft-bridge build
100+
101+
- name: TypeScript typecheck
102+
run: bun run typecheck
103+
104+
- name: Plugin tests
105+
run: bun run test
106+
env:
107+
AFT_CACHE_DIR: ${{ runner.temp }}/aft-cache
108+
109+
# Run the full E2E matrix (Linux Docker + Windows native + macOS native) at
110+
# release time. Reuses the same workflow tests.yml runs at PR time — single
111+
# source of truth, so PR-time and release-time e2e can never drift.
112+
# Build/publish jobs below `needs:` this job, so an e2e regression blocks
113+
# the release.
72114
e2e:
73115
name: E2E
74-
needs: [test]
116+
needs: [test, test-macos]
75117
uses: ./.github/workflows/_e2e-suite.yml
76118

77119
publish-crates:

.github/workflows/tests.yml

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,50 @@ jobs:
9191
# ~/.cache to avoid cross-test pollution.
9292
AFT_CACHE_DIR: ${{ runner.temp }}/aft-cache
9393

94+
# ---------------------------------------------------------------------------
95+
# Rust + plugin tests on macOS
96+
# Catches macOS-specific code paths: FSEvents watcher behavior (different
97+
# coalescing latency from inotify), /var vs /private/var symlink
98+
# canonicalization, broken-symlink-chain fallback, bash_background SIGTERM
99+
# behavior, and Apple Silicon-specific Rust compilation. The build-darwin-*
100+
# jobs in release.yml only run `cargo build` — this is the only place we
101+
# actually execute tests on macOS in CI.
102+
# ---------------------------------------------------------------------------
103+
rust-macos:
104+
name: Unit tests (macOS)
105+
runs-on: macos-latest
106+
timeout-minutes: 25
107+
steps:
108+
- uses: actions/checkout@v4
109+
110+
- uses: dtolnay/rust-toolchain@stable
111+
112+
- uses: Swatinem/rust-cache@v2
113+
114+
- uses: oven-sh/setup-bun@v2
115+
with:
116+
bun-version: latest
117+
118+
- name: Install workspace deps
119+
run: bun install --frozen-lockfile
120+
121+
- name: Build aft-bridge dist (workspace consumers depend on it)
122+
run: bun run --cwd packages/aft-bridge build
123+
124+
- name: Cargo build (debug — needed by plugin e2e tests)
125+
run: cargo build -p agent-file-tools
126+
127+
- name: Cargo test
128+
run: cargo test --workspace
129+
130+
- name: Bun typecheck
131+
run: bun run typecheck
132+
133+
- name: Bun test (all packages)
134+
run: bun run test
135+
env:
136+
AFT_CACHE_DIR: ${{ runner.temp }}/aft-cache
137+
94138
# ---------------------------------------------------------------------------
95139
# Rust integration tests on Windows
96140
# Catches platform-conditional code paths (#[cfg(target_os = "windows")])

tests/docker/test-e2e.sh

Lines changed: 74 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,10 @@
11
#!/usr/bin/env bash
22
# ------------------------------------------------------------------
3-
# E2E test: AFT plugin running inside OpenCode on Linux x64
3+
# E2E test: AFT plugin running inside OpenCode.
4+
#
5+
# Used by both:
6+
# - tests/docker/Dockerfile.linux-x64 (Linux Docker E2E in CI)
7+
# - tests/macos-e2e/run.sh (macOS native E2E in CI)
48
#
59
# Uses aimock for deterministic OpenAI-compatible mock LLM.
610
# Simulates a realistic multi-turn agent session that exercises:
@@ -11,6 +15,14 @@
 #
 # Each scenario runs a full OpenCode session with 8 tool call turns,
 # giving background threads enough time to build indices.
+#
+# Platform-specific behavior is controlled by the AFT_E2E_PLATFORM env
+# var (defaults to "linux"):
+# AFT_E2E_PLATFORM=linux → fake libonnxruntime.so in /usr/local/lib
+# AFT_E2E_PLATFORM=macos → fake libonnxruntime.dylib in /tmp
+# Each platform's runner script is responsible for installing OpenCode,
+# Bun, aimock, writing configs, and placing the AFT binary + plugin
+# before invoking this script.
 # ------------------------------------------------------------------
 
 set -euo pipefail
@@ -22,7 +34,28 @@ NC='\033[0m'
 
 PASS=0
 FAIL=0
-PLUGIN_LOG="/tmp/aft-plugin.log"
+PLUGIN_LOG="${AFT_E2E_PLUGIN_LOG:-/tmp/aft-plugin.log}"
+PLATFORM="${AFT_E2E_PLATFORM:-linux}"
+
+# Platform-specific paths for the broken-ONNX scenario.
+case "$PLATFORM" in
+  linux)
+    FAKE_ORT_PATH="/usr/local/lib/libonnxruntime.so"
+    PLATFORM_DISPLAY="Linux x64 (Debian)"
+    ;;
+  macos)
+    # /tmp on macOS is a symlink to /private/tmp; we use a path the
+    # plugin can find via DYLD_LIBRARY_PATH or the AFT-managed cache.
+    # We don't drop into /usr/local/lib because SIP-protected paths
+    # need root, and macOS GH Actions runners don't grant it.
+    FAKE_ORT_PATH="${RUNNER_TEMP:-/tmp}/libonnxruntime.dylib"
+    PLATFORM_DISPLAY="macOS native"
+    ;;
+  *)
+    echo "Unknown AFT_E2E_PLATFORM: $PLATFORM (expected linux|macos)" >&2
+    exit 2
+    ;;
+esac
 
 check() {
   local label="$1"
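
The `case` dispatch in this hunk is easy to check in isolation. The sketch below wraps the same mapping in a function so that each branch can be called directly. The name `resolve_fake_ort_path` is hypothetical and does not appear in the commit; the real script assigns `FAKE_ORT_PATH` at top level and exits on an unknown platform instead of returning.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): map a platform name to the path
# where the harness writes its fake ONNX Runtime library.
resolve_fake_ort_path() {
  platform="${1:-linux}"   # same default the harness gives AFT_E2E_PLATFORM
  case "$platform" in
    linux)
      printf '%s\n' "/usr/local/lib/libonnxruntime.so"
      ;;
    macos)
      # Falls back to /tmp when RUNNER_TEMP is unset, as in the hunk above.
      printf '%s\n' "${RUNNER_TEMP:-/tmp}/libonnxruntime.dylib"
      ;;
    *)
      echo "Unknown AFT_E2E_PLATFORM: $platform (expected linux|macos)" >&2
      return 2
      ;;
  esac
}

# Demo: print the path each supported platform would use.
resolve_fake_ort_path linux
resolve_fake_ort_path macos
```

Returning a status rather than calling `exit` keeps the function safe to use inside a larger script or a test, at the cost of requiring the caller to check the result.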
@@ -49,8 +82,13 @@ warn_check() {
   fi
 }
 
+# AFT_E2E_MOCK_SERVER points to the mock-server.js entry. The Docker setup
+# places it at /test/mock-server.js; the macOS runner places it relative to
+# the repo checkout. Both wire this through env so we don't hardcode paths.
+MOCK_SERVER="${AFT_E2E_MOCK_SERVER:-/test/mock-server.js}"
+
 start_aimock() {
-  node /test/mock-server.js > /tmp/aimock.log 2>&1 &
+  node "$MOCK_SERVER" > /tmp/aimock.log 2>&1 &
   AIMOCK_PID=$!
   for i in $(seq 1 15); do
     if curl -s http://127.0.0.1:4010/v1/models > /dev/null 2>&1; then
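
The hunk is cut off inside `start_aimock`, so the rest of its retry loop is not visible here. As a general illustration of the readiness-polling pattern it begins, here is a minimal sketch. The function name `wait_until`, its argument order, and its behavior on failure are assumptions made for this example, not the script's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): run a probe command up to
# ATTEMPTS times, pausing DELAY seconds between tries, and succeed as soon
# as the probe does.
# Usage: wait_until ATTEMPTS DELAY COMMAND [ARGS...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    # No need to pause after the final failed attempt.
    if [ "$attempt" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

With this helper, the probe in the hunk would read `wait_until 15 1 curl -s http://127.0.0.1:4010/v1/models`, where the one-second delay is a guess.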
@@ -92,7 +130,7 @@ run_opencode_session() {
 }
 
 echo "════════════════════════════════════════"
-echo " AFT E2E Test — Linux x64 (Debian)"
+echo " AFT E2E Test — $PLATFORM_DISPLAY"
 echo "════════════════════════════════════════"
 echo ""
 
@@ -181,45 +219,60 @@ cat /tmp/aimock.log 2>/dev/null | sed 's/^/ /' || echo " (empty)"
 stop_aimock
 
 # ══════════════════════════════════════════════════════════════════
-# Scenario 2: Broken libonnxruntime.so (reproduces issue #4)
-# A fake .so in /usr/local/lib that the plugin detects and sets
-# as ORT_DYLIB_PATH — the binary should NOT crash when loading it.
+# Scenario 2: Broken ONNX Runtime library (reproduces issue #4)
+# A fake shared library that the plugin detects and sets as
+# ORT_DYLIB_PATH — the binary should NOT crash when loading it.
+# Library suffix and path differ per platform; FAKE_ORT_PATH was
+# resolved at the top of the script.
 # ══════════════════════════════════════════════════════════════════
 
 echo ""
-echo "── Scenario 2: Broken libonnxruntime.so (issue #4) ──"
+echo "── Scenario 2: Broken ONNX Runtime library (issue #4) ──"
 echo ""
 
-# Install fake broken .so
-echo "not a real shared library" > /usr/local/lib/libonnxruntime.so
-chmod 755 /usr/local/lib/libonnxruntime.so
-echo " Installed fake libonnxruntime.so in /usr/local/lib"
+# Install fake broken library at $FAKE_ORT_PATH (linux: /usr/local/lib/libonnxruntime.so,
+# macos: $RUNNER_TEMP/libonnxruntime.dylib).
+echo "not a real shared library" > "$FAKE_ORT_PATH"
+chmod 755 "$FAKE_ORT_PATH"
+echo " Installed fake $(basename "$FAKE_ORT_PATH") at $FAKE_ORT_PATH"
 
 rm -f "$PLUGIN_LOG"
 
 start_aimock
 check "aimock started (s2)" "curl -s http://127.0.0.1:4010/v1/models > /dev/null 2>&1"
 
-echo "Running session with broken .so..."
+echo "Running session with broken library..."
 RESULT_FILE="/tmp/result-scenario2.txt"
-run_opencode_session \
-  "Read the file src/main.py and then grep for all function definitions." \
-  "$RESULT_FILE" \
+# On macOS, the AFT plugin probes a fixed list of system paths
+# (/usr/local/lib, /opt/homebrew/lib) for libonnxruntime.dylib. Since
+# we cannot write into /usr/local/lib on a vanilla GH Actions runner
+# without sudo, we point ORT_DYLIB_PATH directly at our fake instead.
+# Linux scenario keeps the implicit /usr/local/lib detection path.
+if [ "$PLATFORM" = "macos" ]; then
+  ORT_DYLIB_PATH="$FAKE_ORT_PATH" \
+    run_opencode_session \
+      "Read the file src/main.py and then grep for all function definitions." \
+      "$RESULT_FILE"
+else
+  run_opencode_session \
+    "Read the file src/main.py and then grep for all function definitions." \
+    "$RESULT_FILE"
+fi
 
 EXIT_CODE=$?
 
-check "session completed (broken .so)" "[ $EXIT_CODE -eq 0 ] || [ $EXIT_CODE -eq 124 ]"
-warn_check "no crash (broken .so)" "! grep -qi 'Binary crashed\|SIGABRT\|panicked' '$RESULT_FILE' 2>/dev/null"
-check "no plugin crash (broken .so)" "! grep -qi 'SIGABRT\|thread.*panicked' '$PLUGIN_LOG' 2>/dev/null"
+check "session completed (broken lib)" "[ $EXIT_CODE -eq 0 ] || [ $EXIT_CODE -eq 124 ]"
+warn_check "no crash (broken lib)" "! grep -qi 'Binary crashed\|SIGABRT\|panicked' '$RESULT_FILE' 2>/dev/null"
+check "no plugin crash (broken lib)" "! grep -qi 'SIGABRT\|thread.*panicked' '$PLUGIN_LOG' 2>/dev/null"
 
-# Verify the plugin detected the system .so
+# Verify the plugin detected the system library
 warn_check "system ORT detected" "grep -q 'ONNX Runtime found at system path\|ORT_DYLIB_PATH' '$PLUGIN_LOG' 2>/dev/null"
 
 echo ""
 echo " Plugin log (last 30 lines):"
 tail -30 "$PLUGIN_LOG" 2>/dev/null | sed 's/^/ /' || echo " (empty)"
 
-rm -f /usr/local/lib/libonnxruntime.so
+rm -f "$FAKE_ORT_PATH"
 stop_aimock
 
 # ══════════════════════════════════════════════════════════════════
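
The crash assertions in this scenario hand a `grep` pipeline to `check` as a string. A minimal sketch of the same test as a standalone predicate follows. `has_crash_signature` is a hypothetical name, and the pattern is copied from the `check` line that inspects the plugin log. The sketch uses `grep -E` because `|` alternation is defined for POSIX extended regular expressions, whereas `\|` in a basic regular expression is an extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): succeed (status 0) when the given
# log file contains a crash marker, fail otherwise.
has_crash_signature() {
  log_file="$1"
  # The harness silences grep errors with `2>/dev/null`, so a missing log
  # counts as "no crash found". Mirror that here.
  [ -f "$log_file" ] || return 1
  grep -qiE 'SIGABRT|thread.*panicked' "$log_file"
}
```

A caller would negate it for the passing case, for example `! has_crash_signature "$PLUGIN_LOG"`.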

tests/macos-e2e/README.md

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
# macOS native E2E
2+
3+
End-to-end test for the AFT plugin running inside OpenCode on a real macOS
4+
host (no containers, no QEMU). Runs as `e2e-macos` in the reusable
5+
`.github/workflows/_e2e-suite.yml` workflow, alongside the Linux Docker and
6+
Windows native E2E jobs.
7+
8+
## What this catches
9+
10+
- FSEvents watcher behavior (different coalescing latency from Linux inotify;
11+
the v0.19.5 release fixed two flaky watcher tests with exactly this shape).
12+
- `/var` vs `/private/var` symlink canonicalization in `context.rs`.
13+
- Broken-symlink-chain fallback in `context.rs`.
14+
- macOS dylib loading (`.dylib` extension, `/usr/local/lib` and
15+
`/opt/homebrew/lib` probe paths) for ONNX Runtime — distinct from the Linux
16+
`.so` / `/usr/local/lib` path.
17+
- Apple Silicon native ARM64 codegen for the `aft` binary itself.
18+
19+
## How it works
20+
21+
`run.sh` performs the host setup that `Dockerfile.linux-x64` does on Linux
22+
(install OpenCode, Bun, aimock, write configs, place locally-built AFT binary
23+
+ plugin dist), then invokes the shared `tests/docker/test-e2e.sh` harness
24+
with `AFT_E2E_PLATFORM=macos`. The shared harness reads platform-specific
25+
paths (e.g. `libonnxruntime.dylib` vs `libonnxruntime.so`) from env so we
26+
don't fork the scenario logic.
27+
28+
## Local invocation
29+
30+
The harness expects `AFT_BINARY_PATH` and `AFT_PLUGIN_DIST` to point at the
31+
locally-built AFT binary and OpenCode plugin dist:
32+
33+
```bash
34+
cargo build --release -p agent-file-tools
35+
bun run --cwd packages/aft-bridge build
36+
bun run --cwd packages/opencode-plugin build
37+
38+
AFT_BINARY_PATH="$PWD/target/release/aft" \
39+
AFT_PLUGIN_DIST="$PWD/packages/opencode-plugin/dist" \
40+
bash tests/macos-e2e/run.sh
41+
```
42+
43+
The script writes its OpenCode config under `$RUNNER_TEMP/aft-e2e-xdg/opencode/`
44+
(or `$TMPDIR/aft-e2e-xdg/opencode/` outside CI) and the test project under
45+
`$RUNNER_TEMP/aft-e2e-project/`, leaving your real `~/.config/opencode/`
46+
untouched.
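
The scratch layout described at the end of the README can be sketched as a fallback chain. This is an illustration only: `run.sh` is not shown in this diff, so the function and variable names below and the final `/tmp` fallback are assumptions, not the script's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (run.sh itself is not shown in this diff).
# Pick the scratch base: RUNNER_TEMP (set on GitHub Actions runners),
# then TMPDIR, then /tmp as an assumed last resort.
e2e_scratch_base() {
  base="${RUNNER_TEMP:-${TMPDIR:-/tmp}}"
  # TMPDIR often ends in a slash on macOS; strip it so joined paths
  # do not contain "//".
  printf '%s\n' "${base%/}"
}

scratch_base="$(e2e_scratch_base)"
# Directory names are taken from the README text.
e2e_config_dir="$scratch_base/aft-e2e-xdg/opencode"
e2e_project_dir="$scratch_base/aft-e2e-project"

echo "OpenCode config: $e2e_config_dir"
echo "Test project:    $e2e_project_dir"
```

Keeping both directories under one disposable base is what lets the run leave the real `~/.config/opencode/` alone, whatever mechanism `run.sh` uses to point OpenCode at the scratch config.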
