From 66793ab498d26aa824a4d731867069ea4aea7cd6 Mon Sep 17 00:00:00 2001
From: Fangyuan_Zhang <zhang.fangyua@northeastern.edu>
Date: Mon, 22 Jun 2026 19:41:22 -0700
Subject: [PATCH 1/2] =?UTF-8?q?docs(skill):=20backend=20test=20authoring?=
 =?UTF-8?q?=20rules=20=E2=80=94=20call=20the=20test=20fn=20+=20sandbox=20l?=
 =?UTF-8?q?imits?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two authoring gaps surfaced while debugging CLI backend tests that silently passed:

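1. The runner executes the --code-file script top to bottom as a plain script
   and does not collect tests the way pytest does, so a test_* function that is
   defined but never called never runs its assertions and the test passes
   vacuously. Tell agents to call their test function(s) at the end of the file.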
2. The execution sandbox has only stdlib + requests + pytest + numpy + scipy.
   Tests that import the project's own source modules or any other package
   (e.g. torch, pandas) fail to run. Tell agents to test the API over HTTP
   with requests and not to import app internals or unavailable packages.

Applies to testsprite-verify.skill.md (full note) + testsprite-verify.codex.md (bullets).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 skills/testsprite-verify.codex.md |  2 ++
 skills/testsprite-verify.skill.md | 19 ++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)
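
Reviewer note: the silent pass from rule 1 can be reproduced without the CLI.
The sketch below assumes the runner behaves like executing the file as a plain
script; `run_top_to_bottom` is a hypothetical stand-in, not the real runner.

```python
# A test file that only defines its test function.
UNCALLED = '''
def test_always_fails():
    assert 1 == 2, "this assertion never runs"
'''

# The same file with the call at the end, as the new docs require.
CALLED = UNCALLED + '''
test_always_fails()
'''


def run_top_to_bottom(source: str) -> str:
    """Execute `source` like a plain script and report the outcome."""
    try:
        exec(compile(source, "<code-file>", "exec"), {"__name__": "__main__"})
    except AssertionError as exc:
        return f"failed: {exc}"
    return "passed"


uncalled_result = run_top_to_bottom(UNCALLED)
called_result = run_top_to_bottom(CALLED)
print(uncalled_result)  # passed
print(called_result)    # failed: this assertion never runs
```

The first script "passes" only because defining a function executes none of
its body; adding the trailing call is what makes the assertion reachable.
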
diff --git a/skills/testsprite-verify.codex.md b/skills/testsprite-verify.codex.md
index c641526..f2c2456 100644
--- a/skills/testsprite-verify.codex.md
+++ b/skills/testsprite-verify.codex.md
@@ -64,6 +64,8 @@ testsprite test run --all --project <id> [--filter <substr>] \
   already have the change deployed (e.g. a CI preview deploy) — the CLI tests a
   deployed URL, it doesn't host your environment. Running earlier verifies the
   previous build.
+- Backend `--code-file`: the runner executes the file top-to-bottom (not `pytest`), so **call your `test_*` function(s) at the end of the file** — a defined-but-uncalled test silently passes.
+- Backend sandbox has only stdlib + `requests` + `pytest` + `numpy` + `scipy`. Test the API over HTTP with `requests`; do **not** `import` the project's own source modules or other packages (e.g. `torch`) — they aren't installed and the test won't run.
 - `--wait` long-polls until terminal. Do not wrap it in a retry loop.
 - Exit `0` = passed; `1` = failed/blocked; `7` = timeout (resume with `test wait <run-id>`).
 - BE dependency flags (`--produces`/`--needs`/`--category`) are backend-only and
diff --git a/skills/testsprite-verify.skill.md b/skills/testsprite-verify.skill.md
index 34a7dcc..ee1d681 100644
--- a/skills/testsprite-verify.skill.md
+++ b/skills/testsprite-verify.skill.md
@@ -135,7 +135,10 @@ language; you don't write browser code.
 **Backend — write the Python yourself and use `--code-file`.** There is no
 server-side codegen on the CLI. Read the API surface that changed (OpenAPI, the
 route handler, request/response shapes) and write a pytest-style assertion script
-to a tempfile:
+to a tempfile. **End the file by calling your `test_*` function(s)** — the runner
+executes the file top-to-bottom and does NOT auto-discover/collect test functions
+the way `pytest` does, so a test that is only *defined* (never called) silently
+passes regardless of its assertions:
 
 ```python
 # /tmp/login-empty-password.py — runs against the project's target URL, creds injected.
@@ -145,8 +148,22 @@ def test_login_rejects_empty_password():
     r = requests.post(f"{TARGET_URL}/login", json={"email": "a@b.c", "password": ""})
     assert r.status_code == 400
     assert r.json().get("error") == "invalid password"
+
+# Required: actually invoke the test so its assertions run.
+test_login_rejects_empty_password()
 ```
 
+**Execution environment (backend).** The code runs in a locked-down sandbox with
+only the Python **standard library + `requests` + `pytest` + `numpy` + `scipy`**
+(plus `requests`' own deps like `urllib3`). So:
+- **Test the API over HTTP** with `requests` against the target URL — that's what a
+  backend test verifies.
+- **Do NOT `import` the project's own source modules** (e.g. `from app.services import …`,
+  `import core`, `import model`) or other third-party/ML packages (e.g. `torch`,
+  `pandas`, `django`). They are not installed, so the test fails to even run.
+- Get values from the API's responses (and captured variables), not by importing and
+  calling the app's internals.
+
 **Backend tests that share state declare dependencies at create time.** For a
 one-off verification, prefer a single self-contained script (log in inside the
 same file). But when the coverage set splits naturally into producer → consumer

From 2797318afe1afbbbf7e46dfc1bb90fb2eb1dab57 Mon Sep 17 00:00:00 2001
From: Fangyuan_Zhang <zhang.fangyua@northeastern.edu>
Date: Mon, 22 Jun 2026 19:45:14 -0700
Subject: [PATCH 2/2] style: prettier-format testsprite-verify.skill.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 skills/testsprite-verify.skill.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/skills/testsprite-verify.skill.md b/skills/testsprite-verify.skill.md
index ee1d681..650a88b 100644
--- a/skills/testsprite-verify.skill.md
+++ b/skills/testsprite-verify.skill.md
@@ -137,7 +137,7 @@ server-side codegen on the CLI. Read the API surface that changed (OpenAPI, the
 route handler, request/response shapes) and write a pytest-style assertion script
 to a tempfile. **End the file by calling your `test_*` function(s)** — the runner
 executes the file top-to-bottom and does NOT auto-discover/collect test functions
-the way `pytest` does, so a test that is only *defined* (never called) silently
+the way `pytest` does, so a test that is only _defined_ (never called) silently
 passes regardless of its assertions:
 
 ```python
@@ -156,6 +156,7 @@ test_login_rejects_empty_password()
 **Execution environment (backend).** The code runs in a locked-down sandbox with
 only the Python **standard library + `requests` + `pytest` + `numpy` + `scipy`**
 (plus `requests`' own deps like `urllib3`). So:
+
 - **Test the API over HTTP** with `requests` against the target URL — that's what a
   backend test verifies.
 - **Do NOT `import` the project's own source modules** (e.g. `from app.services import …`,