diff --git a/skills/testsprite-verify.codex.md b/skills/testsprite-verify.codex.md
index c641526..f2c2456 100644
--- a/skills/testsprite-verify.codex.md
+++ b/skills/testsprite-verify.codex.md
@@ -64,6 +64,8 @@ testsprite test run --all --project <id> [--filter <substr>] \
   already have the change deployed (e.g. a CI preview deploy) — the CLI tests a
   deployed URL, it doesn't host your environment. Running earlier verifies the
   previous build.
+- Backend `--code-file`: the runner executes the file top-to-bottom (not `pytest`), so **call your `test_*` function(s) at the end of the file** — a defined-but-uncalled test silently passes.
+- Backend sandbox has only stdlib + `requests` + `pytest` + `numpy` + `scipy`. Test the API over HTTP with `requests`; do **not** `import` the project's own source modules or other packages (e.g. `torch`) — they aren't installed and the test won't run.
 - `--wait` long-polls until terminal. Do not wrap it in a retry loop.
 - Exit `0` = passed; `1` = failed/blocked; `7` = timeout (resume with `test wait <run-id>`).
 - BE dependency flags (`--produces`/`--needs`/`--category`) are backend-only and
diff --git a/skills/testsprite-verify.skill.md b/skills/testsprite-verify.skill.md
index 34a7dcc..650a88b 100644
--- a/skills/testsprite-verify.skill.md
+++ b/skills/testsprite-verify.skill.md
@@ -135,7 +135,10 @@ language; you don't write browser code.
 **Backend — write the Python yourself and use `--code-file`.** There is no
 server-side codegen on the CLI. Read the API surface that changed (OpenAPI, the
 route handler, request/response shapes) and write a pytest-style assertion script
-to a tempfile:
+to a tempfile. **End the file by calling your `test_*` function(s)** — the runner
+executes the file top-to-bottom and does NOT auto-discover/collect test functions
+the way `pytest` does, so a test that is only _defined_ (never called) silently
+passes regardless of its assertions:
 
 ```python
 # /tmp/login-empty-password.py — runs against the project's target URL, creds injected.
@@ -145,8 +148,23 @@ def test_login_rejects_empty_password():
     r = requests.post(f"{TARGET_URL}/login", json={"email": "a@b.c", "password": ""})
     assert r.status_code == 400
     assert r.json().get("error") == "invalid password"
+
+# Required: actually invoke the test so its assertions run.
+test_login_rejects_empty_password()
 ```
 
+**Execution environment (backend).** The code runs in a locked-down sandbox with
+only the Python **standard library + `requests` + `pytest` + `numpy` + `scipy`**
+(plus `requests`' own deps like `urllib3`). So:
+
+- **Test the API over HTTP** with `requests` against the target URL — that's what a
+  backend test verifies.
+- **Do NOT `import` the project's own source modules** (e.g. `from app.services import …`,
+  `import core`, `import model`) or other third-party/ML packages (e.g. `torch`,
+  `pandas`, `django`). They are not installed, so the test fails to even run.
+- Get values from the API's responses (and captured variables), not by importing and
+  calling the app's internals.
+
 **Backend tests that share state declare dependencies at create time.** For a
 one-off verification, prefer a single self-contained script (log in inside the
 same file). But when the coverage set splits naturally into producer → consumer