diff --git a/skills/testsprite-verify.codex.md b/skills/testsprite-verify.codex.md
index c641526..f2c2456 100644
--- a/skills/testsprite-verify.codex.md
+++ b/skills/testsprite-verify.codex.md
@@ -64,6 +64,8 @@ testsprite test run --all --project [--filter ] \
   already have the change deployed (e.g. a CI preview deploy) — the CLI tests a
   deployed URL, it doesn't host your environment. Running earlier verifies the
   previous build.
+- Backend `--code-file`: the runner executes the file top-to-bottom (not `pytest`), so **call your `test_*` function(s) at the end of the file** — a defined-but-uncalled test silently passes.
+- Backend sandbox has only stdlib + `requests` + `pytest` + `numpy` + `scipy`. Test the API over HTTP with `requests`; do **not** `import` the project's own source modules or other packages (e.g. `torch`) — they aren't installed and the test won't run.
 - `--wait` long-polls until terminal. Do not wrap it in a retry loop.
 - Exit `0` = passed; `1` = failed/blocked; `7` = timeout (resume with `test wait `).
 - BE dependency flags (`--produces`/`--needs`/`--category`) are backend-only and
diff --git a/skills/testsprite-verify.skill.md b/skills/testsprite-verify.skill.md
index 34a7dcc..650a88b 100644
--- a/skills/testsprite-verify.skill.md
+++ b/skills/testsprite-verify.skill.md
@@ -135,7 +135,10 @@ language; you don't write browser code.
 **Backend — write the Python yourself and use `--code-file`.** There is no
 server-side codegen on the CLI. Read the API surface that changed (OpenAPI, the
 route handler, request/response shapes) and write a pytest-style assertion script
-to a tempfile:
+to a tempfile. **End the file by calling your `test_*` function(s)** — the runner
+executes the file top-to-bottom and does NOT auto-discover/collect test functions
+the way `pytest` does, so a test that is only _defined_ (never called) silently
+passes regardless of its assertions:
 
 ```python
 # /tmp/login-empty-password.py — runs against the project's target URL, creds injected.
@@ -145,8 +148,23 @@ def test_login_rejects_empty_password():
     r = requests.post(f"{TARGET_URL}/login", json={"email": "a@b.c", "password": ""})
     assert r.status_code == 400
     assert r.json().get("error") == "invalid password"
+
+# Required: actually invoke the test so its assertions run.
+test_login_rejects_empty_password()
 ```
 
+**Execution environment (backend).** The code runs in a locked-down sandbox with
+only the Python **standard library + `requests` + `pytest` + `numpy` + `scipy`**
+(plus `requests`' own deps like `urllib3`). So:
+
+- **Test the API over HTTP** with `requests` against the target URL — that's what a
+  backend test verifies.
+- **Do NOT `import` the project's own source modules** (e.g. `from app.services import …`,
+  `import core`, `import model`) or other third-party/ML packages (e.g. `torch`,
+  `pandas`, `django`). They are not installed, so the test fails to even run.
+- Get values from the API's responses (and captured variables), not by importing and
+  calling the app's internals.
+
 **Backend tests that share state declare dependencies at create time.** For a
 one-off verification, prefer a single self-contained script (log in inside the
 same file). But when the coverage set splits naturally into producer → consumer