Closed
2 changes: 2 additions & 0 deletions skills/testsprite-verify.codex.md
@@ -64,6 +64,8 @@ testsprite test run --all --project <id> [--filter <substr>] \
already have the change deployed (e.g. a CI preview deploy) — the CLI tests a
deployed URL, it doesn't host your environment. Running earlier verifies the
previous build.
- Backend `--code-file`: the runner executes the file top-to-bottom (not `pytest`), so **call your `test_*` function(s) at the end of the file** — a defined-but-uncalled test silently passes.
- Backend sandbox has only stdlib + `requests` + `pytest` + `numpy` + `scipy`. Test the API over HTTP with `requests`; do **not** `import` the project's own source modules or other packages (e.g. `torch`) — they aren't installed and the test won't run.
- `--wait` long-polls until terminal. Do not wrap it in a retry loop.
- Exit `0` = passed; `1` = failed/blocked; `7` = timeout (resume with `test wait <run-id>`).
- BE dependency flags (`--produces`/`--needs`/`--category`) are backend-only and
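The exit codes listed in this hunk can be handled in a small wrapper. A minimal sketch, assuming only the three codes documented above; `interpret_exit` is a hypothetical helper and not part of the CLI, and any other code is reported as undocumented rather than guessed at.

```python
def interpret_exit(code: int) -> str:
    """Map the exit codes documented above to a short outcome label."""
    if code == 0:
        return "passed"
    if code == 1:
        return "failed-or-blocked"
    if code == 7:
        # Per the text: do not retry the run, resume with `test wait <run-id>`.
        return "timeout"
    return f"undocumented exit code {code}"


for code in (0, 1, 7):
    print(code, interpret_exit(code))
```

In practice the value passed in would be the return code of the CLI process (for example `subprocess.run(cmd).returncode`), with `cmd` built by the caller.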
20 changes: 19 additions & 1 deletion skills/testsprite-verify.skill.md
@@ -135,7 +135,10 @@ language; you don't write browser code.
**Backend — write the Python yourself and use `--code-file`.** There is no
server-side codegen on the CLI. Read the API surface that changed (OpenAPI, the
route handler, request/response shapes) and write a pytest-style assertion script
to a tempfile. **End the file by calling your `test_*` function(s)** — the runner
executes the file top-to-bottom and does NOT auto-discover/collect test functions
the way `pytest` does, so a test that is only _defined_ (never called) silently
passes regardless of its assertions:

```python
# /tmp/login-empty-password.py — runs against the project's target URL, creds injected.
# @@ -145,8 +148,23 @@ (earlier lines of this script are collapsed in the diff)
def test_login_rejects_empty_password():
    r = requests.post(f"{TARGET_URL}/login", json={"email": "a@b.c", "password": ""})
    assert r.status_code == 400
    assert r.json().get("error") == "invalid password"

# Required: actually invoke the test so its assertions run.
test_login_rejects_empty_password()
```
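The silent-pass failure mode can be reproduced locally. This is a minimal sketch that models "executes the file top-to-bottom" with Python's built-in `exec`; that model is an assumption, since the runner's internals are not shown in this diff. `run_like_runner` and both script strings are hypothetical names used only for illustration.

```python
# Assumption: the runner behaves like plain top-to-bottom execution of the file.

UNCALLED = '''
def test_always_fails():
    assert 1 == 2, "this assertion never runs"
'''

# Same script, but ending with an explicit call to the test function.
CALLED = UNCALLED + '''
test_always_fails()
'''


def run_like_runner(source: str) -> str:
    """Execute a script body top-to-bottom and report the outcome."""
    try:
        exec(compile(source, "<test-script>", "exec"), {"__name__": "__main__"})
    except AssertionError as exc:
        return f"failed: {exc}"
    return "passed"


print(run_like_runner(UNCALLED))  # the test is only defined, so nothing fails
print(run_like_runner(CALLED))    # the trailing call surfaces the failure
```

The first script reports a pass even though its only assertion is false, because defining a function does not execute its body.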

**Execution environment (backend).** The code runs in a locked-down sandbox with
only the Python **standard library + `requests` + `pytest` + `numpy` + `scipy`**
(plus `requests`' own deps like `urllib3`). So:

- **Test the API over HTTP** with `requests` against the target URL — that's what a
backend test verifies.
- **Do NOT `import` the project's own source modules** (e.g. `from app.services import …`,
`import core`, `import model`) or other third-party/ML packages (e.g. `torch`,
`pandas`, `django`). They are not installed, so the import fails and the test never runs.
- Get values from the API's responses (and captured variables), not by importing and
calling the app's internals.

**Backend tests that share state declare dependencies at create time.** For a
one-off verification, prefer a single self-contained script (log in inside the
same file). But when the coverage set splits naturally into producer → consumer