RW-834 feat(script-vars): runtime-overridable script variables for commit_slx and run_script_and_wait#9
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ld checks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n gate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…erged into envVars Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
A few review comments. The first item is the only blocker for CI; the rest are correctness/cleanup suggestions.
0. CI is failing — tests/test_validation.py lint fix
Both lint-and-test (3.10) and lint-and-test (3.12) jobs fail with:
I001 [*] Import block is un-sorted or un-formatted
--> tests/test_validation.py:367:9
|
365 | @mock.patch("runwhen_platform_mcp.server._resolve_workspace", new_callable=mock.AsyncMock)
366 | def test_invalid_script_var_returns_error(self, mock_resolve) -> None:
367 | / import json
368 | | from runwhen_platform_mcp.server import commit_slx # noqa: PLC0415
Fix: move import json and from runwhen_platform_mcp.server import commit_slx to the file's top-level import block (alongside the existing from runwhen_platform_mcp.server import (...) in tests/test_validation.py), then run:
ruff check --fix . && ruff format .The lazy-import pattern with # noqa: PLC0415 reads as a workaround for an import cycle that doesn't actually exist here — commit_slx is already imported the normal way elsewhere in the suite. Hoisting it removes the noqa and silences I001 in one move.
1. SLI handling is silently broken
commit_slx accepts script_vars regardless of task_type, runs _validate_script_vars, and then quietly drops the value when task_type="sli" (only _build_runbook_yaml receives them; _build_sli_yaml and _build_cron_sli_yaml do not). The SKILL.md says "NEVER pass script_vars when task_type='sli'" but the code accepts and ignores. Agents will pass them, get a successful commit response, and never realise their schema was dropped.
Reject early in commit_slx:
if script_vars and task_type != "task":
return _json_response({
"error": "script_vars are only valid for task_type='task'. "
"SLIs are automated probes with fixed thresholds and have no "
"per-run override concept.",
})Cron-scheduler SLI mode (where task_type="task" but a cron_schedule is also given) is a related case — _build_cron_sli_yaml won't write script vars while _build_runbook_yaml will. That asymmetry is defensible (the runbook has them, the cron-driven SLI doesn't override anything per run) but worth a one-line code comment so future maintainers don't read it as a bug.
2. Missing collision checks
Three name-collision cases can produce silently broken YAML today:
a. env_vars ∩ script_vars. If LOG_QUERY is in both, the runbook gets LOG_QUERY in both configProvided AND scriptVarsProvided. This will happen frequently because _extract_env_vars auto-discovers os.environ.get("LOG_QUERY") from the script source — agents will then declare the same name in script_vars and end up with the dup. Reject:
if script_vars and env_vars:
overlap = {sv["name"] for sv in script_vars} & set(env_vars)
if overlap:
return _json_response({
"error": f"Names appear in both env_vars and script_vars: {sorted(overlap)}. "
"A name must be one or the other.",
})b. script_vars ∩ secret_vars. Lower frequency but worth checking by symmetry — secrets and script vars share the env-var namespace on the runner.
c. Duplicates within script_vars itself. _validate_script_vars doesn't reject [{"name": "FOO", ...}, {"name": "FOO", ...}]. Add a seen-set check inside the existing loop:
seen: set[str] = set()
for i, var in enumerate(script_vars):
name = var.get("name")
if name and name in seen:
errors.append(f"script_vars[{i}]: duplicate name {name!r}")
if name:
seen.add(name)3. docs/plans/ accumulation
This PR adds 818 lines of plan/design docs (2026-04-28-script-vars-design.md 110, 2026-04-28-script-vars-plan.md 708). Once merged these become stale "here's what we did" artefacts that searches will surface as if they're current spec. Two options:
- Trim before merge: keep the short design doc (it captures intent), drop the 708-line implementation plan, or condense both into a single one-page post-implementation summary.
- Follow-up cleanup: open a tracking issue to move dated
docs/plans/*-plan.mdfiles underdocs/plans/archive/after merge, and apply the same convention here.
Either's fine, but a 700-line implementation plan landing in the main tree is the kind of thing that compounds. Worth setting the convention now while the repo is small. (The 468-platform UI PR has the same anti-pattern with 4 plan docs — establishing it here gives that team a pattern to follow.)
4. Naming nit — flagging it for further internal discussion
script_vars vs env_vars reads as two flavours of the same thing to an LLM agent: both end up as env vars on the runner, both are lists of name/value-ish pairs, only the s differentiates them at the tool boundary. The classification ladder in SKILL.md helps, but the param name itself doesn't carry the signal. Worth revisiting (task_parameters, task_inputs, runtime_inputs) at some point — but only in coordination with the UI side; diverging unilaterally from the platform's scriptVarsProvided CRD field and the UI's scriptVars types creates more confusion than it removes.
For this PR: at minimum, beef up the Field(description=...) on script_vars to explicitly contrast against env_vars and secret_vars, so the decision rule lives at the tool boundary, not only in the skill doc:
Per-run task parameters that the END USER fills in when invoking the committed task (log queries, time windows, filters). Distinct from
env_vars(set once by the task author — cluster, namespace, context) andsecret_vars(credentials). Task-only — never valid for SLIs.
…review comments - Rename all occurrences: script_vars → run_time_vars, script_var_overrides → run_time_var_overrides, scriptVarsProvided → runTimeVarsProvided (capital T) - Fix CI blocker: hoist inline imports (json, commit_slx, run_script_and_wait, asyncio) to top-level import block in test_validation.py, removing noqa workaround - Reject run_time_vars early when task_type != 'task' (SLI guard) - Add collision checks: env_vars ∩ run_time_vars and secret_vars ∩ run_time_vars - Add duplicate-name check inside _validate_run_time_vars (seen-set) - Beef up Field(description=...) for run_time_vars to contrast env_vars/secret_vars - Archive 708-line implementation plan to docs/plans/archive/ - Add 5 new tests: SLI rejection, env_vars collision, secret_vars collision, duplicate name flagged, duplicate name ok (231 tests total, all pass) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
All five items addressed in the latest commit ( #0 — CI fix: Hoisted #1 — SLI guard: if run_time_vars and task_type != "task":
return _json_response({"error": "run_time_vars are only valid for task_type='task'. ..."})#2 — Collision checks:
All three backed by new tests (231 total, all pass). #3 — Docs: 708-line plan moved to #4 — Naming: Renamed to |
Replace the 4-step runbook/runs flow with the runsessions endpoint:
- POST /api/v3/workspaces/{ws}/runsessions with source=direct and run_requests
payload; run requests are triggered automatically by the endpoint
- Poll GET /api/v3/workspaces/{ws}/runsessions/{id} until all run requests have
response_time set
- Results (issues, passed/failed/skipped titles) are embedded in run_requests[]
in the session response — no separate /output call needed
Add optional run_time_var_overrides param (dict[str, str]) passed through to
the run request payload; staff access gate is enforced by papi.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The papi API expects camelCase in JSON: runRequests, slxName, taskTitles, runTimeVarOverrides. Also convert the taskTitles string param to a list as the API expects (split on '||', or ["*"] for the wildcard default). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Align with agentfarm convention (single word). All identifiers, YAML keys, and prose updated: run_time_vars → runtime_vars run_time_var_overrides → runtime_var_overrides _validate_run_time_vars → _validate_runtime_vars runTimeVarsProvided → runtimeVarsProvided runTimeVarOverrides → runtimeVarOverrides 231 tests pass, ruff clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rohit-Ekbote
left a comment
There was a problem hiding this comment.
Inline notes on items 1, 3, 4, and 5 from the prior review (naming, plan docs, run_slx override, validation order).
| ) | ||
|
|
||
|
|
||
| def _validate_runtime_vars(runtime_vars: list[dict] | None) -> list[str]: |
There was a problem hiding this comment.
Naming inconsistency (item 1). The PR title, body, and the design doc committed in this PR (docs/plans/2026-04-28-script-vars-design.md) all refer to this concept as script_vars / _validate_script_vars / scriptVarsProvided. The code uses runtime_vars / runtimeVarsProvided instead. Pick one term and align everywhere — PR description, design doc, code, tests, the skill section header (Run-Time Variables in skills/build-runwhen-task/SKILL.md), and the lingering test docstring at tests/test_yaml_generation.py that still says "Script vars must NOT appear in configProvided". Otherwise readers (and LLMs that consume this skill alongside the design doc) get conflicting names for the same concept.
There was a problem hiding this comment.
Fixed. Both plan docs have been deleted (commit 50db9b7). The stale "Script vars" test docstring and "Run-Time Variables" skill section header have been updated to "runtime" to match the code.
| **Step 2: Run tests to verify they fail** | ||
|
|
||
| ```bash | ||
| cd /Users/prats/Documents/work/runwhen-platform-mcp |
There was a problem hiding this comment.
Plan docs (item 3). Three issues with this file:
- Hard-coded personal path
/Users/prats/Documents/work/runwhen-platform-mcpwon't work for anyone else who reads or follows the plan. - Terminology is already stale — this plan uses
script_vars/_validate_script_vars/scriptVarsProvidedthroughout, but the implementation that ships in the same PR usesruntime_vars. A reader following the plan will be confused about what the code actually does. - The plan addresses Claude directly (e.g.
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans...) — that's an authoring artifact, not project documentation.
Combined with docs/plans/2026-04-28-script-vars-design.md (110 lines, also using the old script_vars name), this PR adds 818 lines of internal planning material that's already outdated on the day of merge. Worth a team conversation about whether docs/plans/ is the right home for these — at minimum scrub the personal path and align terminology before merge.
There was a problem hiding this comment.
Deleted both docs (commit 50db9b7). Not worth maintaining — the code and tests are the source of truth.
| default=None, | ||
| description=( | ||
| "Per-run override values for runtime variables (name → value). " | ||
| "Passed through to the runner at execution time. Requires staff access." |
There was a problem hiding this comment.
run_slx runtime_var_overrides (item 4). Three concerns with adding this parameter here:
- Not in the design doc or PR description. The design only covers
commit_slxandrun_script_and_wait. Either addrun_slxto the spec or split it into a follow-up PR. - No test coverage.
tests/test_validation.py::TestRunScriptAndWaitRunTimeVarOverridescoversrun_script_and_waitbut there's nothing exercising therun_slxpath — including theif runtime_var_overrides:branch at line 3058 that injectsruntimeVarOverridesinto the run request. - "Requires staff access" is unenforced. The Field description here makes a privilege claim that isn't validated anywhere in the function body. Either enforce it (and add a test) or drop the claim from the docstring so it doesn't mislead callers.
Suggestion: remove this parameter from this PR until the run_slx changes are fleshed out, or add validation + tests + a clear note in the design doc.
There was a problem hiding this comment.
Leaving runtime_var_overrides in run_slx for now. The staff-access enforcement will be handled at the papi layer — we'll revisit user-level access validation on the MCP side separately with Shea.
| } | ||
| ) | ||
|
|
||
| rtv_errors = _validate_runtime_vars(runtime_vars) |
There was a problem hiding this comment.
Fail-fast on validation (item 5). This validation runs here at line 3256, but _resolve_workspace(workspace_name) runs later at line ~3295 — so a structurally invalid runtime_vars payload still triggers a network round-trip to PAPI just to resolve the workspace before bailing out. Move _validate_runtime_vars (and the task_type != "task" check at line 3245, and the env/secret overlap checks below) up so bad input fails immediately without hitting the API.
Side note: tests/test_validation.py::TestCommitSlxRunTimeVarsValidation::test_invalid_runtime_var_returns_error mocks _resolve_workspace, but with the current ordering the code already returns from the validation gate before reaching line 3295 — that mock is effectively unused. Whichever order you settle on, audit those mocks to match.
There was a problem hiding this comment.
Fixed in commit d54c305. The sli_script/cron_schedule conflict check and access/data tag validation have been moved before _resolve_workspace so invalid input fails immediately without a network call. Also removed the now-unnecessary _resolve_workspace mocks from the four tests that exit at the validation gate.
- Delete docs/plans/ (design + archive) — outdated on day of merge - Fix stale "Script vars" docstring in test_yaml_generation.py - Fix stale "Run-Time Variables" header in SKILL.md Addresses reviewer comments on docs/plans/archive and naming inconsistency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move sli_script/cron_schedule conflict check and access/data tag validation above the _resolve_workspace network call so bad input returns immediately without hitting the API. Remove unused _resolve_workspace mocks from four tests that exit during input validation and never reach the workspace resolution step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Re-review of latest changes (d54c305)
All first-round feedback addressed cleanly:
- ✅ SLI guard —
runtime_varsrejected whentask_type != "task" - ✅ Collision checks — env_vars / secret_vars / duplicate-name within runtime_vars
- ✅ CI lint fix — imports hoisted in
tests/test_validation.py - ✅
Field(description=...)beefed up with the env_vars / secret_vars contrast - ✅ Plan docs removed
- ✅ Validation re-ordered for fail-fast in
commit_slx - ✅ 231 tests passing, all CI green
The bigger change in this round — run_slx switching from the 4-step runbook/runs flow to the consolidated /runsessions endpoint — is a clean simplification, and the wire format (runtimeVarsProvided, runtimeVarOverrides) correctly matches project-468/main after this week's rename cycle there.
Minor non-blocking items in run_slx (optional follow-ups)
-
task_titlescontaining||—task_titles.split("||")will mis-split a legitimate title that contains the delimiter. Document the constraint or acceptlist[str]directly. -
full_slx_nameheuristic —slx_name if "--" in slx_name else f"{ws}--{slx_name}"will treat a short SLX name that legitimately contains--as already-prefixed. Worth a one-line note in the docstring. -
response_timepolling —all(rr.get("response_time") for rr in run_requests)matches PAPI's own internal completion signal (papi/tasks/run_session_tasks.py:244, 893), so behaviour is consistent. Pre-runner failures (validation errors, runner unreachable) may or may not setresponse_time— at worst this is parity with PAPI's own polling, not a new bug. -
Result field accessors —
first_rr.get("passed_titles", "")matches the FastAPI Pydantic schema (snake_case) atbackend-services-v2/papi/schemas/run_request.py:187. The dropped camelCase fallback from the old code is fine for the FastAPI-served/runsessionsendpoint.
LGTM — ready to merge from my side.
|
This is only to be merged after the runtimeVarsProvided has made it to prod |
Rohit-Ekbote
left a comment
There was a problem hiding this comment.
Follow-up on the deferred item 4.
| default=None, | ||
| description=( | ||
| "Per-run override values for runtime variables (name → value). " | ||
| "Passed through to the runner at execution time. Requires staff access." |
There was a problem hiding this comment.
Since you're deferring staff-access enforcement to PAPI and not validating it at the MCP layer, this "Requires staff access." line is now misleading — it implies the MCP tool checks the caller's privileges, which it doesn't.
Suggestion: rephrase to make the boundary explicit, e.g.:
Enforced server-side by PAPI — the MCP layer does not validate the caller's privileges. Calls from non-staff users will be rejected by PAPI with a 403.
…or just drop the sentence entirely if PAPI's error surface is good enough on its own.
There was a problem hiding this comment.
Dropped the sentence entirely (commit 4cdadf1). PAPI's 403 is sufficient.
…rrides Access is enforced server-side by PAPI; the MCP layer does no privilege check. The sentence implied otherwise. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
merging this as the runtime vars are on beta now |
Rohit-Ekbote
left a comment
There was a problem hiding this comment.
Approving — all four inline items from the prior review rounds have been addressed cleanly:
- Naming consistency fixed in code, tests, and skill (PR title still reads
feat(script-vars)— cosmetic, worth fixing before squash-merge) - Plan docs (818 lines) removed from
docs/plans/ - Misleading "Requires staff access" claim dropped from
run_slxdocstring commit_slxvalidation reordered to fail-fast before_resolve_workspace, with unused mocks cleaned up
Two items left as known follow-ups, not blocking:
run_slxruntime_var_overrideshas no test coverage — author confirmed this will be revisited with Shea alongside staff-access enforcement.- The
run_slxAPI surface rewrite (RunRequest → RunSession, removedoutputfield, polling loop change) was bundled into this PR without being in the design doc. If that's intentional, ✅. If it wasn't meant to ship here, worth a separate discussion — but it's out of scope for this approval.
Summary
script_varsparam tocommit_slx— writesscriptVarsProvidedtorunbook.yamlwith full validation (name, description, default, validation schema)script_var_overridesparam torun_script_and_wait— merged intoenvVarsat test time, overrides win on collision_validate_script_varshelper enforcing required fields on each script var dict before any PAPI callbuild-runwhen-taskskillTest Plan
pytest tests/ --ignore=tests/test_papi_live_smoke.py --ignore=tests/test_mcp_http_remote_smoke.py)TestValidateScriptVars— 13 tests covering all valid/invalid field combinationsTestBuildRunbookYaml— 5 new tests verifyingscriptVarsProvidedwritten correctly and not leaked intoconfigProvidedTestCommitSlxScriptVarsValidation— validation gate returns structured error for badscript_varsTestRunScriptAndWaitScriptVarOverrides— 3 tests verifying merge behavior and precedence🤖 Generated with Claude Code