Skip to content

RW-834 feat(script-vars): runtime-overridable script variables for commit_slx and run_script_and_wait#9

Merged
theyashl merged 14 commits into
mainfrom
feature/script-vars
May 12, 2026
Merged

RW-834 feat(script-vars): runtime-overridable script variables for commit_slx and run_script_and_wait#9
theyashl merged 14 commits into
mainfrom
feature/script-vars

Conversation

@theyashl
Copy link
Copy Markdown
Contributor

Summary

  • Adds script_vars param to commit_slx — writes scriptVarsProvided to runbook.yaml with full validation (name, description, default, validation schema)
  • Adds script_var_overrides param to run_script_and_wait — merged into envVars at test time, overrides win on collision
  • Adds _validate_script_vars helper enforcing required fields on each script var dict before any PAPI call
  • Documents the script vars classification rules and usage patterns in the build-runwhen-task skill

Test Plan

  • 226 unit tests pass (pytest tests/ --ignore=tests/test_papi_live_smoke.py --ignore=tests/test_mcp_http_remote_smoke.py)
  • TestValidateScriptVars — 13 tests covering all valid/invalid field combinations
  • TestBuildRunbookYaml — 5 new tests verifying scriptVarsProvided written correctly and not leaked into configProvided
  • TestCommitSlxScriptVarsValidation — validation gate returns structured error for bad script_vars
  • TestRunScriptAndWaitScriptVarOverrides — 3 tests verifying merge behavior and precedence

🤖 Generated with Claude Code

theyashl and others added 7 commits April 28, 2026 00:34
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ld checks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n gate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…erged into envVars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@stewartshea stewartshea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few review comments. The first item is the only blocker for CI; the rest are correctness/cleanup suggestions.

0. CI is failing — tests/test_validation.py lint fix

Both lint-and-test (3.10) and lint-and-test (3.12) jobs fail with:

I001 [*] Import block is un-sorted or un-formatted
   --> tests/test_validation.py:367:9
    |
365 |       @mock.patch("runwhen_platform_mcp.server._resolve_workspace", new_callable=mock.AsyncMock)
366 |       def test_invalid_script_var_returns_error(self, mock_resolve) -> None:
367 | /         import json
368 | |         from runwhen_platform_mcp.server import commit_slx  # noqa: PLC0415

Fix: move import json and from runwhen_platform_mcp.server import commit_slx to the file's top-level import block (alongside the existing from runwhen_platform_mcp.server import (...) in tests/test_validation.py), then run:

ruff check --fix . && ruff format .

The lazy-import pattern with # noqa: PLC0415 reads as a workaround for an import cycle that doesn't actually exist here — commit_slx is already imported the normal way elsewhere in the suite. Hoisting it removes the noqa and silences I001 in one move.

1. SLI handling is silently broken

commit_slx accepts script_vars regardless of task_type, runs _validate_script_vars, and then quietly drops the value when task_type="sli" (only _build_runbook_yaml receives them; _build_sli_yaml and _build_cron_sli_yaml do not). The SKILL.md says "NEVER pass script_vars when task_type='sli'" but the code accepts and ignores. Agents will pass them, get a successful commit response, and never realise their schema was dropped.

Reject early in commit_slx:

if script_vars and task_type != "task":
    return _json_response({
        "error": "script_vars are only valid for task_type='task'. "
                 "SLIs are automated probes with fixed thresholds and have no "
                 "per-run override concept.",
    })

Cron-scheduler SLI mode (where task_type="task" but a cron_schedule is also given) is a related case — _build_cron_sli_yaml won't write script vars while _build_runbook_yaml will. That asymmetry is defensible (the runbook has them, the cron-driven SLI doesn't override anything per run) but worth a one-line code comment so future maintainers don't read it as a bug.

2. Missing collision checks

Three name-collision cases can produce silently broken YAML today:

a. env_varsscript_vars. If LOG_QUERY is in both, the runbook gets LOG_QUERY in both configProvided AND scriptVarsProvided. This will happen frequently because _extract_env_vars auto-discovers os.environ.get("LOG_QUERY") from the script source — agents will then declare the same name in script_vars and end up with the dup. Reject:

if script_vars and env_vars:
    overlap = {sv["name"] for sv in script_vars} & set(env_vars)
    if overlap:
        return _json_response({
            "error": f"Names appear in both env_vars and script_vars: {sorted(overlap)}. "
                     "A name must be one or the other.",
        })

b. script_varssecret_vars. Lower frequency but worth checking by symmetry — secrets and script vars share the env-var namespace on the runner.

c. Duplicates within script_vars itself. _validate_script_vars doesn't reject [{"name": "FOO", ...}, {"name": "FOO", ...}]. Add a seen-set check inside the existing loop:

seen: set[str] = set()
for i, var in enumerate(script_vars):
    name = var.get("name")
    if name and name in seen:
        errors.append(f"script_vars[{i}]: duplicate name {name!r}")
    if name:
        seen.add(name)

3. docs/plans/ accumulation

This PR adds 818 lines of plan/design docs (2026-04-28-script-vars-design.md 110, 2026-04-28-script-vars-plan.md 708). Once merged these become stale "here's what we did" artefacts that searches will surface as if they're current spec. Two options:

  • Trim before merge: keep the short design doc (it captures intent), drop the 708-line implementation plan, or condense both into a single one-page post-implementation summary.
  • Follow-up cleanup: open a tracking issue to move dated docs/plans/*-plan.md files under docs/plans/archive/ after merge, and apply the same convention here.

Either's fine, but a 700-line implementation plan landing in the main tree is the kind of thing that compounds. Worth setting the convention now while the repo is small. (The 468-platform UI PR has the same anti-pattern with 4 plan docs — establishing it here gives that team a pattern to follow.)

4. Naming nit — flagging it for further internal discussion

script_vars vs env_vars reads as two flavours of the same thing to an LLM agent: both end up as env vars on the runner, both are lists of name/value-ish pairs, only the s differentiates them at the tool boundary. The classification ladder in SKILL.md helps, but the param name itself doesn't carry the signal. Worth revisiting (task_parameters, task_inputs, runtime_inputs) at some point — but only in coordination with the UI side; diverging unilaterally from the platform's scriptVarsProvided CRD field and the UI's scriptVars types creates more confusion than it removes.

For this PR: at minimum, beef up the Field(description=...) on script_vars to explicitly contrast against env_vars and secret_vars, so the decision rule lives at the tool boundary, not only in the skill doc:

Per-run task parameters that the END USER fills in when invoking the committed task (log queries, time windows, filters). Distinct from env_vars (set once by the task author — cluster, namespace, context) and secret_vars (credentials). Task-only — never valid for SLIs.

…review comments

- Rename all occurrences: script_vars → run_time_vars, script_var_overrides →
  run_time_var_overrides, scriptVarsProvided → runTimeVarsProvided (capital T)
- Fix CI blocker: hoist inline imports (json, commit_slx, run_script_and_wait,
  asyncio) to top-level import block in test_validation.py, removing noqa workaround
- Reject run_time_vars early when task_type != 'task' (SLI guard)
- Add collision checks: env_vars ∩ run_time_vars and secret_vars ∩ run_time_vars
- Add duplicate-name check inside _validate_run_time_vars (seen-set)
- Beef up Field(description=...) for run_time_vars to contrast env_vars/secret_vars
- Archive 708-line implementation plan to docs/plans/archive/
- Add 5 new tests: SLI rejection, env_vars collision, secret_vars collision,
  duplicate name flagged, duplicate name ok (231 tests total, all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@theyashl
Copy link
Copy Markdown
Contributor Author

All five items addressed in the latest commit (150fed9).

#0 — CI fix: Hoisted import json, import asyncio, commit_slx, and run_script_and_wait to the top-level import block. Removed the # noqa: PLC0415 workaround entirely. ruff check is clean.

#1 — SLI guard: commit_slx now rejects run_time_vars immediately if task_type != "task" before any other work:

if run_time_vars and task_type != "task":
    return _json_response({"error": "run_time_vars are only valid for task_type='task'. ..."})

#2 — Collision checks:

  • a env_vars ∩ run_time_vars overlap → error listing the conflicting names
  • b secret_vars ∩ run_time_vars overlap → same pattern
  • c Duplicate names within run_time_vars itself → detected via seen: set[str] inside _validate_run_time_vars

All three backed by new tests (231 total, all pass).

#3 — Docs: 708-line plan moved to docs/plans/archive/. Design doc stays.

#4 — Naming: Renamed to run_time_vars / run_time_var_overrides (snake_case) and runTimeVarsProvided (camelCase) throughout — coordinated with the agentfarm side (PR #191). Field(description=...) now explicitly contrasts env_vars (infra targets, set once by author), secret_vars (credentials), and run_time_vars (per-run user inputs).

@theyashl theyashl changed the title feat(script-vars): runtime-overridable script variables for commit_slx and run_script_and_wait RW-834 feat(script-vars): runtime-overridable script variables for commit_slx and run_script_and_wait Apr 30, 2026
theyashl and others added 3 commits April 30, 2026 21:13
Replace the 4-step runbook/runs flow with the runsessions endpoint:
- POST /api/v3/workspaces/{ws}/runsessions with source=direct and run_requests
  payload; run requests are triggered automatically by the endpoint
- Poll GET /api/v3/workspaces/{ws}/runsessions/{id} until all run requests have
  response_time set
- Results (issues, passed/failed/skipped titles) are embedded in run_requests[]
  in the session response — no separate /output call needed

Add optional run_time_var_overrides param (dict[str, str]) passed through to
the run request payload; staff access gate is enforced by papi.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The papi API expects camelCase in JSON: runRequests, slxName, taskTitles,
runTimeVarOverrides. Also convert the taskTitles string param to a list
as the API expects (split on '||', or ["*"] for the wildcard default).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Align with agentfarm convention (single word). All identifiers, YAML
keys, and prose updated:
  run_time_vars → runtime_vars
  run_time_var_overrides → runtime_var_overrides
  _validate_run_time_vars → _validate_runtime_vars
  runTimeVarsProvided → runtimeVarsProvided
  runTimeVarOverrides → runtimeVarOverrides

231 tests pass, ruff clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@Rohit-Ekbote Rohit-Ekbote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline notes on items 1, 3, 4, and 5 from the prior review (naming, plan docs, run_slx override, validation order).

)


def _validate_runtime_vars(runtime_vars: list[dict] | None) -> list[str]:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming inconsistency (item 1). The PR title, body, and the design doc committed in this PR (docs/plans/2026-04-28-script-vars-design.md) all refer to this concept as script_vars / _validate_script_vars / scriptVarsProvided. The code uses runtime_vars / runtimeVarsProvided instead. Pick one term and align everywhere — PR description, design doc, code, tests, the skill section header (Run-Time Variables in skills/build-runwhen-task/SKILL.md), and the lingering test docstring at tests/test_yaml_generation.py that still says "Script vars must NOT appear in configProvided". Otherwise readers (and LLMs that consume this skill alongside the design doc) get conflicting names for the same concept.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Both plan docs have been deleted (commit 50db9b7). The stale "Script vars" test docstring and "Run-Time Variables" skill section header have been updated to "runtime" to match the code.

**Step 2: Run tests to verify they fail**

```bash
cd /Users/prats/Documents/work/runwhen-platform-mcp
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plan docs (item 3). Three issues with this file:

  1. Hard-coded personal path /Users/prats/Documents/work/runwhen-platform-mcp won't work for anyone else who reads or follows the plan.
  2. Terminology is already stale — this plan uses script_vars / _validate_script_vars / scriptVarsProvided throughout, but the implementation that ships in the same PR uses runtime_vars. A reader following the plan will be confused about what the code actually does.
  3. The plan addresses Claude directly (e.g. > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans...) — that's an authoring artifact, not project documentation.

Combined with docs/plans/2026-04-28-script-vars-design.md (110 lines, also using the old script_vars name), this PR adds 818 lines of internal planning material that's already outdated on the day of merge. Worth a team conversation about whether docs/plans/ is the right home for these — at minimum scrub the personal path and align terminology before merge.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted both docs (commit 50db9b7). Not worth maintaining — the code and tests are the source of truth.

Comment thread runwhen_platform_mcp/server.py Outdated
default=None,
description=(
"Per-run override values for runtime variables (name → value). "
"Passed through to the runner at execution time. Requires staff access."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_slx runtime_var_overrides (item 4). Three concerns with adding this parameter here:

  1. Not in the design doc or PR description. The design only covers commit_slx and run_script_and_wait. Either add run_slx to the spec or split it into a follow-up PR.
  2. No test coverage. tests/test_validation.py::TestRunScriptAndWaitRunTimeVarOverrides covers run_script_and_wait but there's nothing exercising the run_slx path — including the if runtime_var_overrides: branch at line 3058 that injects runtimeVarOverrides into the run request.
  3. "Requires staff access" is unenforced. The Field description here makes a privilege claim that isn't validated anywhere in the function body. Either enforce it (and add a test) or drop the claim from the docstring so it doesn't mislead callers.

Suggestion: remove this parameter from this PR until the run_slx changes are fleshed out, or add validation + tests + a clear note in the design doc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving runtime_var_overrides in run_slx for now. The staff-access enforcement will be handled at the papi layer — we'll revisit user-level access validation on the MCP side separately with Shea.

}
)

rtv_errors = _validate_runtime_vars(runtime_vars)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail-fast on validation (item 5). This validation runs here at line 3256, but _resolve_workspace(workspace_name) runs later at line ~3295 — so a structurally invalid runtime_vars payload still triggers a network round-trip to PAPI just to resolve the workspace before bailing out. Move _validate_runtime_vars (and the task_type != "task" check at line 3245, and the env/secret overlap checks below) up so bad input fails immediately without hitting the API.

Side note: tests/test_validation.py::TestCommitSlxRunTimeVarsValidation::test_invalid_runtime_var_returns_error mocks _resolve_workspace, but with the current ordering the code already returns from the validation gate before reaching line 3295 — that mock is effectively unused. Whichever order you settle on, audit those mocks to match.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit d54c305. The sli_script/cron_schedule conflict check and access/data tag validation have been moved before _resolve_workspace so invalid input fails immediately without a network call. Also removed the now-unnecessary _resolve_workspace mocks from the four tests that exit at the validation gate.

theyashl and others added 2 commits May 5, 2026 16:57
- Delete docs/plans/ (design + archive) — outdated on day of merge
- Fix stale "Script vars" docstring in test_yaml_generation.py
- Fix stale "Run-Time Variables" header in SKILL.md

Addresses reviewer comments on docs/plans/archive and naming inconsistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move sli_script/cron_schedule conflict check and access/data tag
validation above the _resolve_workspace network call so bad input
returns immediately without hitting the API.

Remove unused _resolve_workspace mocks from four tests that exit
during input validation and never reach the workspace resolution step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@stewartshea stewartshea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review of latest changes (d54c305)

All first-round feedback addressed cleanly:

  • ✅ SLI guard — runtime_vars rejected when task_type != "task"
  • ✅ Collision checks — env_vars / secret_vars / duplicate-name within runtime_vars
  • ✅ CI lint fix — imports hoisted in tests/test_validation.py
  • Field(description=...) beefed up with the env_vars / secret_vars contrast
  • ✅ Plan docs removed
  • ✅ Validation re-ordered for fail-fast in commit_slx
  • ✅ 231 tests passing, all CI green

The bigger change in this round — run_slx switching from the 4-step runbook/runs flow to the consolidated /runsessions endpoint — is a clean simplification, and the wire format (runtimeVarsProvided, runtimeVarOverrides) correctly matches project-468/main after this week's rename cycle there.

Minor non-blocking items in run_slx (optional follow-ups)

  1. task_titles containing ||task_titles.split("||") will mis-split a legitimate title that contains the delimiter. Document the constraint or accept list[str] directly.

  2. full_slx_name heuristicslx_name if "--" in slx_name else f"{ws}--{slx_name}" will treat a short SLX name that legitimately contains -- as already-prefixed. Worth a one-line note in the docstring.

  3. response_time pollingall(rr.get("response_time") for rr in run_requests) matches PAPI's own internal completion signal (papi/tasks/run_session_tasks.py:244, 893), so behaviour is consistent. Pre-runner failures (validation errors, runner unreachable) may or may not set response_time — at worst this is parity with PAPI's own polling, not a new bug.

  4. Result field accessorsfirst_rr.get("passed_titles", "") matches the FastAPI Pydantic schema (snake_case) at backend-services-v2/papi/schemas/run_request.py:187. The dropped camelCase fallback from the old code is fine for the FastAPI-served /runsessions endpoint.

LGTM — ready to merge from my side.

@stewartshea
Copy link
Copy Markdown
Contributor

This is only to be merged after the runtimeVarsProvided has made it to prod

Copy link
Copy Markdown

@Rohit-Ekbote Rohit-Ekbote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up on the deferred item 4.

Comment thread runwhen_platform_mcp/server.py Outdated
default=None,
description=(
"Per-run override values for runtime variables (name → value). "
"Passed through to the runner at execution time. Requires staff access."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're deferring staff-access enforcement to PAPI and not validating it at the MCP layer, this "Requires staff access." line is now misleading — it implies the MCP tool checks the caller's privileges, which it doesn't.

Suggestion: rephrase to make the boundary explicit, e.g.:

Enforced server-side by PAPI — the MCP layer does not validate the caller's privileges. Calls from non-staff users will be rejected by PAPI with a 403.

…or just drop the sentence entirely if PAPI's error surface is good enough on its own.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped the sentence entirely (commit 4cdadf1). PAPI's 403 is sufficient.

…rrides

Access is enforced server-side by PAPI; the MCP layer does no privilege
check. The sentence implied otherwise.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@theyashl
Copy link
Copy Markdown
Contributor Author

merging this as the runtime vars are on beta now

cc: @Rohit-Ekbote @stewartshea

Copy link
Copy Markdown

@Rohit-Ekbote Rohit-Ekbote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — all four inline items from the prior review rounds have been addressed cleanly:

  • Naming consistency fixed in code, tests, and skill (PR title still reads feat(script-vars) — cosmetic, worth fixing before squash-merge)
  • Plan docs (818 lines) removed from docs/plans/
  • Misleading "Requires staff access" claim dropped from run_slx docstring
  • commit_slx validation reordered to fail-fast before _resolve_workspace, with unused mocks cleaned up

Two items left as known follow-ups, not blocking:

  • run_slx runtime_var_overrides has no test coverage — author confirmed this will be revisited with Shea alongside staff-access enforcement.
  • The run_slx API surface rewrite (RunRequest → RunSession, removed output field, polling loop change) was bundled into this PR without being in the design doc. If that's intentional, ✅. If it wasn't meant to ship here, worth a separate discussion — but it's out of scope for this approval.

@theyashl theyashl merged commit ad84d35 into main May 12, 2026
5 checks passed
@theyashl theyashl deleted the feature/script-vars branch May 12, 2026 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants