Add action, network, dependency-install, retry, and cleanup guardrails for QA runs. The harness should cover real local setup behavior while keeping risky work explicit, isolated, logged, and bounded.
This slice should make supervised QA practical for agent-run repo setup checks without pretending the first version is a full network sandbox.
Acceptance criteria
QA runs use fresh isolated app state and fresh isolated repo clones by default.
Passed runs are short-lived, failed runs can be kept up to 7 days, and a hard size cap prunes oldest runs first while preserving useful failure evidence when possible.
Scenario-level action allowlists can allow bounded setup steps such as env save or dependency install while blocking long-running launch commands by default.
GitHub clone is allowed by default only for the target repo named by the scenario.
Dependency install network is allowed only when the scenario allowlist enables the matching setup action.
Before dependency install, the agent has enough package-manager files, install scripts, repo context, and scenario intent to judge whether install is acceptable for supervised QA.
Run report records dependency network use and states when install scripts may have run.
Provider API calls are blocked or faked by default unless scenario policy explicitly allows real provider validation.
Network evidence includes best-effort network-bearing action summary, command/package-manager logs, and known host groups used, without claiming full outbound isolation.
The runner may retry once, and only before any file or repo state changes; after state changes, a crash or bridge death stops the affected session, logs evidence, and marks remaining work blocked.
Report mode can reopen prior evidence by run ID, and clean mode can prune QA runs immediately.
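A minimal Python sketch of how the allowlist, network, retry, and retention criteria above might be expressed as plain decision functions. Every name here (`Scenario`, `Run`, `action_allowed`, `network_allowed`, `may_retry`, `prune`, the action and network-kind strings) is hypothetical, since the issue does not define an API. The 7-day limit for failed runs comes from the criteria; the 1-day lifetime for passed runs and the "prune passed runs before failed runs" ordering are assumptions standing in for "short-lived" and "preserving useful failure evidence when possible".

```python
from dataclasses import dataclass, field

FAILED_MAX_AGE_DAYS = 7  # from the acceptance criteria


@dataclass
class Scenario:
    # Hypothetical scenario policy; field names are illustrative only.
    target_repo: str
    allowed_actions: set = field(default_factory=set)
    allow_real_provider: bool = False


@dataclass
class Run:
    run_id: str
    passed: bool
    age_days: float
    size_mb: int


def action_allowed(scenario, action):
    # Deny by default: only actions named in the scenario allowlist may run,
    # so long-running launch commands stay blocked unless listed.
    return action in scenario.allowed_actions


def network_allowed(scenario, kind, repo=None):
    if kind == "github_clone":
        # Clone is allowed only for the scenario's target repo.
        return repo == scenario.target_repo
    if kind == "dependency_install":
        # Install network follows the matching setup action.
        return "dependency_install" in scenario.allowed_actions
    if kind == "provider_api":
        # Blocked (or faked) unless policy explicitly allows real validation.
        return scenario.allow_real_provider
    return False


def may_retry(retries_used, state_changed):
    # One retry at most, and never after file or repo state has changed.
    return retries_used == 0 and not state_changed


def prune(runs, cap_mb, passed_max_age_days=1.0):
    # Step 1: drop runs past their age limit (passed limit is an assumption).
    kept = [
        r for r in runs
        if r.age_days <= (passed_max_age_days if r.passed else FAILED_MAX_AGE_DAYS)
    ]
    # Step 2: enforce the size cap. Assumed ordering: passed runs go first,
    # oldest first within each group, so failure evidence survives longest.
    total = sum(r.size_mb for r in kept)
    doomed = set()
    for r in sorted(kept, key=lambda r: (not r.passed, -r.age_days)):
        if total <= cap_mb:
            break
        doomed.add(r.run_id)
        total -= r.size_mb
    return [r.run_id for r in kept if r.run_id not in doomed]
```

With a scenario that allowlists only `env_save` and `dependency_install`, `action_allowed(scenario, "launch")` is false, and a crash after state changes leaves `may_retry` false regardless of how many retries remain.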
Blocked by