
[codex] add live provider E2E workflow#175

Draft
zozo123 wants to merge 1 commit into openclaw:main from zozo123:codex/live-provider-e2e-gh-actions

Contributor

@zozo123 zozo123 commented May 27, 2026

Summary

  • add a manual Live Provider E2E GitHub Actions workflow that expands a selected provider list into one smoke job per built-in provider
  • add scripts/live-provider-e2e.sh to run fast provider-specific smoke checks, skip missing secret/tool setup when requested, and clean up leases where supported
  • document the maintainer secret checklist and workflow controls in docs/features/live-provider-e2e.md

Notes

The workflow is workflow_dispatch only, defaults to allow_missing=true, and does not expose provider secrets to pull request jobs. Maintainers can switch allow_missing=false once the selected provider secrets are expected to be present.
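The skip-or-fail choice behind allow_missing can be sketched roughly as below. This is a minimal sketch, not the contents of scripts/live-provider-e2e.sh: the `require_env` helper and the use of exit status 77 (borrowed from the Automake convention for skipped tests) are assumptions made for illustration. Only the `CRABBOX_LIVE_SKIP_MISSING` variable name comes from the verification commands in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from scripts/live-provider-e2e.sh.
# Returns 0 when every named variable is set and non-empty.
# Otherwise prints the missing names to stderr and returns:
#   77 if CRABBOX_LIVE_SKIP_MISSING=1 (caller treats the leg as skipped)
#   1  otherwise (caller fails the leg)
require_env() {
  local name missing=()
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -eq 0 ]; then
    return 0
  fi
  if [ "${CRABBOX_LIVE_SKIP_MISSING:-0}" = "1" ]; then
    echo "skip: missing ${missing[*]}" >&2
    return 77
  fi
  echo "fail: missing ${missing[*]}" >&2
  return 1
}
```

A provider leg could call something like `require_env E2B_API_KEY` before any network call and map status 77 to a neutral "skipped" result. With allow_missing=false the same call would fail the leg instead.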

Some providers still require a runner-side CLI or provider-side resource that cannot be represented as a GitHub secret, such as devbox, sprite, an existing Railway service, or a self-hosted Parallels host. Those requirements are called out in the doc.

Verification

  • bash -n scripts/live-provider-e2e.sh
  • ruby -e 'require "yaml"; YAML.load_file(".github/workflows/live-provider-e2e.yml"); puts "workflow yaml ok"'
  • go build -trimpath -o /tmp/crabbox-provider-e2e ./cmd/crabbox
  • node scripts/check-docs-links.mjs
  • CRABBOX_LIVE=1 CRABBOX_LIVE_SKIP_MISSING=1 CRABBOX_BIN=/tmp/crabbox-provider-e2e scripts/live-provider-e2e.sh aws
  • CRABBOX_LIVE=1 CRABBOX_LIVE_SKIP_MISSING=1 CRABBOX_BIN=/tmp/crabbox-provider-e2e scripts/live-provider-e2e.sh e2b


clawsweeper Bot commented May 27, 2026

Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 7:10 PM ET / 23:10 UTC.

Summary
The PR adds a manual live-provider E2E GitHub Actions matrix, a provider smoke shell script, and docs for configuring maintainer provider secrets.

Reproducibility: not applicable. This is a new maintainer-run CI workflow and script, not a current-main bug report. I reviewed the PR diff, current main, and the paired proposal instead of reproducing a failing behavior.

Review metrics: 2 noteworthy metrics.

  • Diff size: 4 files changed, 735 insertions, 0 deletions. This is a new workflow, script, and docs surface rather than a small adjustment to existing CI.
  • Workflow secret surface: 89 secret mappings in one matrix job. Every provider leg receives the full secret surface before the script narrows behavior by provider.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Scope provider secrets to the selected provider and keep unrelated credentials away from package-install steps.
  • Derive the provider matrix from registered providers instead of maintaining a hard-coded duplicate list.
  • Add redacted terminal output, workflow logs, or a linked artifact proving representative provider and missing-secret paths after the fixes.

Proof guidance:
Needs real behavior proof before merge: The PR body lists verification commands but no redacted terminal output, workflow log, or artifact showing the new workflow/script behavior; the contributor should add proof with private details redacted and update the PR body for a fresh ClawSweeper review.

Risk before merge

  • Job-wide environment injection exposes unrelated provider, broker, cloud, and SSH secrets to every matrix leg, including the Modal and Tensorlake package installation steps.
  • The workflow hard-codes the provider inventory even though current main exposes registered providers through the CLI, so all-provider coverage can drift silently as providers are added or renamed.
  • The PR body lists verification commands but no redacted terminal output, workflow run, or artifact proving the new workflow/script behavior after the change.
  • The paired proposal at Proposal: add live provider E2E GitHub Actions matrix #176 is still open, so maintainers should resolve the proposal and draft implementation together.

Maintainer options:

  1. Narrow the secret and install boundary (recommended)
    Move provider credentials out of job-wide env, inject only the selected provider's credentials after third-party installs, remove unused admin-token exposure, and pin or preinstall provider clients.
  2. Generate the all-provider matrix
    Replace the hard-coded provider list with output from the registered provider inventory and keep alias normalization only for user input.
  3. Keep draft until live proof is posted
    Do not merge until a redacted workflow log, terminal output, or linked artifact shows representative provider legs and missing-secret behavior after the fixes.

Next step before merge
Human review is needed because the draft PR requires contributor proof and maintainer-visible secret-scoping changes before merge.

Security
Needs attention: the workflow currently combines broad secret exposure with unpinned third-party package installation.

Review findings

  • [P1] Scope provider secrets before installing clients — .github/workflows/live-provider-e2e.yml:121-123
  • [P1] Build the matrix from registered providers — .github/workflows/live-provider-e2e.yml:52-76

Review details

Best possible solution:

Land the manual workflow only after provider credentials are scoped to the selected provider, third-party client installs cannot see unrelated secrets, the provider list comes from the registry/CLI, and redacted real run proof is attached.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a new maintainer-run CI workflow and script, not a current-main bug report. I reviewed the PR diff, current main, and the paired proposal instead of reproducing a failing behavior.

Is this the best way to solve the issue?

No: the direction is useful, but this implementation is not the safest path because it exposes all provider secrets to each job and duplicates the provider inventory. A narrower workflow should scope secrets per provider, derive the matrix from current provider registration, and include redacted real run proof.
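One way to picture per-provider secret scoping at the script level is sketched below. It is an illustration under stated assumptions, not the PR's implementation: `provider_env_names` and `run_scoped` are hypothetical helpers, and the provider-to-variable mapping is a guess. The real secret names are defined in the workflow and in docs/features/live-provider-e2e.md and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical provider-to-variable mapping, for illustration only.
provider_env_names() {
  case "$1" in
    aws)   echo "AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY" ;;
    e2b)   echo "E2B_API_KEY" ;;
    modal) echo "MODAL_TOKEN_ID MODAL_TOKEN_SECRET" ;;
    *)     echo "" ;;
  esac
}

# Run a command in a scrubbed environment that carries only PATH, HOME, and
# the selected provider's variables (those that are set and non-empty).
run_scoped() {
  local provider="$1"; shift
  local name args=("PATH=$PATH" "HOME=${HOME:-/tmp}")
  # Word splitting is intentional: the mapping returns space-separated names.
  for name in $(provider_env_names "$provider"); do
    if [ -n "${!name:-}" ]; then
      args+=("$name=${!name}")
    fi
  done
  # env -i starts from an empty environment, so nothing else leaks through.
  env -i "${args[@]}" "$@"
}
```

With helpers like these, a leg could run `run_scoped modal scripts/live-provider-e2e.sh modal`, while third-party client installs run earlier with no provider variables at all. Non-secret controls such as `CRABBOX_LIVE` would need to be added to the pass-through list explicitly. In the workflow itself, the equivalent change is to move secret mappings from job-level env to step-level env on the smoke step only.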

Full review comments:

  • [P1] Scope provider secrets before installing clients — .github/workflows/live-provider-e2e.yml:121-123
    The job exports the full provider secret surface to every matrix leg before installing the Modal or Tensorlake Python clients. Those installs run unpinned third-party code, so a Modal/Tensorlake smoke can read unrelated AWS, GCP, broker, SSH, and other provider credentials. Keep provider credentials out of the job env and inject only the selected provider's secrets after any client installation, or use pinned/preinstalled clients.
    Confidence: 0.89
  • [P1] Build the matrix from registered providers — .github/workflows/live-provider-e2e.yml:52-76
    The workflow hard-codes the same provider inventory that current main already exposes through the registered provider matrix and crabbox providers --json. The next built-in provider will make providers=all silently incomplete unless two lists are kept in sync. Generate the list from the CLI/registry and keep this workflow responsible only for filtering aliases/user input.
    Confidence: 0.82
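The second finding can be sketched as a small filter that takes the registered provider names as input instead of hard-coding them. This is a rough sketch under assumptions: `build_matrix` is a hypothetical helper, and it expects the names to have already been extracted from `crabbox providers --json` into a single space-separated line, because the JSON shape of that command is not shown in this review. Names are assumed to be plain identifiers that need no JSON escaping.

```shell
#!/usr/bin/env bash
# build_matrix SELECTION REGISTERED  (hypothetical helper)
#   SELECTION:  "all", or a comma-separated list typed by the maintainer
#   REGISTERED: space-separated provider names on one line
# Prints a JSON array suitable for a job matrix; returns 1 on an unknown name.
build_matrix() {
  local selection="$1" registered="$2"
  local name picked=() json="" sep=""
  if [ "$selection" = "all" ]; then
    read -r -a picked <<<"$registered"
  else
    # Split on commas, tolerating spaces after them ("aws, e2b").
    IFS=', ' read -r -a picked <<<"$selection"
  fi
  for name in "${picked[@]}"; do
    # Reject anything that is not in the registered inventory.
    case " $registered " in
      *" $name "*) ;;
      *) echo "unknown provider: $name" >&2; return 1 ;;
    esac
    json+="$sep\"$name\""
    sep=","
  done
  printf '[%s]\n' "$json"
}
```

In a workflow, a setup job could write the result to `$GITHUB_OUTPUT` and the matrix job could read it through `fromJSON`, so `providers=all` follows the registry as providers are added or renamed. Alias normalization for user input would sit in front of this filter.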

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0511f91aa357.

Label changes

  • add P2: This is a normal-priority draft CI/admin improvement, but it has merge-blocking security and automation concerns before it can land.
  • add merge-risk: 🚨 security-boundary: Merging as-is would expose unrelated provider and broker secrets to every matrix job and to unpinned third-party install steps.
  • add merge-risk: 🚨 automation: The workflow duplicates the provider registry with a hard-coded list, so future all-provider smoke runs can silently miss built-in providers.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists verification commands but no redacted terminal output, workflow log, or artifact showing the new workflow/script behavior; the contributor should add proof with private details redacted and update the PR body for a fresh ClawSweeper review.

Label justifications:

  • P2: This is a normal-priority draft CI/admin improvement, but it has merge-blocking security and automation concerns before it can land.
  • merge-risk: 🚨 security-boundary: Merging as-is would expose unrelated provider and broker secrets to every matrix job and to unpinned third-party install steps.
  • merge-risk: 🚨 automation: The workflow duplicates the provider registry with a hard-coded list, so future all-provider smoke runs can silently miss built-in providers.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists verification commands but no redacted terminal output, workflow log, or artifact showing the new workflow/script behavior; the contributor should add proof with private details redacted and update the PR body for a fresh ClawSweeper review.

Evidence reviewed

Security concerns:

  • [high] Broad provider secrets exposed to unpinned installs — .github/workflows/live-provider-e2e.yml:121
    Every matrix job receives all provider secrets, then Modal/Tensorlake jobs install unpinned Python packages before running the smoke script; a compromised install path could read unrelated cloud, broker, or SSH credentials.
    Confidence: 0.9

What I checked:

  • PR workflow exposes all provider secrets job-wide: The added workflow maps the full provider, broker, SSH, and cloud secret surface into the provider-e2e job environment for every matrix provider. (.github/workflows/live-provider-e2e.yml:121, 4132f3aeabf3)
  • PR installs unpinned provider clients under that job environment: The Modal and Tensorlake legs install unpinned Python packages in the same job that already has all provider secrets in env. (.github/workflows/live-provider-e2e.yml:241, 4132f3aeabf3)
  • Current main has a provider inventory API: Current main already exposes registered providers through crabbox providers --json, so the new hard-coded workflow list duplicates an existing source of truth. (internal/cli/providers.go:20, 0511f91aa357)
  • Current main does not already have this workflow: A current-main search found no live-provider-e2e workflow, script, or CRABBOX_PROVIDER_E2E surface, so this PR is not obsolete or implemented already. (0511f91aa357)
  • Related proposal remains open: The paired proposal at Proposal: add live provider E2E GitHub Actions matrix #176 is open and points to this draft implementation, so the best path is to review the PR rather than close either half prematurely.
  • Real behavior proof is missing: The PR body lists commands but no redacted workflow log, terminal output, or artifact showing the new workflow/script behavior after the change. (4132f3aeabf3)

Likely related people:

  • steipete: Recent history shows substantial workflow, live-smoke, provider registry, and provider lifecycle work, including live smoke hardening and failed-run UX changes. (role: recent area contributor; confidence: high; commits: 08afcc44724f, bde6c4fcd9a2, 7bd2c5099d9c; files: .github/workflows/ci.yml, scripts/live-smoke.sh, internal/cli/providers.go)
  • zozo123: Beyond this PR, prior merged history shows provider additions and registry wiring for Modal and W&B, which are part of the live-provider matrix surface. (role: provider contributor; confidence: medium; commits: 0067c7194a8d, 8dfa7c348551; files: internal/providers/modal, internal/providers/wandb, internal/providers/all/all.go)
  • stainlu: Commit history shows direct provider live smoke coverage work in the existing live smoke script, which is adjacent to this new workflow/script proposal. (role: adjacent live-smoke contributor; confidence: medium; commits: 93b81ed16da4; files: scripts/live-smoke.sh)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper Bot added labels on May 27, 2026:

  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal priority bug or improvement with limited blast radius.)
  • merge-risk: 🚨 security-boundary (Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.)
  • merge-risk: 🚨 automation (Merging this PR could break CI, automerge, proof capture, label sync, or automation.)


clawsweeper Bot commented May 27, 2026

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

Contributor Author

zozo123 commented May 30, 2026

It would be nice to have CI/CD that smoke-tests the providers or local sandboxes across machines and architectures.
