[codex] add live provider E2E workflow #175
Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 7:10 PM ET / 23:10 UTC.

Summary
Reproducibility: not applicable. This is a new maintainer-run CI workflow and script, not a current-main bug report. I reviewed the PR diff, current main, and the paired proposal instead of reproducing a failing behavior.
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Land the manual workflow only after provider credentials are scoped to the selected provider, third-party client installs cannot see unrelated secrets, the provider list comes from the registry/CLI, and redacted real run proof is attached.

Do we have a high-confidence way to reproduce the issue? Not applicable: this is a new maintainer-run CI workflow and script, not a current-main bug report. I reviewed the PR diff, current main, and the paired proposal instead of reproducing a failing behavior.

Is this the best way to solve the issue? No: the direction is useful, but this implementation is not the safest path because it exposes all provider secrets to each job and duplicates the provider inventory. A narrower workflow should scope secrets per provider, derive the matrix from current provider registration, and include redacted real run proof.
Overall correctness: patch is incorrect
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 0511f91aa357.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.
Would be nice to have CI/CD to smoke-test the providers or local sandboxes across machines and architectures.
Summary
- Live Provider E2E GitHub Actions workflow that expands a selected provider list into one smoke job per built-in provider
- scripts/live-provider-e2e.sh to run fast provider-specific smoke checks, skip missing secret/tool setup when requested, and clean up leases where supported
- docs/features/live-provider-e2e.md

Notes
The workflow is workflow_dispatch only, defaults to allow_missing=true, and does not expose provider secrets to pull request jobs. Maintainers can switch to allow_missing=false once the selected provider secrets are expected to be present.

Some providers still require a runner-side CLI or provider-side resource that cannot be represented as a GitHub secret, such as devbox, sprite, an existing Railway service, or a self-hosted Parallels host. Those requirements are called out in the doc.

Verification
- bash -n scripts/live-provider-e2e.sh
- ruby -e 'require "yaml"; YAML.load_file(".github/workflows/live-provider-e2e.yml"); puts "workflow yaml ok"'
- go build -trimpath -o /tmp/crabbox-provider-e2e ./cmd/crabbox
- node scripts/check-docs-links.mjs
- CRABBOX_LIVE=1 CRABBOX_LIVE_SKIP_MISSING=1 CRABBOX_BIN=/tmp/crabbox-provider-e2e scripts/live-provider-e2e.sh aws
- CRABBOX_LIVE=1 CRABBOX_LIVE_SKIP_MISSING=1 CRABBOX_BIN=/tmp/crabbox-provider-e2e scripts/live-provider-e2e.sh e2b
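
For readers who want a feel for the skip-on-missing behavior mentioned in the Notes, here is a minimal sketch of how such a check could be written in bash. It is not taken from scripts/live-provider-e2e.sh: the function name check_required_env, the variable EXAMPLE_PROVIDER_TOKEN, and the return codes are assumptions made for illustration. Only CRABBOX_LIVE_SKIP_MISSING comes from the commands above, and treating the value 1 as "skip instead of fail" is inferred from its name.

```shell
#!/usr/bin/env bash
# Sketch only. check_required_env and EXAMPLE_PROVIDER_TOKEN are hypothetical
# names; the real script may structure this differently.

# Usage: check_required_env <provider> <ENV_NAME>...
# Returns 0 when every named variable is set and non-empty.
# Returns 2 (skip) when something is missing and CRABBOX_LIVE_SKIP_MISSING=1.
# Returns 1 (fail) when something is missing and skipping is not requested.
check_required_env() {
  local provider="$1"
  shift
  local name
  local missing=()
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -eq 0 ]; then
    return 0
  fi
  if [ "${CRABBOX_LIVE_SKIP_MISSING:-0}" = "1" ]; then
    echo "skip ${provider}: missing ${missing[*]}" >&2
    return 2
  fi
  echo "fail ${provider}: missing ${missing[*]}" >&2
  return 1
}
```

A caller could map return code 2 to a successful but skipped job and return code 1 to a failed job, which roughly mirrors the allow_missing=true and allow_missing=false modes described above.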