Merged
36 changes: 24 additions & 12 deletions .claude/agents/env-validator/ENV_VALIDATOR.md
@@ -16,20 +16,22 @@ You are an environment variable consistency validator for the Simple Agent Manag

This project has a critical naming convention for environment variables:

| Context | Prefix | Example | Where Used |
|---------|--------|---------|------------|
| **GitHub Environment** | `GH_` | `GH_CLIENT_ID` | GitHub Settings → Environments → production |
| **Cloudflare Worker** | `GITHUB_` | `GITHUB_CLIENT_ID` | Worker runtime, local `.env` files |
| Context | Prefix | Example | Where Used |
| ---------------------- | --------- | ------------------ | ------------------------------------------- |
| **GitHub Environment** | `GH_` | `GH_CLIENT_ID` | GitHub Settings → Environments → production |
| **Cloudflare Worker** | `GITHUB_` | `GITHUB_CLIENT_ID` | Worker runtime, local `.env` files |

**Why different names?** GitHub Actions reserves `GITHUB_*` environment variables for its own use. Using `GITHUB_CLIENT_ID` as a GitHub secret would conflict. So we use `GH_*` in GitHub, and the deployment script maps them to `GITHUB_*` Worker secrets.
**Why different names?** GitHub Actions secret names cannot start with the `GITHUB_` prefix, so a secret named `GITHUB_CLIENT_ID` cannot be created. We therefore use `GH_*` in GitHub, and the deployment script maps them to `GITHUB_*` Worker secrets.

The mapping is done by `scripts/deploy/configure-secrets.sh`:

```
GH_CLIENT_ID → GITHUB_CLIENT_ID
GH_CLIENT_SECRET → GITHUB_CLIENT_SECRET
GH_APP_ID → GITHUB_APP_ID
GH_APP_PRIVATE_KEY → GITHUB_APP_PRIVATE_KEY
GH_APP_SLUG → GITHUB_APP_SLUG
GH_WEBHOOK_SECRET → GITHUB_WEBHOOK_SECRET
```
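
As a minimal sketch, the mapping listed above can be written as a lookup table. The real mapping is performed by `scripts/deploy/configure-secrets.sh`; `SECRET_NAME_MAP` and `toWorkerSecretName` are hypothetical names used only for illustration.

```typescript
// Illustrative lookup table mirroring the mapping listed above.
// SECRET_NAME_MAP and toWorkerSecretName are hypothetical names, not project APIs.
const SECRET_NAME_MAP: Record<string, string> = {
  GH_CLIENT_ID: "GITHUB_CLIENT_ID",
  GH_CLIENT_SECRET: "GITHUB_CLIENT_SECRET",
  GH_APP_ID: "GITHUB_APP_ID",
  GH_APP_PRIVATE_KEY: "GITHUB_APP_PRIVATE_KEY",
  GH_APP_SLUG: "GITHUB_APP_SLUG",
  GH_WEBHOOK_SECRET: "GITHUB_WEBHOOK_SECRET",
};

// Returns the Worker secret name, or undefined for names outside the mapping.
function toWorkerSecretName(githubSecretName: string): string | undefined {
  return SECRET_NAME_MAP[githubSecretName];
}

// Sanity check: every entry only swaps the GH_ prefix for GITHUB_.
const mappingIsPrefixSwap = Object.keys(SECRET_NAME_MAP).every(
  (from) => from.startsWith("GH_") && SECRET_NAME_MAP[from] === "GITHUB_" + from.slice("GH_".length),
);
```

A check like `mappingIsPrefixSwap` is one way a validator could flag an entry that renames a secret beyond the prefix swap.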

## When Invoked
@@ -44,10 +46,12 @@ GH_APP_SLUG → GITHUB_APP_SLUG
### 1. Env Interface Consistency

**Files to Review**:

- `apps/api/src/index.ts` (Env interface, lines 15-40)
- `scripts/deploy/types.ts` (REQUIRED_SECRETS array)

**Checklist**:

- [ ] All Env interface members are documented in CLAUDE.md
- [ ] REQUIRED_SECRETS array matches configure-secrets.sh secrets
- [ ] Optional vs required correctly marked (optional ends with `?`)
@@ -56,12 +60,14 @@ GH_APP_SLUG → GITHUB_APP_SLUG
### 2. Prefix Convention

**Files to Review**:

- `CLAUDE.md` - Environment Variable Naming section
- `docs/guides/self-hosting.md` - GitHub Environment Configuration
- `.specify/memory/constitution.md` - Development Workflow
- `.env.example` files (if any)

**Checklist**:

- [ ] GitHub Environment tables use `GH_*` prefix
- [ ] Cloudflare Worker tables use `GITHUB_*` prefix
- [ ] Local .env examples use `GITHUB_*` prefix
@@ -71,12 +77,14 @@ GH_APP_SLUG → GITHUB_APP_SLUG
### 3. Cross-Document Consistency

**Files to Review**:

- `CLAUDE.md`
- `docs/guides/self-hosting.md`
- `.specify/memory/constitution.md`
- `docs/architecture/secrets-taxonomy.md`

**Checklist**:

- [ ] All documents list same environment variables
- [ ] Descriptions are consistent across documents
- [ ] Required vs optional status is consistent
@@ -85,10 +93,12 @@ GH_APP_SLUG → GITHUB_APP_SLUG
### 4. Script Validation

**Files to Review**:

- `scripts/deploy/configure-secrets.sh`
- `.github/workflows/deploy.yml`

**Checklist**:

- [ ] All secrets read in workflow are passed to configure-secrets.sh
- [ ] configure-secrets.sh sets all REQUIRED_SECRETS
- [ ] Error messages use correct prefix for context
@@ -124,12 +134,12 @@ grep -n "wrangler secret" scripts/deploy/configure-secrets.sh

### Summary

| Category | Status | Issues |
|----------|--------|--------|
| Env Interface | PASS/FAIL | X |
| Prefix Convention | PASS/FAIL | X |
| Cross-Document | PASS/FAIL | X |
| Scripts | PASS/FAIL | X |
| Category | Status | Issues |
| ----------------- | --------- | ------ |
| Env Interface | PASS/FAIL | X |
| Prefix Convention | PASS/FAIL | X |
| Cross-Document | PASS/FAIL | X |
| Scripts | PASS/FAIL | X |

### Findings

@@ -142,7 +152,9 @@ grep -n "wrangler secret" scripts/deploy/configure-secrets.sh

**Evidence**:
```

Relevant code or documentation snippet

```

**Recommendation**: How to fix it.
@@ -163,7 +175,7 @@ Relevant code or documentation snippet

## Important Notes

- The GH_* vs GITHUB_* convention exists because GitHub reserves GITHUB_* variables
- The `GH_*` vs `GITHUB_*` convention exists because GitHub Actions secret names cannot start with `GITHUB_`
- Always specify which context (GitHub or Worker) when documenting
- HETZNER_TOKEN is NOT a platform secret (users provide their own via UI)
- Bindings (DATABASE, KV, R2) are Cloudflare bindings, not env vars to document for users
23 changes: 13 additions & 10 deletions .claude/rules/07-env-and-urls.md
@@ -4,14 +4,14 @@

GitHub secrets and Cloudflare Worker secrets use DIFFERENT naming conventions. Confusing them causes deployment failures.

| Context | Prefix | Example | Where Used |
|---------|--------|---------|------------|
| **GitHub Environment** | `GH_` | `GH_CLIENT_ID` | GitHub Settings -> Environments -> production |
| **Cloudflare Worker** | `GITHUB_` | `GITHUB_CLIENT_ID` | Worker runtime, local `.env` files |
| Context | Prefix | Example | Where Used |
| ---------------------- | --------- | ------------------ | --------------------------------------------- |
| **GitHub Environment** | `GH_` | `GH_CLIENT_ID` | GitHub Settings -> Environments -> production |
| **Cloudflare Worker** | `GITHUB_` | `GITHUB_CLIENT_ID` | Worker runtime, local `.env` files |

### Why Different Names?

GitHub Actions reserves `GITHUB_*` for its own use. So we use `GH_*` in GitHub, and `configure-secrets.sh` maps them to `GITHUB_*` Worker secrets.
GitHub Actions secret names cannot start with the `GITHUB_` prefix. So we use `GH_*` in GitHub, and `configure-secrets.sh` maps them to `GITHUB_*` Worker secrets.

### The Mapping (done by `configure-secrets.sh`)

@@ -22,6 +22,7 @@ GH_CLIENT_SECRET -> GITHUB_CLIENT_SECRET
GH_APP_ID -> GITHUB_APP_ID
GH_APP_PRIVATE_KEY -> GITHUB_APP_PRIVATE_KEY
GH_APP_SLUG -> GITHUB_APP_SLUG
GH_WEBHOOK_SECRET -> GITHUB_WEBHOOK_SECRET
```

### Documentation Rules
@@ -37,6 +38,7 @@ GH_APP_SLUG -> GITHUB_APP_SLUG
- **User configuring GitHub**: Tell them to use `GH_CLIENT_ID`
- **Code reading from env**: Use `env.GITHUB_CLIENT_ID`
- **Local development**: Use `GITHUB_CLIENT_ID` in `.env`
- **GitHub webhook secret**: Tell them to use `GH_WEBHOOK_SECRET` in GitHub and `GITHUB_WEBHOOK_SECRET` in Worker/local env
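
The examples above could be backed by a small runtime guard, sketched here under assumptions: `WorkerEnv`, `findGitHubStyleKeys`, and `readClientId` are hypothetical names, and the real `Env` interface in the project may differ. The idea is that Worker code only ever reads `GITHUB_*` names, and a stray `GH_*` key in a local env object is treated as a likely mistake.

```typescript
// Hypothetical, trimmed-down view of the Worker environment.
interface WorkerEnv {
  GITHUB_CLIENT_ID?: string;
  GITHUB_WEBHOOK_SECRET?: string;
}

// Lists keys that use the GitHub Environment prefix (GH_) where the
// Worker/local prefix (GITHUB_) is expected.
function findGitHubStyleKeys(env: Record<string, unknown>): string[] {
  return Object.keys(env)
    .filter((key) => key.startsWith("GH_"))
    .sort();
}

// Reads the client ID using the Worker-side name and explains the
// GitHub-side name in the error message to reduce confusion.
function readClientId(env: WorkerEnv): string {
  if (!env.GITHUB_CLIENT_ID) {
    throw new Error(
      "GITHUB_CLIENT_ID is not set (in GitHub Environments this secret is named GH_CLIENT_ID)",
    );
  }
  return env.GITHUB_CLIENT_ID;
}
```

Naming both prefixes in the error message follows the rule of always stating which context a variable name belongs to.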

## Wrangler Environment Sections (Generated at Deploy Time)

@@ -56,6 +58,7 @@ Add the binding to the **top-level section of `wrangler.toml` only**. The sync s
- **Derived bindings** (worker name, routes, tail_consumers): Computed from `DEPLOYMENT_CONFIG` naming conventions.

The CI quality check (`pnpm quality:wrangler-bindings`) verifies:

1. No `[env.*]` sections exist in checked-in `wrangler.toml` files
2. All required binding types are present at the top level
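
The first of these checks can be approximated with a line-based scan. This is only a sketch and not the implementation behind `pnpm quality:wrangler-bindings`: `findEnvSections` is a hypothetical name, and a line scan (rather than a real TOML parser) could misfire on multi-line strings.

```typescript
// Matches TOML table headers such as [env.production],
// [env.production.vars] or [[env.staging.kv_namespaces]].
const ENV_SECTION_HEADER = /^\s*\[\[?\s*env\.[^\]]+\]\]?\s*(#.*)?$/;

// Returns every [env.*] table header found in the given wrangler.toml text.
// Hypothetical helper; it does not parse TOML, it only inspects each line.
function findEnvSections(tomlText: string): string[] {
  return tomlText
    .split(/\r?\n/)
    .filter((line) => ENV_SECTION_HEADER.test(line))
    .map((line) => line.trim());
}

const sampleToml = [
  'name = "api"',
  "[vars]",
  'BASE_DOMAIN = "example.com"',
  "[env.production]",
  'name = "api-prod"',
  "[[env.staging.kv_namespaces]]",
].join("\n");

const offendingSections = findEnvSections(sampleToml);
```

For a checked-in `wrangler.toml` that follows the rule, the returned array would be empty.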

@@ -81,11 +84,11 @@ Local development uses `.dev.vars`.

When constructing URLs using `BASE_DOMAIN`, you MUST use the correct subdomain prefix. The root domain does NOT serve any application.

| Destination | URL Pattern | Example |
|-------------|-------------|---------|
| **Web UI** | `https://app.${BASE_DOMAIN}/...` | `https://app.simple-agent-manager.org/settings` |
| **API** | `https://api.${BASE_DOMAIN}/...` | `https://api.simple-agent-manager.org/health` |
| **Workspace** | `https://ws-${id}.${BASE_DOMAIN}` | `https://ws-abc123.simple-agent-manager.org` |
| Destination | URL Pattern | Example |
| ------------- | --------------------------------- | ----------------------------------------------- |
| **Web UI** | `https://app.${BASE_DOMAIN}/...` | `https://app.simple-agent-manager.org/settings` |
| **API** | `https://api.${BASE_DOMAIN}/...` | `https://api.simple-agent-manager.org/health` |
| **Workspace** | `https://ws-${id}.${BASE_DOMAIN}` | `https://ws-abc123.simple-agent-manager.org` |

**NEVER** use `https://${BASE_DOMAIN}/...` (bare root domain) for redirects or links.
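
One way to make the bare-root mistake harder is to route all URL construction through helpers that always add a subdomain. The following is a sketch under assumptions: `appUrl`, `apiUrl`, and `workspaceUrl` are hypothetical names, not existing project functions.

```typescript
// Joins an origin and a path, ensuring exactly one leading slash on the path.
function joinPath(origin: string, path: string): string {
  if (path === "") return origin;
  return origin + (path.startsWith("/") ? path : "/" + path);
}

// Web UI lives on the app. subdomain.
function appUrl(baseDomain: string, path = ""): string {
  return joinPath(`https://app.${baseDomain}`, path);
}

// API lives on the api. subdomain.
function apiUrl(baseDomain: string, path = ""): string {
  return joinPath(`https://api.${baseDomain}`, path);
}

// Each workspace gets its own ws-<id>. subdomain.
function workspaceUrl(baseDomain: string, workspaceId: string): string {
  return `https://ws-${workspaceId}.${baseDomain}`;
}

const settingsUrl = appUrl("simple-agent-manager.org", "/settings");
const healthUrl = apiUrl("simple-agent-manager.org", "health");
const wsUrl = workspaceUrl("simple-agent-manager.org", "abc123");
```

Because no helper returns a URL on the root domain, a redirect built through them cannot point at `https://${BASE_DOMAIN}/...`.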

68 changes: 45 additions & 23 deletions .claude/skills/env-reference/SKILL.md
@@ -8,29 +8,33 @@ user-invocable: false

## GitHub Environment Secrets (GitHub Settings -> Environments -> production)

Uses `GH_*` prefix because GitHub Actions reserves `GITHUB_*` for its own use.

| Type | Name | Required |
| -------- | -------------------------- | -------- |
| Variable | `BASE_DOMAIN` | Yes |
| Variable | `RESOURCE_PREFIX` | No (default: `sam`) |
| Variable | `PULUMI_STATE_BUCKET` | No (default: `sam-pulumi-state`) |
| Secret | `CF_API_TOKEN` | Yes |
| Secret | `CF_ACCOUNT_ID` | Yes |
| Secret | `CF_ZONE_ID` | Yes |
| Secret | `R2_ACCESS_KEY_ID` | Yes |
| Secret | `R2_SECRET_ACCESS_KEY` | Yes |
| Secret | `PULUMI_CONFIG_PASSPHRASE` | Yes |
| Secret | `GH_CLIENT_ID` | Yes |
| Secret | `GH_CLIENT_SECRET` | Yes |
| Secret | `GH_APP_ID` | Yes |
| Secret | `GH_APP_PRIVATE_KEY` | Yes |
| Secret | `GH_APP_SLUG` | Yes |
| Secret | `ENCRYPTION_KEY` | No (auto-generated) |
| Secret | `JWT_PRIVATE_KEY` | No (auto-generated) |
| Secret | `JWT_PUBLIC_KEY` | No (auto-generated) |

## GH_ to GITHUB_ Mapping (done by `configure-secrets.sh`)
Uses the `GH_*` prefix because GitHub Actions secret names cannot start with the `GITHUB_` prefix.

| Type | Name | Required |
| -------- | -------------------------- | --------------------------------------- |
| Variable | `BASE_DOMAIN` | Yes |
| Variable | `RESOURCE_PREFIX` | No (default: `sam`) |
| Variable | `PULUMI_STATE_BUCKET` | No (default: `sam-pulumi-state`) |
| Secret | `CF_API_TOKEN` | Yes |
| Secret | `CF_ACCOUNT_ID` | Yes |
| Secret | `CF_ZONE_ID` | Yes |
| Secret | `R2_ACCESS_KEY_ID` | Yes |
| Secret | `R2_SECRET_ACCESS_KEY` | Yes |
| Secret | `PULUMI_CONFIG_PASSPHRASE` | Yes |
| Secret | `GH_CLIENT_ID` | Yes |
| Secret | `GH_CLIENT_SECRET` | Yes |
| Secret | `GH_APP_ID` | Yes |
| Secret | `GH_APP_PRIVATE_KEY` | Yes |
| Secret | `GH_APP_SLUG` | Yes |
| Secret | `GH_WEBHOOK_SECRET` | Yes when GitHub App webhooks are active |
| Secret | `ENCRYPTION_KEY` | No (auto-generated) |
| Secret | `JWT_PRIVATE_KEY` | No (auto-generated) |
| Secret | `JWT_PUBLIC_KEY` | No (auto-generated) |
| Secret | `ORIGIN_CA_CERT` | No (auto-generated) |
| Secret | `ORIGIN_CA_KEY` | No (auto-generated) |
| Secret | `TRIAL_CLAIM_TOKEN_SECRET` | No (auto-generated) |

## `GH_` to `GITHUB_` Mapping (done by `configure-secrets.sh`)

```
GitHub Secret -> Cloudflare Worker Secret
@@ -39,28 +43,35 @@ GH_CLIENT_SECRET -> GITHUB_CLIENT_SECRET
GH_APP_ID -> GITHUB_APP_ID
GH_APP_PRIVATE_KEY -> GITHUB_APP_PRIVATE_KEY
GH_APP_SLUG -> GITHUB_APP_SLUG
GH_WEBHOOK_SECRET -> GITHUB_WEBHOOK_SECRET
```

Use `GH_WEBHOOK_SECRET` in GitHub Actions because secret names cannot start with `GITHUB_`. The Worker/runtime secret remains `GITHUB_WEBHOOK_SECRET`, and it must match the GitHub App webhook secret exactly.
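
To illustrate why the value must match exactly: GitHub signs each webhook delivery with HMAC-SHA256 over the raw request body and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex>`. The sketch below uses Node's `node:crypto` for brevity; it is not the project's handler, a Worker would more likely use Web Crypto, and `signPayload` and `isValidSignature` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the value GitHub would send in X-Hub-Signature-256 for this body.
function signPayload(secret: string, rawBody: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compares the received header against the locally computed signature
// using a constant-time comparison.
function isValidSignature(secret: string, rawBody: string, signatureHeader: string): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Simulated delivery: the sender signs with the GitHub App webhook secret.
const deliveryBody = '{"action":"opened"}';
const deliveryHeader = signPayload("app-webhook-secret", deliveryBody);

const acceptedWithSameSecret = isValidSignature("app-webhook-secret", deliveryBody, deliveryHeader);
const acceptedWithOtherSecret = isValidSignature("app-webhook-secret ", deliveryBody, deliveryHeader);
```

In this sketch even a trailing space in the configured secret makes verification fail, which is the kind of mismatch between `GITHUB_WEBHOOK_SECRET` and the GitHub App setting that the rule above guards against.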

## API Worker Runtime Environment Variables

See `apps/api/.env.example` for the full list. Key variables:

### Core

- `WRANGLER_PORT` — Local dev port (default: 8787)
- `BASE_DOMAIN` — Set automatically by sync scripts

### Resource Limits

- `MAX_NODES_PER_USER` — Runtime node cap
- `MAX_AGENT_SESSIONS_PER_WORKSPACE` — Runtime session cap
- `MAX_PROJECTS_PER_USER` — Runtime project cap
- `MAX_TASKS_PER_PROJECT` — Runtime task cap per project
- `MAX_TASK_DEPENDENCIES_PER_TASK` — Runtime dependency-edge cap per task

### Pagination

- `TASK_LIST_DEFAULT_PAGE_SIZE` — Default task/project list page size
- `TASK_LIST_MAX_PAGE_SIZE` — Maximum task/project list page size

### Timeouts

- `TASK_CALLBACK_TIMEOUT_MS` — Timeout budget for delegated-task callback processing
- `TASK_CALLBACK_RETRY_MAX_ATTEMPTS` — Retry budget for delegated-task callback processing
- `NODE_HEARTBEAT_STALE_SECONDS` — Staleness threshold for node health
@@ -71,19 +82,22 @@ See `apps/api/.env.example` for the full list. Key variables:
- `NODE_AGENT_REQUEST_TIMEOUT_MS` — Timeout for Node Agent HTTP requests (default: 30000)

### Audio/Transcription

- `WHISPER_MODEL_ID` — Workers AI model for transcription (default: `@cf/openai/whisper-large-v3-turbo`)
- `MAX_AUDIO_SIZE_BYTES` — Maximum audio upload size (default: 10485760)
- `MAX_AUDIO_DURATION_SECONDS` — Maximum recording duration (default: 60)
- `RATE_LIMIT_TRANSCRIBE` — Rate limit for transcription requests
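
Numeric settings like these typically arrive as strings and need a default when unset. A minimal sketch, assuming string-valued env entries: `readPositiveInt` and `readAudioLimits` are hypothetical helpers, and falling back to the default on invalid input is an assumption rather than documented behavior.

```typescript
// Parses a positive integer from a string env value, falling back to the
// default when the value is unset, empty, or not a positive integer.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Defaults taken from the list above.
const DEFAULT_MAX_AUDIO_SIZE_BYTES = 10485760;
const DEFAULT_MAX_AUDIO_DURATION_SECONDS = 60;

// Resolves the audio limits from a (hypothetical) env object.
function readAudioLimits(env: {
  MAX_AUDIO_SIZE_BYTES?: string;
  MAX_AUDIO_DURATION_SECONDS?: string;
}): { maxAudioSizeBytes: number; maxAudioDurationSeconds: number } {
  return {
    maxAudioSizeBytes: readPositiveInt(env.MAX_AUDIO_SIZE_BYTES, DEFAULT_MAX_AUDIO_SIZE_BYTES),
    maxAudioDurationSeconds: readPositiveInt(
      env.MAX_AUDIO_DURATION_SECONDS,
      DEFAULT_MAX_AUDIO_DURATION_SECONDS,
    ),
  };
}
```

Keeping the defaults in named constants makes it easier to compare them against the documented values during validation.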

### Client Error Reporting

- `RATE_LIMIT_CLIENT_ERRORS` — Rate limit per hour per IP (default: 200)
- `MAX_CLIENT_ERROR_BATCH_SIZE` — Max errors per request (default: 25)
- `MAX_CLIENT_ERROR_BODY_BYTES` — Max request body size (default: 65536)
- `MAX_VM_AGENT_ERROR_BODY_BYTES` — Max VM agent error request body (default: 32768)
- `MAX_VM_AGENT_ERROR_BATCH_SIZE` — Max VM agent errors per request (default: 10)

### Codex OAuth Refresh Proxy (`CodexRefreshLock` DO + `/api/auth/codex-refresh`)

- `CODEX_REFRESH_PROXY_ENABLED` — Kill switch; set to `'false'` to disable the proxy entirely (default: enabled)
- `CODEX_REFRESH_UPSTREAM_URL` — OpenAI OAuth token endpoint (default: `https://auth.openai.com/oauth/token`)
- `CODEX_REFRESH_UPSTREAM_TIMEOUT_MS` — Timeout for upstream fetch (default: 10000)
@@ -94,6 +108,7 @@ See `apps/api/.env.example` for the full list. Key variables:
- `RATE_LIMIT_CODEX_REFRESH_WINDOW_SECONDS` — Rate-limit window length in seconds (default: 3600)
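
The kill switch `CODEX_REFRESH_PROXY_ENABLED` is described as enabled by default and disabled only by the string `'false'`. A sketch of that reading, with `isCodexRefreshProxyEnabled` as a hypothetical name and exact, case-sensitive matching as an assumption:

```typescript
// Enabled unless the variable is set to the exact string "false".
// Whether the real implementation also accepts "FALSE" or "0" is not
// stated in the source; this sketch does not.
function isCodexRefreshProxyEnabled(raw: string | undefined): boolean {
  return raw !== "false";
}
```

Under this reading an unset variable leaves the proxy on, so disabling it requires an explicit `'false'`.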

### Credential Routes Rate Limits

- `RATE_LIMIT_CREDENTIAL_UPDATE` — Applied to both user-scoped (`PUT /api/credentials/agent`) and project-scoped (`PUT /api/projects/:id/credentials`) credential write endpoints (MEDIUM #7 fix)

### Trial Onboarding (`/try` flow)
@@ -114,28 +129,33 @@ See `docs/guides/trial-configuration.md` for the full table with meanings and de
## VM Agent Environment Variables

### Container/User

- `CONTAINER_USER` — Optional `docker exec -u` override; when unset, auto-detects effective devcontainer user

### Git Operations

- `GIT_EXEC_TIMEOUT` — Timeout for git commands via docker exec (default: 30s)
- `GIT_WORKTREE_TIMEOUT` — Timeout for git worktree create/remove (default: 30s)
- `WORKTREE_CACHE_TTL` — Cache duration for parsed `git worktree list` results (default: 5s)
- `MAX_WORKTREES_PER_WORKSPACE` — Max worktrees allowed per workspace (default: 5)
- `GIT_FILE_MAX_SIZE` — Max file size for git/file endpoint (default: 1048576)

### File Operations

- `FILE_LIST_TIMEOUT` — Timeout for file listing commands (default: 10s)
- `FILE_LIST_MAX_ENTRIES` — Max entries per directory listing (default: 1000)
- `FILE_FIND_TIMEOUT` — Timeout for recursive file index (default: 15s)
- `FILE_FIND_MAX_ENTRIES` — Max entries returned by file index (default: 5000)

### Error Reporting

- `ERROR_REPORT_FLUSH_INTERVAL` — Background error flush interval (default: 30s)
- `ERROR_REPORT_MAX_BATCH_SIZE` — Immediate flush threshold (default: 10)
- `ERROR_REPORT_MAX_QUEUE_SIZE` — Max queued error entries (default: 100)
- `ERROR_REPORT_HTTP_TIMEOUT` — HTTP POST timeout for error reports (default: 10s)

### ACP (Agent Communication Protocol)

- `ACP_MESSAGE_BUFFER_SIZE` — Max buffered messages per SessionHost for late-join replay (default: 5000)
- `ACP_VIEWER_SEND_BUFFER` — Per-viewer send channel buffer size (default: 256)
- `ACP_PING_INTERVAL` — WebSocket ping interval for stale connection detection (default: 30s)
@@ -147,10 +167,12 @@ See `docs/guides/trial-configuration.md` for the full table with meanings and de
- `ACP_NOTIF_SERIALIZE_TIMEOUT` — Max wait for previous session/update processing before delivering next (default: 5s)

### Events

- `MAX_NODE_EVENTS` — Max node-level events retained in memory (default: 500)
- `MAX_WORKSPACE_EVENTS` — Max workspace-level events retained in memory (default: 500)

### System Info

- `SYSINFO_DOCKER_TIMEOUT` — Timeout for Docker CLI commands during system info collection (default: 10s)
- `SYSINFO_VERSION_TIMEOUT` — Timeout for version-check commands (default: 5s)
- `SYSINFO_CACHE_TTL` — Cache duration for system info results (default: 5s)