From f279a050c75eff1f1dd58e2a905957a80b2d8787 Mon Sep 17 00:00:00 2001
From: Copilot <223556219+Copilot@users.noreply.github.com>
Date: Thu, 11 Jun 2026 18:20:01 +0300
Subject: [PATCH 1/2] docs: add feature pages for remaining 8 undocumented v0.10 features
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the rest of #1272 — covers the 8 user-facing features that had no dedicated docs page after PR #1274 (which covered preset, cross-squad discover, and coordinator-as-agent export).

New pages (8):
  docs/features/tiered-memory.md (hot/cold/wiki memory model)
  docs/features/reflect.md (in-session learning skill)
  docs/features/error-recovery.md (failure recovery skill)
  docs/features/teams-comms.md (Microsoft Teams adapter)
  docs/features/fleet-dispatch.md (/fleet hybrid dispatch)
  docs/features/mcp-frontmatter.md (--mcp-frontmatter flag)
  docs/features/dual-mode-deployment.md (SQUAD_POD_ID, dual-mode capabilities)
  docs/features/skill-security-scanner.md (markdown-aware skill scanner)

Updated (1):
  docs/features/export-import.md (added --repo / --branch sections)

Source verification:
- tiered-memory.md ← packages/squad-cli/templates/skills/tiered-memory/SKILL.md
- reflect.md ← packages/squad-cli/templates/skills/reflect/SKILL.md
- error-recovery.md ← packages/squad-cli/templates/skills/error-recovery/SKILL.md
- teams-comms.md ← packages/squad-sdk/src/platform/comms-teams.ts + changeset
- fleet-dispatch.md ← packages/squad-cli/src/cli/commands/watch/capabilities/fleet-dispatch.ts
- mcp-frontmatter.md ← packages/squad-cli/src/cli-entry.ts:346 flag + init flow
- dual-mode-deployment.md ← packages/squad-sdk/src/ralph/capabilities.ts
- skill-security-scanner.md ← scripts/security-review.mjs + changeset
- export-import.md edits ← packages/squad-cli/src/cli/commands/import.ts:817

Style matches existing feature pages (loop.md, plugins.md, preset.md format).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../docs/features/dual-mode-deployment.md | 132 ++++++++++++++++++
 .../content/docs/features/error-recovery.md | 95 +++++++++++++
 .../content/docs/features/export-import.md | 41 ++++++
 .../content/docs/features/fleet-dispatch.md | 132 ++++++++++++++++++
 .../content/docs/features/mcp-frontmatter.md | 102 ++++++++++++++
 docs/src/content/docs/features/reflect.md | 91 ++++++++++++
 .../docs/features/skill-security-scanner.md | 118 ++++++++++++++++
 docs/src/content/docs/features/teams-comms.md | 127 +++++++++++++++++
 .../content/docs/features/tiered-memory.md | 98 +++++++++++++
 9 files changed, 936 insertions(+)
 create mode 100644 docs/src/content/docs/features/dual-mode-deployment.md
 create mode 100644 docs/src/content/docs/features/error-recovery.md
 create mode 100644 docs/src/content/docs/features/fleet-dispatch.md
 create mode 100644 docs/src/content/docs/features/mcp-frontmatter.md
 create mode 100644 docs/src/content/docs/features/reflect.md
 create mode 100644 docs/src/content/docs/features/skill-security-scanner.md
 create mode 100644 docs/src/content/docs/features/teams-comms.md
 create mode 100644 docs/src/content/docs/features/tiered-memory.md

diff --git a/docs/src/content/docs/features/dual-mode-deployment.md b/docs/src/content/docs/features/dual-mode-deployment.md
new file mode 100644
index 000000000..ca31f2752
--- /dev/null
+++ b/docs/src/content/docs/features/dual-mode-deployment.md
@@ -0,0 +1,132 @@
+---
+title: Dual-Mode Deployment — Pod-Aware Capabilities
+description: Run Squad in either agent-per-node or squad-per-pod deployment modes with pod-specific machine capability manifests, controlled by SQUAD_POD_ID and SQUAD_DEPLOYMENT_MODE env vars.
+---
+
+# Dual-Mode Deployment — Pod-Aware Capabilities
+
+> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases.
+
+Dual-mode deployment extends [Capability Routing](/squad/docs/features/capability-routing/) to support both classic single-machine setups and modern containerized/Kubernetes deployments where multiple Squad pods may share an organization's workload — each with potentially different machine capabilities.
+
+It introduces two environment variables and a pod-specific manifest lookup pattern so the same Squad config can run identically in either deployment shape.
+
+---
+
+## The two deployment modes
+
+| Mode | What it means | Capability manifest |
+|------|---------------|---------------------|
+| **`agent-per-node`** (default) | One Squad instance per machine; the machine's capabilities are the squad's capabilities | `.squad/machine-capabilities.json` (shared) |
+| **`squad-per-pod`** | Multiple Squad pods may run on different machines/containers, each with potentially different capabilities | `.squad/machine-capabilities-{podId}.json` (pod-specific) with fallback chain |
+
+Choose the mode via the `SQUAD_DEPLOYMENT_MODE` environment variable:
+
+```bash
+# Classic single-machine setup (default)
+export SQUAD_DEPLOYMENT_MODE=agent-per-node
+
+# Kubernetes / multi-pod setup
+export SQUAD_DEPLOYMENT_MODE=squad-per-pod
+export SQUAD_POD_ID=worker-1
+```
+
+If neither is set, the SDK defaults to `agent-per-node` for backward compatibility.
+
+---
+
+## Environment variables
+
+### `SQUAD_DEPLOYMENT_MODE`
+
+| Value | Behavior |
+|-------|----------|
+| `agent-per-node` | Single shared `machine-capabilities.json` |
+| `squad-per-pod` | Pod-specific manifests with fallback chain |
+| (unset) | Same as `agent-per-node` |
+
+### `SQUAD_POD_ID`
+
+Pod identifier used to construct the pod-specific manifest path. Required when `SQUAD_DEPLOYMENT_MODE=squad-per-pod`; ignored otherwise.
+
+```bash
+SQUAD_POD_ID=worker-1 # → .squad/machine-capabilities-worker-1.json
+SQUAD_POD_ID=gpu-pool-node-3 # → .squad/machine-capabilities-gpu-pool-node-3.json
+```
+
+---
+
+## The fallback chain (squad-per-pod mode)
+
+When `SQUAD_DEPLOYMENT_MODE=squad-per-pod` AND `SQUAD_POD_ID` is set, the SDK looks up capabilities in this order:
+
+1. **`.squad/machine-capabilities-{podId}.json`** — pod-specific (highest priority)
+2. **`.squad/machine-capabilities.json`** — shared fallback for capabilities that apply to all pods
+3. **`~/.squad/machine-capabilities.json`** — user-home fallback (rarely useful in container deployments)
+4. **`null`** — opt-out; capability routing falls back to label-only routing
+
+The first manifest that exists is loaded; the search stops there (no merging). If you need different pods to see different capability sets, give each its own pod-specific file. If you need a shared baseline plus pod-specific additions, merge at the deployment-config level (Helm, Kustomize, etc.) — the SDK doesn't merge automatically.
+
+---
+
+## SDK programmatic access
+
+The new exports from `@bradygaster/squad-sdk/ralph/capabilities`:
+
+```typescript
+import {
+  getDeploymentMode,
+  getPodId,
+  type DeploymentMode,
+} from '@bradygaster/squad-sdk/ralph/capabilities';
+
+const mode: DeploymentMode = getDeploymentMode(); // 'agent-per-node' | 'squad-per-pod'
+const podId: string | undefined = getPodId(); // e.g. 'worker-1', or undefined
+```
+
+These are pure env-var readers. They don't cache or memoize — each call reads `process.env` directly so changes between reads are visible.
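
The lookup order described above can be sketched as a small standalone resolver. This is an illustration only, not the SDK's implementation: `readDeploymentMode`, `candidateManifests`, and `resolveManifest` are hypothetical names, the `exists` predicate stands in for a real filesystem check, and two behaviors are assumptions (unrecognized mode values are treated as the default, and `agent-per-node` is modeled as checking only the shared manifest).

```typescript
// Hypothetical sketch of the documented fallback chain; not the SDK's code.
type DeploymentMode = 'agent-per-node' | 'squad-per-pod';

interface Env {
  SQUAD_DEPLOYMENT_MODE?: string;
  SQUAD_POD_ID?: string;
}

// Unset maps to the default mode (documented). Treating unrecognized
// values the same way is an assumption of this sketch.
function readDeploymentMode(env: Env): DeploymentMode {
  return env.SQUAD_DEPLOYMENT_MODE === 'squad-per-pod' ? 'squad-per-pod' : 'agent-per-node';
}

// Candidate manifest paths, highest priority first.
function candidateManifests(env: Env, homeDir: string): string[] {
  const shared = '.squad/machine-capabilities.json';
  const podId = env.SQUAD_POD_ID;
  if (readDeploymentMode(env) === 'squad-per-pod' && podId) {
    return [
      `.squad/machine-capabilities-${podId}.json`, // 1. pod-specific
      shared,                                       // 2. shared fallback
      `${homeDir}/.squad/machine-capabilities.json`, // 3. user-home fallback
    ];
  }
  // Assumption: only the shared manifest is modeled for agent-per-node.
  return [shared];
}

// The first manifest that exists wins (no merging); null means the caller
// falls back to label-only routing.
function resolveManifest(
  env: Env,
  homeDir: string,
  exists: (path: string) => boolean,
): string | null {
  return candidateManifests(env, homeDir).find(exists) ?? null;
}
```

Passing `exists` as a parameter keeps the sketch free of filesystem access, so the priority order can be checked in isolation; a real caller would likely pass something like Node's `fs.existsSync`.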
+
+---
+
+## Typical Kubernetes deployment shape
+
+In a KEDA-scaled deployment (see [KEDA Scaling](/squad/docs/features/keda-scaling/)), each scaled pod gets a unique `SQUAD_POD_ID` from the pod's name or hash:
+
+```yaml
+# Deployment env block
+env:
+  - name: SQUAD_DEPLOYMENT_MODE
+    value: squad-per-pod
+  - name: SQUAD_POD_ID
+    valueFrom:
+      fieldRef:
+        fieldPath: metadata.name
+```
+
+The pod's mounted volume contains per-pod manifests baked in by the image build or pulled from a ConfigMap, e.g.:
+
+```
+/app/.squad/
+├── machine-capabilities.json # shared baseline (CPU, memory)
+├── machine-capabilities-gpu-pool-node-1.json # extends baseline with GPU
+├── machine-capabilities-gpu-pool-node-2.json # same shape
+└── machine-capabilities-cpu-pool-node-1.json # no GPU declaration
+```
+
+Pods scheduled onto GPU nodes load a manifest declaring GPU capability; pods on CPU-only nodes get a manifest without GPU. Ralph's issue dispatcher routes `needs:gpu`-labeled work only to pods with the GPU capability.
+
+---
+
+## Limitations
+
+- **No automatic pod discovery.** The SDK reads env vars to know who it is; it doesn't enumerate sibling pods or coordinate work distribution. That's the deployment orchestrator's job (KEDA, scheduler).
+- **No central capability registry.** Pods don't publish their capabilities back to anything; each pod evaluates issues against its own loaded manifest independently. If you need a central view, your orchestrator must aggregate.
+- **Manifest changes require redeploy or restart.** The fallback lookup happens on capability resolution; manifest content is read from disk each time but the manifest *path* is decided by env vars set at process start.
+
+---
+
+## See also
+
+- [Capability Routing](/squad/docs/features/capability-routing/) — the broader machine-capability system
+- [KEDA Scaling](/squad/docs/features/keda-scaling/) — autoscaling Squad pods on demand
+- [Labels](/squad/docs/features/labels/) — `needs:*` label conventions used for capability matching
diff --git a/docs/src/content/docs/features/error-recovery.md b/docs/src/content/docs/features/error-recovery.md
new file mode 100644
index 000000000..e9ebea5ce
--- /dev/null
+++ b/docs/src/content/docs/features/error-recovery.md
@@ -0,0 +1,95 @@
+---
+title: Error Recovery — Standard Failure Patterns
+description: Built-in skill teaching agents to adapt when things fail — retry with backoff, fallback alternatives, diagnose-and-fix, and escalation patterns.
+---
+
+# Error Recovery — Standard Failure Patterns
+
+> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases.
+
+The `error-recovery` skill teaches every squad agent to **adapt** when something fails, not just report the failure. It ships as a built-in skill at `.copilot/skills/error-recovery/SKILL.md` and is available to every spawned agent.
+
+Without this skill, agents tend to encounter a failure (CI test red, API timeout, missing dependency) and stop. With it, they apply standard patterns to diagnose, retry, or escalate the right way.
+
+---
+
+## The five recovery patterns
+
+### 1. Retry with Backoff
+
+**When:** Transient failures — API timeouts, rate limits, network errors, temporary service unavailability.
+
+**Pattern:**
+1. Wait briefly, then retry (start at 2s, double each attempt)
+2. Maximum 3 retries before escalating
+3. Log each attempt with the error received
+
+**Example:** API call returns `429 Too Many Requests` → wait 2s → retry → wait 4s → retry → wait 8s → retry → escalate if still failing.
+
+### 2. Fallback Alternatives
+
+**When:** Primary tool or approach fails and an alternative exists.
+
+**Pattern:**
+1. Attempt primary approach
+2. On failure, identify alternative tool/method
+3. Try the alternative with the same intent
+4. Document which alternative was used and why
+
+**Example:** Primary CLI tool fails → fall back to direct API call for the same operation. Or: `gh pr comment` rate-limited → fall back to `gh api -X POST .../issues/{n}/comments`.
+
+### 3. Diagnose-and-Fix
+
+**When:** Build failures, test failures, linting errors — structured errors with actionable output.
+
+**Pattern:**
+1. Read the full error output carefully (not just the last line)
+2. Identify the root cause from error messages
+3. Attempt a targeted fix
+4. Re-run to verify the fix
+5. If 3 fix attempts fail, escalate with a diagnostic summary
+
+**Example:** TypeScript build fails with `Cannot find module '@x/y'` → check `package.json`, run `npm install`, re-run build.
+
+### 4. Reframe-and-Retry
+
+**When:** The approach itself is wrong (not just the implementation). User feedback like *"that won't work because..."* or *"try a different way"*.
+
+**Pattern:**
+1. Stop the current approach immediately
+2. Re-read the original task description
+3. Identify what assumption was wrong
+4. Propose 2 alternative approaches before picking one
+5. Get user confirmation if the cost of being wrong again is high
+
+### 5. Escalation
+
+**When:** Three attempts have failed, OR the failure is outside the agent's domain, OR fixing it would violate a team decision.
+
+**Pattern:**
+1. Stop attempting fixes
+2. Summarize: what was tried, what failed, what's known
+3. Surface to coordinator with a clear ask (*"need lead's call on architecture"* vs. *"need human approval"* vs. *"need access to X system"*)
+4. Document the escalation in `decisions/inbox/` if it's a recurring pattern
+
+---
+
+## When NOT to apply these patterns
+
+- **Don't retry on user-input errors.** If the user typed `gh repo create my-typo`, don't retry with `my-typoo`. Surface and ask.
+- **Don't fall back silently on security-sensitive operations.** If `git push origin main` fails because of branch protection, do NOT fall back to `--force`.
+- **Don't escalate without context.** *"It failed"* isn't an escalation; *"three attempts, each with `EACCES`, suggests user lacks write to `.squad/`, recommend chmod or different storage path"* is.
+
+---
+
+## Integration with Reviewer Rejection Protocol
+
+When the failure is a Reviewer rejection (a Reviewer agent rejects an artifact), the [Reviewer Rejection Protocol](/squad/docs/features/reviewer-protocol/) takes precedence. The original author is locked out and a different agent must own the revision. Error-recovery patterns apply within that constraint — the revision agent can use retry/fallback/diagnose patterns freely.
+
+---
+
+## See also
+
+- [Reflect](/squad/docs/features/reflect/) — learning from corrections
+- [Reviewer Protocol](/squad/docs/features/reviewer-protocol/) — when a Reviewer rejects work
+- [Skills](/squad/docs/features/skills/) — how built-in skills work
diff --git a/docs/src/content/docs/features/export-import.md b/docs/src/content/docs/features/export-import.md
index 06b5fe586..789cdf316 100644
--- a/docs/src/content/docs/features/export-import.md
+++ b/docs/src/content/docs/features/export-import.md
@@ -31,6 +31,28 @@ Creates `squad-export.json` in the current directory — a portable snapshot of
 squad export --out ./backups/my-team.json
 ```
 
+### Push directly to a GitHub repository
+
+Instead of writing to a local file, you can push the export straight to a GitHub repo via the GitHub Contents API. This is the easiest way to back up your team to a private repo or share it with collaborators without sending a file.
+
+```bash
+# Export to a GitHub repo (uses default branch)
+squad export --repo myorg/squad-backups
+
+# Export to a specific branch
+squad export --repo myorg/squad-backups --branch nightly
+```
+
+Requirements:
+- GitHub CLI (`gh`) installed and authenticated with permission to push to the target repo
+- The repo must exist (the export does NOT create it)
+
+The export lands at the repo root as `squad-export.json` by default. Combine with `--out` to control the filename inside the repo:
+
+```bash
+squad export --repo myorg/squad-backups --out my-team-2026-06-11.json
+```
+
 ### What's included
 
 | Data | Included |
@@ -53,6 +75,25 @@ squad import squad-export.json
 
 Imports the snapshot into the current repo's `.squad/` directory.
 
+### Pull directly from a GitHub repository
+
+You can import a snapshot directly from a GitHub repo without downloading the file first:
+
+```bash
+# Import from default branch of a repo
+squad import --repo myorg/squad-backups
+
+# Import a specific filename or branch
+squad import --repo myorg/squad-backups --branch nightly
+squad import --repo myorg/squad-backups --out my-team-2026-06-11.json
+```
+
+Requirements:
+- GitHub CLI (`gh`) installed and authenticated with read access to the source repo
+- The export file must exist at the named path in the repo (default: `squad-export.json` at repo root)
+
+Use `--force` together with `--repo` for the same archive-then-replace behavior as the file-based import.
+
 ### Collision detection
 
 If `.squad/` already exists, Squad warns you and stops. To archive the existing team and replace it:
diff --git a/docs/src/content/docs/features/fleet-dispatch.md b/docs/src/content/docs/features/fleet-dispatch.md
new file mode 100644
index 000000000..cd197d75d
--- /dev/null
+++ b/docs/src/content/docs/features/fleet-dispatch.md
@@ -0,0 +1,132 @@
+---
+title: Fleet Dispatch — Parallel Issue Triage
+description: Hybrid dispatch mode for squad watch that batches read-heavy issues into a single Copilot /fleet session for 2.9x faster parallel analysis.
+---
+
+# Fleet Dispatch — Parallel Issue Triage
+
+> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases.
+
+**Try this for parallel read-heavy issue triage:**
+```bash
+squad watch --execute --dispatch-mode fleet
+```
+
+**Try this for mixed read + write workloads:**
+```bash
+squad watch --execute --dispatch-mode hybrid
+```
+
+Fleet Dispatch enables `squad watch --execute` to batch **read-heavy issues** (research, review, audit, triage) into a single Copilot CLI `/fleet` session that analyzes them in parallel tracks. The published measurement: **2.9× faster** than sequential dispatch for read-heavy workloads.
+
+It's a `WatchCapability` that runs in the `post-execute` phase of the watch loop, so it composes with the existing per-issue dispatch logic rather than replacing it.
+
+---
+
+## Three dispatch modes
+
+| Mode | What gets parallelized | Best for |
+|------|------------------------|----------|
+| **`sequential`** (default) | One issue at a time, full agent spawn each | Mixed workloads, debugging |
+| **`fleet`** | All issues batched into one `/fleet` Copilot session, parallel analysis tracks | Pure triage/review rounds where all issues are read-only |
+| **`hybrid`** | Read-heavy issues go to fleet; write-heavy issues go sequential | Realistic backlogs with both kinds |
+
+`hybrid` is the recommended mode for most teams — it gets the speedup on the analysis-heavy issues without trying to fleet-dispatch issues that need to write code or modify state.
+
+---
+
+## What counts as "read-heavy"
+
+The fleet-dispatch capability classifies issues using the same `classifyIssue` logic used elsewhere in `squad watch`. Read-heavy classification is based on labels and title keywords:
+
+- **Labels:** `triage`, `review`, `audit`, `analyze`, `research`, `investigate`, `discuss`, `question`
+- **Title keywords:** *"review"*, *"audit"*, *"analyze"*, *"investigate"*, *"why does"*, *"how does"*
+
+Anything that touches code, files, or external systems is **write-heavy** and stays in sequential dispatch — even in `hybrid` mode.
+
+---
+
+## How a fleet round works
+
+When `squad watch` decides to dispatch (work items present, no rate-limit hold), and `dispatchMode` is `fleet` or `hybrid`:
+
+1. Watch's executor calls `findExecutableIssues` to get the work batch
+2. FleetDispatch capability runs in `post-execute` phase
+3. Read-heavy issues are filtered out of the sequential queue
+4. A multi-track `/fleet` prompt is built — one track per issue
+5. Each track names the appropriate agent (from the roster + labels) and instructs it to:
+   - Read the issue body
+   - Analyze, assess urgency (P0/P1/P2)
+   - Recommend next step
+   - Write findings as an issue comment
+   - **NOT** create branches or modify files
+6. The prompt is sent as a single `copilot --fleet` invocation
+7. Copilot runs all tracks in parallel, posts comments per issue, exits
+8. Watch logs the fleet dispatch result and continues to its next round
+
+A typical fleet prompt looks like:
+
+```
+/fleet Execute these 6 read-only analysis tracks in parallel:
+
+Track 1 (PAO): Issue #421: Triage user-reported bug in login flow
+  Read the issue body. Analyze, assess urgency (P0/P1/P2), recommend next step.
+  Write findings as an issue comment.
+  Do NOT create branches or modify files.
+
+Track 2 (FIDO): Issue #428: Review PR #427's test coverage
+  ...
+
+Rules: All tracks READ-ONLY. Write findings as issue comments. Run in parallel.
+```
+
+---
+
+## Measurement methodology
+
+The 2.9× speedup citation comes from comparing 6 read-heavy issues:
+
+- Sequential mode: 6 separate `copilot --agent {role}` invocations → ~18 minutes total (each ~3 min for cold-start + analysis)
+- Fleet mode: 1 `copilot --fleet` invocation with 6 tracks → ~6 minutes total (one cold-start, parallel analysis tracks)
+
+Speedup is dominated by avoiding 5 cold-starts. It does NOT extend to write-heavy issues because Copilot's `/fleet` doesn't currently support parallel write operations safely (commits would conflict).
+
+---
+
+## Configuration
+
+Set the dispatch mode in `.squad/watch-config.json`:
+
+```json
+{
+  "execute": true,
+  "dispatchMode": "hybrid",
+  "interval": 300,
+  "copilotFlags": "--allow-all-tools --no-color"
+}
+```
+
+Or via CLI flag (overrides config):
+
+```bash
+squad watch --execute --dispatch-mode fleet
+squad watch --execute --dispatch-mode hybrid
+squad watch --execute --dispatch-mode sequential
+```
+
+---
+
+## Limitations
+
+- **Read-only only.** Fleet tracks must not modify files or create branches. The capability builds prompts that explicitly forbid this; if your team needs parallel write workflows, sequential dispatch remains the safer choice.
+- **One track per issue.** No batching of multiple issues into one track — each issue gets its own analysis context.
+- **Track count limit.** Copilot CLI `/fleet` has its own track-count ceiling. For backlogs with >10 read-heavy issues per round, the capability splits across multiple fleet calls.
+- **Classification is conservative.** If `classifyIssue` is unsure, it defaults to write-heavy (sequential). Better to lose the speedup than to fleet-dispatch a write-heavy issue accidentally.
+
+---
+
+## See also
+
+- [Ralph](/squad/docs/features/ralph/) — the watch loop's broader behavior
+- [Capability Routing](/squad/docs/features/capability-routing/) — how watch matches work to agents
+- [Rate Limiting](/squad/docs/features/rate-limiting/) — cooperative rate limiting (composes with fleet dispatch)
diff --git a/docs/src/content/docs/features/mcp-frontmatter.md b/docs/src/content/docs/features/mcp-frontmatter.md
new file mode 100644
index 000000000..ae2c41122
--- /dev/null
+++ b/docs/src/content/docs/features/mcp-frontmatter.md
@@ -0,0 +1,102 @@
+---
+title: MCP Frontmatter — squad init --mcp-frontmatter
+description: Write MCP server configuration directly into the Squad agent file's frontmatter instead of .copilot/mcp-config.json, for harnesses that read agent-level MCP config.
+---
+
+# MCP Frontmatter — `squad init --mcp-frontmatter`
+
+> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases.
+
+**Try this when your agent harness reads frontmatter-level MCP config:**
+```bash
+squad init --mcp-frontmatter
+```
+
+By default, `squad init` writes MCP server configuration to two places:
+- `.copilot/mcp-config.json` (workspace-level for Copilot CLI)
+- `~/.copilot/mcp-config.json` (user-level, ensures `copilot -p` non-interactive mode also sees the MCP — see [#1247](https://github.com/bradygaster/squad/issues/1247))
+
+The `--mcp-frontmatter` flag changes this: instead of writing JSON config files, MCP server declarations go directly into the YAML frontmatter of `.github/agents/squad.agent.md` (or `.github/agents/squad.md` if you've exported with [Coordinator-as-Agent Export](/squad/docs/features/coordinator-as-agent-export/)).
+
+---
+
+## When to use it
+
+| Your setup | Use `--mcp-frontmatter`? |
+|------------|-------------------------|
+| Standard Copilot CLI users | ❌ No — default config files work fine |
+| Custom agent harness that reads MCP from agent frontmatter | ✅ Yes |
+| Building or distributing a Squad agent as a self-contained file | ✅ Yes — keeps MCP config inline with the agent |
+| Some VS Code extensions / custom IDE plugins that prefer per-agent MCP declarations | ✅ Yes |
+
+If you're not sure, you don't need this flag. It's specifically for environments where the agent file itself is the source of truth for MCP configuration.
+
+---
+
+## What the output looks like
+
+Without `--mcp-frontmatter` (default), the agent file frontmatter is:
+
+```yaml
+---
+name: squad
+description: Squad coordinator
+model: claude-opus-4.5
+tools: ["*"]
+---
+```
+
+And `.copilot/mcp-config.json` separately contains:
+
+```json
+{
+  "mcpServers": {
+    "squad_state": {
+      "command": "npx",
+      "args": ["-y", "@bradygaster/squad-cli@latest", "state-mcp"],
+      "tools": ["*"]
+    }
+  }
+}
+```
+
+With `--mcp-frontmatter`, the MCP server moves into the frontmatter:
+
+```yaml
+---
+name: squad
+description: Squad coordinator
+model: claude-opus-4.5
+tools: ["*"]
+mcpServers:
+  squad_state:
+    command: npx
+    args: ["-y", "@bradygaster/squad-cli@latest", "state-mcp"]
+    tools: ["*"]
+---
+```
+
+And the standalone `.copilot/mcp-config.json` is not written (or contains only non-squad servers).
+
+---
+
+## Effect on `squad upgrade`
+
+`squad upgrade` detects which mode the project is using (looks for the `mcpServers` key in agent frontmatter vs. presence of `.copilot/mcp-config.json`) and preserves the choice. You don't need to re-pass `--mcp-frontmatter` on every upgrade.
+
+To switch modes after init, re-run `squad init --mcp-frontmatter` (or run `squad init` without the flag to switch back). The previous MCP config is migrated.
+
+---
+
+## Limitations
+
+- **Less robust for `copilot -p` non-interactive mode.** Standard mode pins MCP at user level too, which solves the workspace-only loading gap (PR [#1251](https://github.com/bradygaster/squad/pull/1251)). Frontmatter mode skips that user-level write — so `copilot -p` may not see the squad MCP unless the harness reads frontmatter directly.
+- **No second-layer fallback.** If the harness that reads frontmatter MCP fails to load it correctly, there's no `.copilot/mcp-config.json` to fall back to. Test in your specific harness before adopting.
+- **Schema is harness-specific.** The frontmatter `mcpServers` key follows the Copilot CLI convention, but other harnesses may expect different key names (`mcp_servers`, `mcp.servers`, etc.). Check your harness's spec.
+
+---
+
+## See also
+
+- [MCP Integration](/squad/docs/features/mcp/) — the broader MCP system
+- [Coordinator-as-Agent Export](/squad/docs/features/coordinator-as-agent-export/) — bundling MCP config into a self-contained agent file
diff --git a/docs/src/content/docs/features/reflect.md b/docs/src/content/docs/features/reflect.md
new file mode 100644
index 000000000..a23c2e9bc
--- /dev/null
+++ b/docs/src/content/docs/features/reflect.md
@@ -0,0 +1,91 @@
+---
+title: Reflect — In-Session Learning Capture
+description: Built-in skill that extracts HIGH/MED/LOW confidence patterns from conversations to prevent repeating mistakes and reinforce successful patterns.
+---
+
+# Reflect — In-Session Learning Capture
+
+> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases.
+
+The `reflect` skill is a built-in capability that turns every user correction into a learning opportunity. Agents invoke `reflect` after critical conversation moments — corrections, praise, edge-case discoveries — to capture patterns that prevent repeating mistakes across sessions.
+
+It ships at `.copilot/skills/reflect/SKILL.md` and is automatically available to every spawned agent. The skill complements the existing knowledge layers (`history.md`, `decisions.md`) by capturing **in-flight** learnings that may later graduate to permanent memory.
+
+---
+
+## How it fits the memory architecture
+
+Squad has three layers for what agents know:
+
+| Layer | Lifetime | Audience | Reflect's relationship |
+|-------|----------|----------|------------------------|
+| `.squad/agents/{name}/history.md` | Permanent | Owner agent + Scribe-propagated cross-updates | Reflect captures candidates; HIGH-confidence ones graduate here |
+| `.squad/decisions.md` | Permanent | All agents | Reflect surfaces candidates; lead promotes after review |
+| `reflect` skill | In-flight | Captured during the active session | Working memory for patterns not yet ready to commit |
+
+Workflow:
+1. During the session, agents invoke `reflect` to capture learnings
+2. At session end, the agent or Scribe reviews captured learnings
+3. HIGH-confidence patterns → lead reviews for `decisions.md` promotion
+4. Agent-specific patterns → `{agent}/history.md` append
+
+---
+
+## Triggers — when to invoke reflect
+
+### 🔴 HIGH Priority (invoke immediately)
+
+| Trigger | Example phrase | Why critical |
+|---------|---------------|--------------|
+| User correction | *"no"*, *"wrong"*, *"not like that"*, *"never do"* | Captures mistakes to prevent repetition |
+| Architectural insight | *"you removed that without understanding why"* | Documents the *why* behind a design (Chesterton's Fence) |
+| Immediate fixes | *"debug"*, *"root cause"*, *"fix all"* | Learns from errors in real-time |
+
+### 🟡 MEDIUM Priority (invoke after multiple instances)
+
+| Trigger | Example phrase | Why important |
+|---------|---------------|--------------|
+| User praise | *"perfect"*, *"exactly"*, *"great"* | Reinforces successful patterns |
+| Tool preferences | *"use X instead of Y"*, *"prefer"* | Builds workflow preferences |
+| Edge cases | *"what if X happens?"*, *"don't forget"*, *"ensure"* | Captures scenarios to handle |
+
+### 🟢 LOW Priority (invoke at natural breakpoints)
+
+| Trigger | Example phrase | Why useful |
+|---------|---------------|--------------|
+| Workflow refinements | *"better if you..."*, *"next time"* | Iterative improvement |
+| Style preferences | *"prefer this format"*, *"like this approach"* | Personal style learning |
+
+---
+
+## Capture format
+
+Reflect produces structured entries the lead or Scribe can review at session end:
+
+```markdown
+## Reflection — 2026-06-11T16:42:00Z
+
+**Trigger:** User correction — "no, never auto-merge without explicit approval"
+**Confidence:** HIGH
+**Pattern:** Auto-merge gating
+**Learning:** Even when CI is green and reviews pass, do not invoke `gh pr merge` without an explicit user confirmation. The user wants the final merge action to be human-driven.
+**Promote to:** `decisions.md` (team-wide rule) — surface to lead next ceremony
+**Cited:** Coordinator session 2026-06-11, user message ~16:41
+```
+
+---
+
+## Anti-patterns
+
+- **Don't capture every interaction.** Reflect is for inflection points — corrections, surprises, breakthroughs. A capture rate >1 per ~10 messages is too high.
+- **Don't promote LOW-confidence patterns to decisions.md.** Decisions are binding for the whole team; LOW captures are personal preferences and should live in the agent's `history.md` if anywhere.
+- **Don't reflect on user instructions you already executed correctly.** That's not learning, that's logging.
+- **Don't paraphrase the user's words when capturing HIGH-priority items.** Verbatim quotes preserve nuance.
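
For tooling that wants to produce or check reflection entries, the capture format and two of the anti-patterns above could be modeled roughly as follows. This is a sketch under assumptions: the `Reflection` type and all three function names are hypothetical (the skill itself is a markdown file, not a typed API), and treating MED captures like LOW for `decisions.md` promotion is a guess, since the page only states the HIGH and LOW cases.

```typescript
// Hypothetical model of one reflection entry; field names mirror the
// labels in the documented capture format.
type Confidence = 'HIGH' | 'MED' | 'LOW';

interface Reflection {
  timestamp: string; // ISO 8601, e.g. 2026-06-11T16:42:00Z
  trigger: string;   // verbatim user words for HIGH-priority items
  confidence: Confidence;
  pattern: string;
  learning: string;
  promoteTo: string;
  cited: string;
}

// Render an entry in the markdown shape shown under "Capture format".
function renderReflection(r: Reflection): string {
  return [
    `## Reflection — ${r.timestamp}`,
    '',
    `**Trigger:** ${r.trigger}`,
    `**Confidence:** ${r.confidence}`,
    `**Pattern:** ${r.pattern}`,
    `**Learning:** ${r.learning}`,
    `**Promote to:** ${r.promoteTo}`,
    `**Cited:** ${r.cited}`,
  ].join('\n');
}

// Only HIGH-confidence captures are candidates for decisions.md.
// Assumption: MED is treated like LOW here.
function mayPromoteToDecisions(r: Reflection): boolean {
  return r.confidence === 'HIGH';
}

// Anti-pattern guard: more than 1 capture per ~10 messages is too high.
function captureRateTooHigh(captures: number, messages: number): boolean {
  return messages > 0 && captures / messages > 0.1;
}
```

Keeping the promotion check separate from rendering leaves the final call with the lead, as the workflow describes; the function only filters out entries that should never be proposed.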
+
+---
+
+## See also
+
+- [Memory & Knowledge](/squad/docs/concepts/memory-and-knowledge/) — the three-layer model
+- [Directives](/squad/docs/features/directives/) — how the coordinator captures explicit team rules
+- [Error Recovery](/squad/docs/features/error-recovery/) — the companion skill for handling failures
diff --git a/docs/src/content/docs/features/skill-security-scanner.md b/docs/src/content/docs/features/skill-security-scanner.md
new file mode 100644
index 000000000..43d5dd70e
--- /dev/null
+++ b/docs/src/content/docs/features/skill-security-scanner.md
@@ -0,0 +1,118 @@
+---
+title: Skill Security Scanner
+description: Markdown-aware security scanner that catches embedded credentials, download-and-execute patterns, and privilege escalation in skill files before they ship.
+---
+
+# Skill Security Scanner
+
+> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases.
+
+The skill security scanner is a markdown-aware safety check that runs as part of `scripts/security-review.mjs` to inspect every SKILL.md file in `.copilot/skills/` and `.squad/skills/`. It catches three classes of problem before a skill gets installed or merged:
+
+1. **Embedded credentials** — API keys, tokens, passwords pasted into skill text
+2. **Download-and-execute patterns** — `curl ... | bash`, `Invoke-Expression`, and friends
+3. **Privilege escalation commands** — `sudo`, `Set-ExecutionPolicy Bypass`, `chmod 777`, etc.
+
+It ships as Phase 1 — focused on the highest-signal issues with **zero false positives on the existing 35 skill files** at the time of release.
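
To make "markdown-aware" concrete, here is a minimal sketch of how one of the three classes (download-and-execute) might be matched in prose while ignoring code. It is not the logic in `scripts/security-review.mjs`: `scanProse`, `Finding`, and the regular expression are hypothetical, and skipping fenced blocks and inline code spans is an assumption about what markdown-awareness involves.

```typescript
// Hypothetical sketch; the real scanner's patterns and rules may differ.
interface Finding {
  line: number;    // 1-based line number in the skill file
  pattern: string; // pattern class that matched
}

// Built with repeat() so this sample never contains a literal fence marker.
const FENCE = '`'.repeat(3);

// curl/wget piped into a shell, e.g. "curl https://... | bash".
const DOWNLOAD_EXEC = /\b(?:curl|wget)\b[^|]*\|\s*(?:bash|sh)\b/;

function scanProse(markdown: string): Finding[] {
  const findings: Finding[] = [];
  let inFence = false;
  markdown.split('\n').forEach((raw, index) => {
    // Toggle on fence lines and skip everything inside a fenced block.
    if (raw.trimStart().startsWith(FENCE)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return;
    // Drop inline code spans so only running prose is inspected.
    const prose = raw.replace(/`[^`]*`/g, '');
    if (DOWNLOAD_EXEC.test(prose)) {
      findings.push({ line: index + 1, pattern: 'download-and-execute' });
    }
  });
  return findings;
}
```

A line-by-line pass keeps line numbers available for reporting, at the cost of not handling constructs that span lines (for example a command wrapped with a trailing backslash).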
+ +--- + +## How it integrates + +The scanner is invoked by the existing security-review pipeline (`scripts/security-review.mjs`), which is triggered: + +- On every PR that touches `.copilot/skills/**` or `.squad/skills/**` (via the Security Review CI workflow) +- Manually: `node scripts/security-review.mjs --scope skills` +- As part of [Plugin Marketplace](/squad/docs/features/plugins/) install (skills from external sources get scanned before landing on disk) + +A finding produces a CI failure with the file path, line number, pattern type, and the matched substring (redacted for credentials). + +--- + +## What it catches + +### Credentials + +| Pattern type | Example match | +|--------------|---------------| +| Generic API key | `API_KEY=` | +| GitHub PAT | `ghp_<40-character-token>` | +| AWS access key | `AKIA<16-character-key>` | +| Bearer tokens | `Authorization: Bearer ` | +| Database connection strings with embedded passwords | `postgres://user:@host/db` | + +### Download-and-execute patterns + +| Pattern type | Example match | +|--------------|---------------| +| Curl-to-bash | `curl https://... \| bash`, `curl ... \| sh`, `wget ... \| sh` | +| PowerShell invoke-expression | `iex (irm https://...)`, `Invoke-Expression $downloaded` | +| Unsafe eval | `eval $(curl ...)`, `eval $(wget ...)` | + +### Privilege escalation + +| Pattern type | Example match | +|--------------|---------------| +| `sudo` invocations | `sudo apt install`, `sudo -i`, `sudo bash` | +| Permissive chmod | `chmod 777`, `chmod a+rwx`, `chmod -R 777` | +| PowerShell policy bypass | `Set-ExecutionPolicy Bypass`, `Set-ExecutionPolicy Unrestricted` | +| Windows admin escalation | `Start-Process ... -Verb RunAs`, `runas /user:Administrator` | + +--- + +## Suppression — the false-positive guardrails + +The scanner is markdown-aware, which means it understands when a "dangerous" pattern is actually in a code block being **shown as an anti-pattern** vs. 
in prose advising users to run something: + +| Where pattern appears | Action | +|----------------------|--------| +| Inside a fenced code block (```` ``` ````) | **Suppressed** — treated as documentation, not advice | +| Inside an inline code span (`` ` ``) | **Suppressed** — treated as a reference | +| In prose with a placeholder token (``, ``, `xxx`, `***`) | **Suppressed** — clearly an example | +| In prose without any of the above | **Flagged** as a finding | + +The placeholder-token list covers common safe markers: ``, ``, ``, `xxx`, `***`, `placeholder`, `example`, `PLACEHOLDER`. + +This is why the existing 35 skill files have zero false positives — most discuss security patterns inside fenced code blocks or with placeholder tokens. + +--- + +## Local invocation + +```bash +# Scan all skills in the current repo +node scripts/security-review.mjs --scope skills + +# Scan a single skill file +node scripts/security-review.mjs --file .copilot/skills/my-skill/SKILL.md + +# JSON output for tooling integration +node scripts/security-review.mjs --scope skills --format json +``` + +Exit codes: +- `0` — no findings +- `1` — findings detected (CI fails the build) +- `2` — scanner error (couldn't read file, malformed markdown, etc.) + +--- + +## What it doesn't catch + +This is **Phase 1**. The scanner is deliberately conservative — it would rather miss something than false-positive a legitimate skill. Things NOT in scope today: + +- **Obfuscated patterns** — base64-encoded credentials, character-class regex tricks, etc. 
+- **Multi-line patterns** — the scanner is line-oriented; a credential split across lines won't match +- **Skill scripts (`.js`/`.mjs` files in `scripts/`)** — only the SKILL.md narrative is scanned; executable handlers need their own audit +- **Semantic context** — the scanner doesn't understand whether a `sudo` example is contextually safe; if it's in prose without a placeholder marker, it flags +- **Hooks beyond `.copilot/skills/` and `.squad/skills/`** — other markdown files (charters, decisions, README) aren't scanned by this rule + +Phase 2 work tracked in the issue tracker would extend coverage to scripts and add an LLM-based semantic pass. + +--- + +## See also + +- [Skills](/squad/docs/features/skills/) — the broader skills system +- [Plugin Marketplace](/squad/docs/features/plugins/) — how external skills get installed +- [Secret Handling](/squad/docs/features/skills/) — see also the `secret-handling` built-in skill diff --git a/docs/src/content/docs/features/teams-comms.md b/docs/src/content/docs/features/teams-comms.md new file mode 100644 index 000000000..533cefddb --- /dev/null +++ b/docs/src/content/docs/features/teams-comms.md @@ -0,0 +1,127 @@ +--- +title: Microsoft Teams Comms Adapter +description: Bidirectional chat integration between Squad and Microsoft Teams via Microsoft Graph API — 1:1 chats and channel messaging with PKCE browser auth or device code fallback. +--- + +# Microsoft Teams Comms Adapter + +> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases. + +The Teams adapter lets your squad post updates and read replies through Microsoft Teams, alongside the existing file-based, email, and other comm channels. It ships in `@bradygaster/squad-sdk` as a `CommunicationAdapter` implementation and uses Microsoft Graph API for both 1:1 chats and channel messaging. + +> **⚠️ Breaking change in v0.10:** `createCommunicationAdapter` is now async (returns `Promise`). 
Callers must `await` the result. + +--- + +## What you can do with it + +| Action | Supported | +|--------|-----------| +| Post a message to a 1:1 chat | ✅ | +| Post a message to a Teams channel | ✅ | +| Read replies / new messages from a chat | ✅ | +| Post rich content (Adaptive Cards, attachments) | Partial (text + basic formatting) | +| Notify on agent-completed work | ✅ (via squad watch / notification routing) | +| Two-way conversation with an agent in Teams | ✅ (poll-based, not push) | + +The adapter is one of several `CommunicationAdapter` implementations — see [Notifications](/squad/docs/features/notifications/) for the broader notification system. + +--- + +## Authentication flow + +The adapter tries auth methods in this order, falling through on failure: + +1. **Cached token** — looks for a previously-saved token in the OS credential store +2. **Refresh token** — if cached refresh token is valid, silently re-issues an access token +3. **Browser PKCE** — opens a browser for the user to sign in; uses Authorization Code with PKCE; 120-second timeout +4. **Device code** — fallback when no browser is available (CI, remote shell); user enters a code on a different device + +``` +$ squad notify teams --to user@example.com --message "Build complete" +🔑 No cached token — opening browser for sign-in... +[browser opens, user signs in] +✓ Token cached. Sending message... +✓ Posted to user@example.com +``` + +The token cache persists across sessions. After the first sign-in, subsequent runs are silent unless the refresh token expires. 
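
The fall-through order above can be sketched as a small loop. This is a minimal illustration, not the adapter's internals: `AuthMethod`, `acquireToken`, and `withTimeout` are hypothetical names, and each `acquire` function stands in for whatever the real cached, refresh, browser PKCE, or device code step does.

```typescript
// Hypothetical shape for one step in the chain.
type AuthMethod = {
  name: "cached" | "refresh" | "browser-pkce" | "device-code";
  acquire: () => Promise<string>; // resolves to an access token, or rejects
};

// Rejects if `work` has not settled within `ms` milliseconds
// (the browser step above is described as having a 120-second timeout).
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(label + " timed out after " + ms + " ms")), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Tries each method in order and returns the first token obtained.
async function acquireToken(methods: AuthMethod[]): Promise<{ token: string; via: string }> {
  const failures: string[] = [];
  for (const method of methods) {
    try {
      const token = await method.acquire();
      return { token, via: method.name };
    } catch (err) {
      // Record why this step failed, then fall through to the next one.
      failures.push(method.name + ": " + (err instanceof Error ? err.message : String(err)));
    }
  }
  throw new Error("All auth methods failed (" + failures.join("; ") + ")");
}
```

Collecting the failure reasons means that, if every step fails, the final error can say why each one was skipped.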
+ +--- + +## Configuration + +The adapter requires a Microsoft Entra (Azure AD) app registration with permissions for: + +- `Chat.ReadWrite` (1:1 chat operations) +- `ChannelMessage.Send` (channel posts) +- `ChannelMessage.Read.All` (read channel replies) +- `User.Read` (basic profile) + +Configure in `.squad/config.json`: + +```json +{ + "comms": { + "teams": { + "tenantId": "00000000-0000-0000-0000-000000000000", + "clientId": "00000000-0000-0000-0000-000000000000", + "redirectUri": "http://localhost:8400/auth", + "tokenCachePath": "~/.squad/.cache/teams-token.json" + } + } +} +``` + +The `redirectUri` is the local-only OAuth callback for browser PKCE — it never leaves your machine. + +--- + +## Usage from the SDK + +```typescript +import { createCommunicationAdapter } from '@bradygaster/squad-sdk/platform'; + +// IMPORTANT: this is async now (breaking change in v0.10) +const teams = await createCommunicationAdapter({ channel: 'teams' }); + +// Post a message +const post = await teams.postUpdate({ + title: 'CI passed', + body: 'PR #1234 is green and ready for review.', + category: 'pr-status', + author: 'Squad', +}); + +// Poll for replies +const replies = await teams.pollForReplies({ + threadId: post.id, + since: new Date(Date.now() - 60_000), +}); +``` + +--- + +## Limitations + +- **Polling, not push.** The adapter polls for replies; it doesn't subscribe to a websocket. Reply latency is the poll interval (default 30s). +- **No Adaptive Card builder.** You can send plain text and basic formatting today; for rich cards, use the underlying Graph SDK directly. +- **No bot-framework integration.** This adapter uses delegated user permissions, not a bot account. Each user sees the message as posted by themselves (or the configured app identity), not by a "Squad bot". +- **MSAL token cache shared across processes.** If you run multiple squads simultaneously with the same Entra app, they share the same cached token. Use distinct `tokenCachePath` if you need isolation. 
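
Because replies arrive by polling, a caller that needs to wait for an answer ends up writing a loop along these lines. The helper below is a sketch under assumptions: `waitForReplies` and its options are hypothetical, and the reply source is passed in as a function so that nothing here depends on the adapter's exact signatures.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `fetchReplies` until it returns something or attempts run out.
async function waitForReplies<T>(
  fetchReplies: (since: Date) => Promise<T[]>,
  options: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<T[]> {
  const intervalMs = options.intervalMs ?? 30_000; // mirrors the 30s default poll interval noted above
  const maxAttempts = options.maxAttempts ?? 10; // assumption: give up eventually
  const since = new Date(); // only replies after this moment are of interest
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const replies = await fetchReplies(since);
    if (replies.length > 0) {
      return replies;
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs); // worst-case reply latency is one interval
    }
  }
  return []; // no reply within the attempt budget
}
```

With the SDK example from earlier on this page, the callback could be `(since) => teams.pollForReplies({ threadId: post.id, since })`.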
+ +--- + +## Security notes + +- Tokens are stored in the OS credential store (Windows Credential Manager / macOS Keychain / Linux libsecret) where available, with a JSON file fallback at `tokenCachePath` +- The browser PKCE callback listens on `127.0.0.1` only — never exposed to the network +- The device code flow shows a verification URL + code; both are short-lived +- The adapter does NOT log message content; only metadata (post id, recipient, timestamp) is recorded in any audit trail + +--- + +## See also + +- [Notifications](/squad/docs/features/notifications/) — the broader notification system +- [Enterprise Platforms](/squad/docs/features/enterprise-platforms/) — Teams + ADO + other enterprise integrations +- [Notification Level](/squad/docs/features/notification-level/) — controlling noise across all channels diff --git a/docs/src/content/docs/features/tiered-memory.md b/docs/src/content/docs/features/tiered-memory.md new file mode 100644 index 000000000..acdfd6a61 --- /dev/null +++ b/docs/src/content/docs/features/tiered-memory.md @@ -0,0 +1,98 @@ +--- +title: Tiered Memory — Hot / Cold / Wiki +description: Three-tier agent memory model that cuts spawn context cost by 20-55% by separating fresh task context from archived history and durable reference docs. +--- + +# Tiered Memory — Hot / Cold / Wiki + +> ⚠️ **Experimental** — Squad is alpha software. APIs, commands, and behavior may change between releases. + +**Problem:** Squad agents load their full `history.md` on every spawn. Production measurements show 34–74KB payloads per agent (8.8K–18.5K tokens), with 82–96% of that being "old noise" — context the current task doesn't need. + +**Solution:** A three-tier memory model that loads only what each task actually requires, achieving 20–55% context reduction per spawn. 
+ +Tiered Memory ships as a built-in skill at `.copilot/skills/tiered-memory/SKILL.md` and pairs with Scribe's existing 15KB-summarization rule (see [Memory & Knowledge](/squad/docs/concepts/memory-and-knowledge/)) to give large, long-running squads predictable context budgets. + +--- + +## The three tiers + +### 🔥 Hot — Current Session Context + +- **Size target:** ~2–4KB +- **Loaded:** Always, on every spawn +- **Contents:** Current task, active decisions made this session, immediate blockers, last 3–5 actions, who's being talked to +- **Lifetime:** Current session only — Scribe promotes relevant parts to Cold at session end +- **Purpose:** Immediate task context with zero latency and zero decision + +### ❄️ Cold — Summarized Cross-Session History + +- **Size target:** ~8–12KB +- **Loaded:** On demand — include only when the task explicitly needs history +- **Contents:** Summarized past sessions, cross-session decisions, recurring patterns, unresolved issues +- **Lifetime:** 30-day rolling window — older entries promoted to Wiki +- **Purpose:** Answer *"what have we tried before?"* and *"what was decided?"* without replaying full transcripts +- **How to include:** Pass `--include-cold` in the spawn template, or add a `## Cold Memory` section to the agent's instructions + +### 📚 Wiki — Durable Structured Knowledge + +- **Size target:** variable (structured reference docs) +- **Loaded:** Async write, selective read — only when the task requires domain knowledge +- **Contents:** ADRs, agent charters, routing rules, stable conventions, external API contracts, platform constraints +- **Lifetime:** Permanent until explicitly deprecated +- **Purpose:** Authoritative reference (not history) — structured facts +- **How to include:** Pass `--include-wiki` or reference specific wiki doc paths in the spawn template + +--- + +## When to load each tier + +| Situation | Hot | Cold | Wiki | +|-----------|-----|------|------| +| New task, no prior context needed | ✅ | ❌ | ❌ | +| 
Resuming interrupted work | ✅ | ✅ | ❌ | +| Debugging a recurring issue | ✅ | ✅ | ❌ | +| Designing something new in an established area | ✅ | ❌ | ✅ | +| Onboarding a new team member | ✅ | ❌ | ✅ | +| Investigating an architectural drift | ✅ | ✅ | ✅ | + +The bias is to load LESS, not more. Cold and Wiki should be opt-in for each spawn based on whether the task description references the past or domain conventions. + +--- + +## How Scribe maintains the tiers + +Scribe's existing maintenance cycle (see [Memory & Knowledge](/squad/docs/concepts/memory-and-knowledge/)) is extended: + +1. **Hot drained at session end** — Scribe scans the session's hot memory, summarizes meaningful entries, appends them to Cold +2. **Cold aged into Wiki** — entries older than 30 days that contain structured facts (decisions, conventions, contracts) get promoted to Wiki +3. **Wiki authored deliberately** — Scribe never auto-creates Wiki entries from scratch; it only promotes Cold content that's already structured + +--- + +## Production measurements + +The skill's documentation cites measurements from a large production squad: + +| Squad size | Before tiered | After tiered | Reduction | +|------------|--------------|--------------|-----------| +| 8 agents, 34KB total history | 8,800 tokens/spawn | 4,400 tokens/spawn | ~50% | +| 14 agents, 74KB total history | 18,500 tokens/spawn | 8,300 tokens/spawn | ~55% | + +The exact savings depend on what fraction of each agent's history is task-relevant. The 20–55% range is the measured spread across different team configurations. + +--- + +## Caveats + +- **The tier split is currently advisory** — the skill defines hot/cold/wiki semantics, but the spawn template doesn't yet enforce `--include-cold` / `--include-wiki` flags as part of the runtime contract. Adoption is per-team via spawn-template edits. +- **Wiki has no UI** — there's no `squad wiki list` command yet. 
Entries live as files in `.squad/wiki/` (when teams create that directory) and the coordinator references them by path. +- **Issue [#1268](https://github.com/bradygaster/squad/issues/1268) and [#1269](https://github.com/bradygaster/squad/issues/1269)** propose making Scribe enforce these tiers via the governed memory pipeline. Until those land, tier maintenance is best-effort. + +--- + +## See also + +- [Memory & Knowledge](/squad/docs/concepts/memory-and-knowledge/) — the broader memory architecture +- [Skills](/squad/docs/features/skills/) — how built-in skills work +- [Context Hygiene](/squad/docs/features/context-hygiene/) — related practices for keeping spawn context small From 986a777a8b71ef303647db7d30288529b4ab167c Mon Sep 17 00:00:00 2001 From: Tamir Dresher Date: Sun, 14 Jun 2026 07:04:59 +0300 Subject: [PATCH 2/2] fix(cspell): whitelist PKCE/MSAL/AKIA/runas + intentional 'typoo' example MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI failure on docs-quality: cspell flagged 9 words across 3 of the new feature pages added by this PR. All 9 are legitimate technical terms or intentional content (not typos): - PKCE (RFC 7636 — OAuth Proof Key for Code Exchange) — 5 hits in teams-comms.md - MSAL (Microsoft Authentication Library) — 1 hit in teams-comms.md - AKIA (AWS access key ID prefix) — 1 hit in skill-security-scanner.md - runas (Windows runas elevation primitive) — 1 hit in skill-security-scanner.md - 'typoo' — an INTENTIONAL typo in an error-recovery example showing 'don't retry the user's typo by adding another character' Added all 5 to cspell.json. Verified locally: npx cspell --no-progress --dot 'docs/src/content/**/*.md' 'README.md' → 0 issues across 166 files. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- cspell.json | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/cspell.json b/cspell.json index 1030c0872..2f1fc0a0c 100644 --- a/cspell.json +++ b/cspell.json @@ -39,7 +39,9 @@ "benleane", "TELMU", "Automator", "kedacore", "DEVBOX", "myaccount", "graphify", "Graphify", "graphifyy", "safishamsi", "jagilber", "benchmarkfn", "filterissuesbystream", "filterissuesbyworkstream", "Recognised", "recognised", - "squadified", "TOCTOU", "unflushed", "unparseable", "pluggability" + "squadified", "TOCTOU", "unflushed", "unparseable", "pluggability", + "PKCE", "MSAL", "AKIA", "runas", + "typoo" ], "dictionaries": ["en_US", "typescript", "node", "npm", "bash"], "allowCompoundWords": true