Plugin vs CLI agent gaps: observability, prompt enforcement, playbook injection #199

## Context

The factory supports two execution paths for specialist agents:

- **CLI mode** — `factory agent <role> --task "..." --project /path` → spawns `claude -p` subprocess via `ClaudeRunner.headless()`
- **Plugin mode** — agents installed as Claude Code subagents (`factory:<role>`) via `generate_agent_content()`, invoked through the Agent tool or `/factory:pipeline-subagents` skill

Both are useful: CLI mode is the production path (used by the CEO), while plugin mode enables composable pipelines and native parallel execution. The gaps between them should be understood and selectively closed.

## Comparison Table

| Dimension | CLI (`factory agent <role>`) | Plugin (`factory:<role>` subagent) | Gap | Plan |
|---|---|---|---|---|
| **Prompt delivery** | `claude -p "<prompt+task>"`: print (non-interactive) mode, where the prompt is treated as a user message. CEO interactive mode uses `--append-system-prompt` for stronger enforcement. | Frontmatter (`name/description/model/tools`) + prompt body. Loaded as agent definition by Claude Code. | CLI headless uses `-p` (user message), not the system prompt. Only CEO interactive gets `--append-system-prompt`. Plugin prompt enforcement depends on Claude Code's agent loading. | **Accept**: `-p` is how all headless agents work. The CEO interactive path correctly uses `--append-system-prompt`. Plugin enforcement is Claude Code's responsibility. |
| **Prompt resolution** | Two-tier: project override (`.factory/agents/<role>.md`) → factory default (`factory/agents/prompts/<role>.md`). | One-tier: factory default only. Static file baked at `sync_agents.py` time. | Plugin agents don't pick up project-specific prompt overrides. | **Accept for now** — plugin agents are meant to be general-purpose. Project-specific overrides are a CEO/CLI concern. |
| **Playbook injection** | Automatic — factory-default playbooks from `factory/agents/playbooks/<role>.md` (and user-evolved from `~/.factory/playbooks/<role>.md`) injected by `inject_playbook()` at invocation time. | None — prompt is frozen at generation time. | Plugin agents miss behavioral playbook rules that CLI agents get. | **Fix** — `generate_agent_content()` should call `load_playbook(role)` and `inject_playbook()` when generating plugin agent files. This way `sync_agents.py` bakes in the current playbooks. Regeneration picks up evolved playbooks. |
| **Tool access** | All tools available (unrestricted `claude -p` session). | Role-specific whitelist in frontmatter (e.g., strategist gets `[Bash, Read, Grep, Glob]`). | CLI agents have broader tool access than necessary. Plugin is more principled (least-privilege). | **Accept** — plugin's tool restriction is actually better. CLI could adopt tool restrictions via `--allowedTools` if Claude Code supports it, but not urgent. |
| **Permissions** | `--dangerously-skip-permissions` — full bypass, no user prompts. | Claude Code's normal permission system — user may be prompted. | Different trust models. | **Accept** — CLI runs in automated/headless context where permission prompts would block. Plugin runs in interactive context where user oversight is appropriate. |
| **Model selection** | Runtime override via `--model` flag or `FACTORY_MODEL` env var. Flexible per-invocation. | Hardcoded in frontmatter per role (`sonnet`/`opus`). No runtime override. | Plugin agents can't adapt model at runtime. | **Fix** — omit `model` field from generated frontmatter so subagents inherit the parent session's model. Keep `agents.yml` model as a default/reference but don't emit it in `generate_agent_content()`. |
| **Working directory** | Explicit `cwd=project_path` passed to subprocess. Isolated. | Inherits Claude Code's CWD. Must be told project path via prompt. | Plugin agents may operate in wrong directory if CWD differs from target project. | **Accept** — this is inherent to Claude Code's subagent model. The pipeline skill already passes project context via the prompt. |
| **Event capture** | `agent.started/completed/failed` → `.factory/events.jsonl`. Full observability. | None — no event emission. | Plugin path is invisible to factory telemetry. Dashboard, ACE, and insights can't see plugin agent activity. | **Fix** — add a `factory emit` CLI subcommand (thin wrapper around `emit_event()`) that pipeline skills can call before/after each agent step. E.g., `factory emit agent.started --agent researcher --project /path`. |
| **Review persistence** | Output saved to `.factory/reviews/<role>-latest.md` with timestamp + exit code header. | Output returned to caller via Agent tool result. Not persisted to disk. | CEO gate decisions rely on `.factory/reviews/` files. Plugin pipeline can't use the same review pattern. | **Fix** — pipeline skills should write agent output to `.factory/reviews/<role>-latest.md` after each step (via Write tool or `tee`). This makes both paths produce the same artifacts. |
| **Orchestration flexibility** | Sequential only (`factory agent` is blocking). Parallel requires multiple bash tool calls from the caller. | Native parallel (multiple Agent calls in one message) + `run_in_background: true`. | CLI can't natively run agents in parallel. | **Accept** — this is a plugin advantage. CLI parallel is possible via `invoke_agents_parallel()` in Python but not from the CEO prompt. |
| **Pipeline composition** | Hardcoded in CEO prompt (fixed modes: Improve, Build, Discover...). | `/factory:pipeline-subagents` designs pipelines on the fly from any goal. | CLI path requires CEO's fixed pipeline structure. | **Accept** — this is the raison d'être for the pipeline skills. Different tools for different needs. |
| **Crash recovery** | Checkpoint file (`.factory/checkpoint.json`) + resume context. CEO can resume after crashes. | None — if parent session dies, pipeline state is lost. | Plugin pipelines can't resume. | **Defer** — could add checkpoint support to pipeline skills later, but low priority since plugin pipelines are typically shorter than full CEO cycles. |
| **Cost control** | Bob Shell ceiling enforcement, invocation logging to `bob_usage.jsonl`. | No cost tracking or ceiling enforcement. | Plugin path has no spending guardrails. | **Defer** — plugin agents use the parent session's token budget. Bob Shell isn't used in plugin mode. |
| **Consecutive failure handling** | `ConsecutiveAgentFailureError` after 2 failures — aborts cycle automatically. | No built-in failure tracking across steps. | Plugin pipelines don't auto-abort on repeated failures. | **Accept** — pipeline skill prompt includes "2 consecutive failures: ABORT pipeline" as a soft instruction. Hard enforcement would require Python changes. |
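
The two-tier prompt resolution in the table (project override first, factory default second) can be sketched as follows. This is a minimal illustration, not the factory's actual code: `resolve_prompt_path` and its `factory_root` parameter are hypothetical names, and only the two paths come from the table above.

```python
from pathlib import Path


def resolve_prompt_path(role: str, project: Path, factory_root: Path) -> Path:
    """Return the prompt file for `role`, preferring a project override.

    Hypothetical helper: the function name and `factory_root` are
    assumptions; the two lookup locations mirror the comparison table.
    """
    # Tier 1: project-specific override.
    override = project / ".factory" / "agents" / f"{role}.md"
    if override.is_file():
        return override

    # Tier 2: factory default shipped with the package.
    default = factory_root / "factory" / "agents" / "prompts" / f"{role}.md"
    if default.is_file():
        return default

    raise FileNotFoundError(f"no prompt found for role {role!r}")
```

Plugin agents effectively skip tier 1, because their prompt is baked in when `sync_agents.py` runs and no project path is known at that point.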

## Fixes to implement (priority order)

### 1. Event capture — add `factory emit` CLI subcommand
Add a thin CLI command that pipeline skills can call to emit events:
```bash
factory emit agent.started --agent researcher --project "$(pwd)"
factory emit agent.completed --agent researcher --project "$(pwd)"
```
This wraps `emit_event()` from `factory/events.py`. Pipeline skills would call this before/after each agent step.
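
A rough sketch of what such a subcommand might look like, using `argparse`. The `emit_event()` below is a stand-in written for this example (the real function in `factory/events.py` may have a different signature and record shape), and `build_parser`/`main` are hypothetical names.

```python
import argparse
import json
import time
from pathlib import Path


def emit_event(event_type: str, project: Path, **fields) -> dict:
    """Stand-in for factory.events.emit_event(); the real signature may differ."""
    record = {"type": event_type, "ts": time.time(), **fields}
    log_path = project / ".factory" / "events.jsonl"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append one JSON object per line so existing readers can tail the file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="factory emit")
    parser.add_argument("event", help="event type, e.g. agent.started")
    parser.add_argument("--agent", required=True, help="agent role name")
    parser.add_argument("--project", type=Path, default=Path.cwd())
    return parser


def main(argv=None) -> int:
    """Parse CLI arguments and emit a single event; returns an exit code."""
    args = build_parser().parse_args(argv)
    emit_event(args.event, args.project, agent=args.agent)
    return 0
```

Calling `main(["agent.started", "--agent", "researcher", "--project", "/path"])` would append one line to `/path/.factory/events.jsonl`, which is the same file the CLI path writes to.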

### 2. Playbook injection in plugin generation
Update `generate_agent_content()` in `factory/agents/plugin.py` to call `load_playbook(role)` and `inject_playbook()` when generating files. This bakes current playbooks into the generated agent files. Running `sync_agents.py` after ACE evolution picks up the latest playbooks.
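
One possible shape for this change is sketched below. The signatures of `load_playbook()`, `inject_playbook()`, and `generate_agent_content()` are assumptions made for the example, as are the `## Playbook` heading and the choice to concatenate every playbook found; the real functions may behave differently.

```python
from pathlib import Path
from typing import List


def load_playbook(role: str, playbook_dirs: List[Path]) -> str:
    """Collect `<role>.md` from each directory, in order (assumed behavior)."""
    parts = []
    for directory in playbook_dirs:
        candidate = directory / f"{role}.md"
        if candidate.is_file():
            parts.append(candidate.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)


def inject_playbook(prompt: str, playbook: str) -> str:
    """Append the playbook to the prompt; leave the prompt alone if empty."""
    if not playbook:
        return prompt
    return f"{prompt.rstrip()}\n\n## Playbook\n\n{playbook}\n"


def generate_agent_content(role: str, base_prompt: str,
                           playbook_dirs: List[Path]) -> str:
    """Bake the current playbooks into the generated plugin agent body."""
    return inject_playbook(base_prompt, load_playbook(role, playbook_dirs))
```

Because the playbook is read at generation time, the plugin file is only as fresh as the last `sync_agents.py` run.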

### 3. Model flexibility in plugin agents
Omit the `model` field from generated frontmatter so subagents inherit the parent session's model. The `agents.yml` `model` field remains as documentation/reference but isn't emitted in the generated file.
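
A minimal sketch of frontmatter rendering that leaves out `model` by default. `render_frontmatter` and its `emit_model` switch are hypothetical, and the exact frontmatter layout is assumed from the field names mentioned above. Whether an omitted field really inherits the parent session's model is worth verifying against the installed Claude Code version.

```python
from typing import List, Optional


def render_frontmatter(name: str, description: str, tools: List[str],
                       model: Optional[str] = None,
                       emit_model: bool = False) -> str:
    """Render agent frontmatter; `model` is written only when emit_model is set.

    Hypothetical helper: the field order and list syntax are assumptions.
    """
    lines = ["---", f"name: {name}", f"description: {description}"]
    if emit_model and model:
        # Kept as an opt-in so a role can still pin a model if needed.
        lines.append(f"model: {model}")
    lines.append(f"tools: [{', '.join(tools)}]")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Keeping the `model` argument in the signature lets `agents.yml` continue to carry the value as a reference without it reaching the generated file.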

### 4. Review persistence in pipeline skills
Update pipeline skill prompts to write agent output to `.factory/reviews/<role>-latest.md` after each step, matching the CLI path's artifact structure.
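
To make the target artifact concrete, here is a sketch of a writer that produces the same file the pipeline skill would need to create. `save_review` is a hypothetical helper, and the header layout is an assumption: the issue only states that the CLI path writes a timestamp and exit code header, not its exact format.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_review(project: Path, role: str, output: str,
                exit_code: int = 0) -> Path:
    """Write agent output to .factory/reviews/<role>-latest.md.

    The header format here is assumed; match it to the CLI path's
    real header before relying on it for CEO gate decisions.
    """
    reviews_dir = project / ".factory" / "reviews"
    reviews_dir.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    header = (f"<!-- role: {role} | timestamp: {timestamp} "
              f"| exit_code: {exit_code} -->\n\n")

    path = reviews_dir / f"{role}-latest.md"
    path.write_text(header + output, encoding="utf-8")
    return path
```

If the skill writes the file through the Write tool instead, the prompt would need to spell out the same path and header so that both paths stay interchangeable.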

## Gaps accepted or deferred (with rationale)

- **Prompt delivery** — `-p` vs `--append-system-prompt` is inherent to headless vs interactive mode
- **Prompt resolution** — project overrides are a CEO concern, not a plugin concern
- **Tool access** — plugin's least-privilege model is actually better
- **Permissions** — different trust models for different contexts
- **Working directory** — inherent to Claude Code's subagent model
- **Orchestration flexibility** — plugin's advantage by design
- **Pipeline composition** — different tools for different needs
- **Crash recovery** — deferred, low priority for shorter plugin pipelines
- **Cost control** — plugin uses parent session budget
- **Failure handling** — soft instruction in pipeline prompt is sufficient
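
For comparison with the soft instruction, the hard enforcement that the CLI path gets could look roughly like the sketch below. `FailureTracker` is a hypothetical class, and the `ConsecutiveAgentFailureError` defined here is a stand-in for the factory's real exception; only the threshold of 2 comes from the table.

```python
class ConsecutiveAgentFailureError(RuntimeError):
    """Stand-in for the factory's exception of the same name."""


class FailureTracker:
    """Abort once `limit` agent steps fail back to back (hypothetical)."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.consecutive = 0

    def record(self, role: str, succeeded: bool) -> None:
        if succeeded:
            # Any success resets the streak.
            self.consecutive = 0
            return
        self.consecutive += 1
        if self.consecutive >= self.limit:
            raise ConsecutiveAgentFailureError(
                f"{self.consecutive} consecutive failures, last role: {role}"
            )
```

A plugin pipeline has no Python process wrapped around its steps, which is why the equivalent rule can only live in the skill prompt today.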

