
Plugin vs CLI agent gaps: observability, prompt enforcement, playbook injection #199

@xukai92


Context

The factory supports two execution paths for specialist agents:

  • CLI mode: factory agent <role> --task "..." --project /path → spawns a claude -p subprocess via ClaudeRunner.headless()
  • Plugin mode: agents installed as Claude Code subagents (factory:<role>) via generate_agent_content(), invoked through the Agent tool or the /factory:pipeline-subagents skill

Both are useful: CLI mode is the production path (used by the CEO), while plugin mode enables composable pipelines and native parallel execution. However, there are gaps between them that should be understood and selectively closed.
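To make the CLI path concrete, here is a minimal sketch of what a headless invocation amounts to: build a claude argv and run it with cwd set to the target project. It is not the actual ClaudeRunner.headless() implementation. build_headless_argv() and run_headless() are hypothetical names, and the precedence of an explicit model over FACTORY_MODEL is an assumption. Only the -p, --model and --dangerously-skip-permissions flags named in this issue are used.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional


def build_headless_argv(prompt: str, task: str, model: Optional[str] = None) -> list[str]:
    """Hypothetical helper: assemble a headless `claude -p` invocation.

    The role prompt and the task are concatenated and passed as the query
    in print (-p) mode, i.e. as a user message rather than a system prompt.
    """
    argv = ["claude", "-p", f"{prompt}\n\n{task}", "--dangerously-skip-permissions"]
    # Assumption: an explicit model argument wins over the FACTORY_MODEL env var.
    chosen = model or os.environ.get("FACTORY_MODEL")
    if chosen:
        argv += ["--model", chosen]
    return argv


def run_headless(
    prompt: str, task: str, project_path: Path, model: Optional[str] = None
) -> subprocess.CompletedProcess:
    # cwd=project_path pins the agent to the target project directory.
    return subprocess.run(
        build_headless_argv(prompt, task, model),
        cwd=project_path,
        capture_output=True,
        text=True,
    )
```

Building the argv separately from running it keeps the prompt-delivery and model-selection logic testable without spawning a real claude process.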

Comparison Table

| Dimension | CLI (factory agent <role>) | Plugin (factory:<role> subagent) | Gap | Plan |
|---|---|---|---|---|
| Prompt delivery | claude -p "<prompt+task>": pipe mode, treated as a user message. CEO interactive mode uses --append-system-prompt for stronger enforcement. | Frontmatter (name/description/model/tools) + prompt body, loaded as an agent definition by Claude Code. | CLI headless uses -p (user message), not a system prompt; only CEO interactive gets --append-system-prompt. Plugin prompt enforcement depends on Claude Code's agent loading. | Accept: -p is how all headless agents work. The CEO interactive path correctly uses --append-system-prompt. Plugin enforcement is Claude Code's responsibility. |
| Prompt resolution | Two-tier: project override (.factory/agents/<role>.md) → factory default (factory/agents/prompts/<role>.md). | One-tier: factory default only. Static file baked at sync_agents.py time. | Plugin agents don't pick up project-specific prompt overrides. | Accept for now: plugin agents are meant to be general-purpose. Project-specific overrides are a CEO/CLI concern. |
| Playbook injection | Automatic: factory-default playbooks from factory/agents/playbooks/<role>.md (and user-evolved ones from ~/.factory/playbooks/<role>.md) are injected by inject_playbook() at invocation time. | None: the prompt is frozen at generation time. | Plugin agents miss the behavioral playbook rules that CLI agents get. | Fix: generate_agent_content() should call load_playbook(role) and inject_playbook() when generating plugin agent files, so sync_agents.py bakes in the current playbooks. Regeneration picks up evolved playbooks. |
| Tool access | All tools available (unrestricted claude -p session). | Role-specific whitelist in frontmatter (e.g., strategist gets [Bash, Read, Grep, Glob]). | CLI agents have broader tool access than necessary; plugin is more principled (least privilege). | Accept: plugin's tool restriction is actually better. CLI could adopt tool restrictions via --allowedTools if Claude Code supports it, but this is not urgent. |
| Permissions | --dangerously-skip-permissions: full bypass, no user prompts. | Claude Code's normal permission system: the user may be prompted. | Different trust models. | Accept: CLI runs in an automated/headless context where permission prompts would block. Plugin runs in an interactive context where user oversight is appropriate. |
| Model selection | Runtime override via the --model flag or the FACTORY_MODEL env var; flexible per invocation. | Hardcoded in frontmatter per role (sonnet/opus); no runtime override. | Plugin agents can't adapt the model at runtime. | Fix: omit the model field from generated frontmatter so subagents inherit the parent session's model. Keep the agents.yml model as a default/reference, but don't emit it in generate_agent_content(). |
| Working directory | Explicit cwd=project_path passed to the subprocess; isolated. | Inherits Claude Code's CWD; must be told the project path via the prompt. | Plugin agents may operate in the wrong directory if the CWD differs from the target project. | Accept: this is inherent to Claude Code's subagent model. The pipeline skill already passes project context via the prompt. |
| Event capture | agent.started/completed/failed → .factory/events.jsonl; full observability. | None: no event emission. | The plugin path is invisible to factory telemetry; the dashboard, ACE, and insights can't see plugin agent activity. | Fix: add a factory emit CLI subcommand (a thin wrapper around emit_event()) that pipeline skills can call before/after each agent step, e.g. factory emit agent.started --agent researcher --project /path. |
| Review persistence | Output saved to .factory/reviews/<role>-latest.md with a timestamp + exit code header. | Output returned to the caller via the Agent tool result; not persisted to disk. | CEO gate decisions rely on .factory/reviews/ files, so the plugin pipeline can't use the same review pattern. | Fix: pipeline skills should write agent output to .factory/reviews/<role>-latest.md after each step (via the Write tool or tee), so both paths produce the same artifacts. |
| Orchestration flexibility | Sequential only (factory agent is blocking); parallel runs require multiple Bash tool calls from the caller. | Native parallel (multiple Agent calls in one message) + run_in_background: true. | CLI can't natively run agents in parallel. | Accept: this is a plugin advantage. CLI parallelism is possible via invoke_agents_parallel() in Python, but not from the CEO prompt. |
| Pipeline composition | Hardcoded in the CEO prompt (fixed modes: Improve, Build, Discover...). | /factory:pipeline-subagents designs pipelines on the fly from any goal. | The CLI path requires the CEO's fixed pipeline structure. | Accept: this is the raison d'être of the pipeline skills. Different tools for different needs. |
| Crash recovery | Checkpoint file (.factory/checkpoint.json) + resume context; the CEO can resume after crashes. | None: if the parent session dies, pipeline state is lost. | Plugin pipelines can't resume. | Defer: checkpoint support could be added to pipeline skills later, but it is low priority since plugin pipelines are typically shorter than full CEO cycles. |
| Cost control | Bob Shell ceiling enforcement; invocation logging to bob_usage.jsonl. | No cost tracking or ceiling enforcement. | The plugin path has no spending guardrails. | Defer: plugin agents use the parent session's token budget, and Bob Shell isn't used in plugin mode. |
| Consecutive failure handling | ConsecutiveAgentFailureError after 2 failures; aborts the cycle automatically. | No built-in failure tracking across steps. | Plugin pipelines don't auto-abort on repeated failures. | Accept: the pipeline skill prompt includes "2 consecutive failures: ABORT pipeline" as a soft instruction. Hard enforcement would require Python changes. |
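The two-tier prompt resolution in the CLI column reduces to a simple lookup order; the plugin path effectively uses only the second branch. A minimal sketch, assuming factory_root is the repository root that contains the factory/ package; resolve_prompt_path() is a hypothetical name, not the factory's actual function.

```python
from pathlib import Path


def resolve_prompt_path(role: str, project_path: Path, factory_root: Path) -> Path:
    """Hypothetical helper: project override first, then the factory default."""
    # Tier 1: per-project override checked into the target project.
    override = project_path / ".factory" / "agents" / f"{role}.md"
    if override.is_file():
        return override
    # Tier 2: the factory's bundled default prompt for this role.
    default = factory_root / "factory" / "agents" / "prompts" / f"{role}.md"
    if default.is_file():
        return default
    raise FileNotFoundError(f"no prompt found for role {role!r}")
```

Because sync_agents.py has no project_path at generation time, a plugin agent file can only ever bake in the tier-2 result, which is the gap recorded in the table.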

Fixes to implement (priority order)

1. Event capture — add factory emit CLI subcommand

Add a thin CLI command that pipeline skills can call to emit events:

```shell
factory emit agent.started --agent researcher --project "$(pwd)"
factory emit agent.completed --agent researcher --project "$(pwd)"
```

This wraps emit_event() from factory/events.py. Pipeline skills would call this before/after each agent step.
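A minimal sketch of such a subcommand using argparse. The real emit_event() signature and event schema in factory/events.py are not given here, so the emit_event() below is a stand-in that appends one JSON object per line to .factory/events.jsonl; the "type" and "ts" field names are assumptions.

```python
import argparse
import json
import time
from pathlib import Path
from typing import Optional, Sequence


def emit_event(project: Path, event_type: str, **fields: str) -> dict:
    """Stand-in for factory.events.emit_event(); the real signature may differ."""
    event = {"type": event_type, "ts": time.time(), **fields}
    log = project / ".factory" / "events.jsonl"
    log.parent.mkdir(parents=True, exist_ok=True)
    # Append-only JSONL, so concurrent readers (dashboard, ACE) see whole lines.
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point for a hypothetical `factory emit <event_type> --agent ... --project ...`."""
    parser = argparse.ArgumentParser(prog="factory emit")
    parser.add_argument("event_type", help="e.g. agent.started, agent.completed, agent.failed")
    parser.add_argument("--agent", required=True)
    parser.add_argument("--project", type=Path, default=Path.cwd())
    args = parser.parse_args(argv)
    emit_event(args.project, args.event_type, agent=args.agent)
    return 0
```

Keeping the command this thin means the pipeline skill only needs a Bash call per step, and the events land in the same file the CLI path already writes.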

2. Playbook injection in plugin generation

Update generate_agent_content() in factory/agents/plugin.py to call load_playbook(role) and inject_playbook() when generating files. This bakes current playbooks into the generated agent files. Running sync_agents.py after ACE evolution picks up the latest playbooks.
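A sketch of the generation-time injection, under stated assumptions: load_playbook() and inject_playbook() below are stand-ins whose real signatures are not shown in this issue, the playbook directories are passed in explicitly for testability, concatenating the factory default before the user-evolved playbook is an assumption, and the "## Playbook" heading is illustrative.

```python
from pathlib import Path
from typing import Optional


def load_playbook(role: str, dirs: list[Path]) -> Optional[str]:
    """Stand-in for load_playbook(role): join whichever playbook files exist.

    Assumed order: factory default first, then the user-evolved playbook.
    """
    parts = [
        p.read_text(encoding="utf-8").strip()
        for p in (d / f"{role}.md" for d in dirs)
        if p.is_file()
    ]
    return "\n\n".join(parts) or None


def inject_playbook(prompt: str, playbook: Optional[str]) -> str:
    """Stand-in for inject_playbook(): append the playbook under a heading."""
    if not playbook:
        return prompt
    return f"{prompt.rstrip()}\n\n## Playbook\n\n{playbook}\n"


def generate_agent_content(role: str, frontmatter: str, prompt: str, playbook_dirs: list[Path]) -> str:
    """Hypothetical signature; the real generate_agent_content() may differ."""
    # Baking the playbook in here means re-running sync_agents.py after ACE
    # evolves a playbook refreshes the generated plugin agent file.
    body = inject_playbook(prompt, load_playbook(role, playbook_dirs))
    return f"---\n{frontmatter.strip()}\n---\n\n{body}"
```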

3. Model flexibility in plugin agents

Omit the model field from generated frontmatter so subagents inherit the parent session's model. The agents.yml model field remains as documentation/reference but isn't emitted in the generated file.
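One way to express this is a fixed allowlist of keys that are copied from an agents.yml entry into the frontmatter, with model deliberately left out. A minimal sketch; build_frontmatter() and EMITTED_KEYS are hypothetical names, and it renders tools as a comma-separated list. Claude Code's subagent documentation also describes an explicit model: inherit value, so it is worth confirming for the installed version whether omitting the field actually yields inheritance.

```python
from typing import Any

# Keys copied from an agents.yml entry into plugin frontmatter.
# "model" is intentionally absent so the subagent can inherit the parent session's model.
EMITTED_KEYS = ("name", "description", "tools")


def build_frontmatter(spec: dict[str, Any]) -> str:
    """Render the lines that go between the --- delimiters."""
    lines = []
    for key in EMITTED_KEYS:
        if key in spec:
            value = spec[key]
            if isinstance(value, (list, tuple)):
                value = ", ".join(value)
            lines.append(f"{key}: {value}")
    return "\n".join(lines)
```

An allowlist rather than a "delete model" step keeps any future agents.yml keys from leaking into generated files by accident.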

4. Review persistence in pipeline skills

Update pipeline skill prompts to write agent output to .factory/reviews/<role>-latest.md after each step, matching the CLI path's artifact structure.
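Whether the skill uses the Write tool or tee, the goal is the same file layout the CLI path produces. The sketch below shows that layout in Python; write_review() is a hypothetical name, and since the exact header format the CLI writes isn't specified in this issue, the HTML-comment header carrying the timestamp and exit code is an assumption.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_review(project: Path, role: str, output: str, exit_code: int = 0) -> Path:
    """Persist agent output where CEO gate decisions look for it."""
    reviews = project / ".factory" / "reviews"
    reviews.mkdir(parents=True, exist_ok=True)
    path = reviews / f"{role}-latest.md"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Assumed header format: timestamp + exit code, mirroring the CLI artifact.
    header = f"<!-- timestamp: {stamp} | exit_code: {exit_code} -->\n\n"
    # Overwrite on purpose: "-latest" holds only the most recent run for this role.
    path.write_text(header + output, encoding="utf-8")
    return path
```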

Gaps accepted (with rationale)

  • Prompt delivery: -p vs --append-system-prompt is inherent to headless vs interactive mode
  • Prompt resolution: project overrides are a CEO concern, not a plugin concern
  • Tool access: plugin's least-privilege model is actually better
  • Permissions: different trust models for different contexts
  • Working directory: inherent to Claude Code's subagent model
  • Orchestration flexibility: plugin's advantage by design
  • Pipeline composition: different tools for different needs
  • Crash recovery: deferred, low priority for shorter plugin pipelines
  • Cost control: plugin uses the parent session's budget
  • Failure handling: a soft instruction in the pipeline prompt is sufficient
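For contrast with the soft prompt instruction, the CLI's hard enforcement amounts to a counter that resets on any success and raises once two failures occur in a row. A minimal sketch: the exception reuses the ConsecutiveAgentFailureError name from this issue, but the class bodies and the FailureTracker helper are illustrative, not the factory's implementation.

```python
class ConsecutiveAgentFailureError(RuntimeError):
    """Raised when too many agent steps fail back to back."""


class FailureTracker:
    """Hypothetical helper that aborts a cycle after `limit` consecutive failures."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.consecutive = 0

    def record(self, agent: str, succeeded: bool) -> None:
        if succeeded:
            self.consecutive = 0  # any success breaks the failure streak
            return
        self.consecutive += 1
        if self.consecutive >= self.limit:
            raise ConsecutiveAgentFailureError(
                f"{self.consecutive} consecutive failures (last: {agent}); aborting cycle"
            )
```

Counting only consecutive failures, rather than total failures, lets a long pipeline tolerate isolated flaky steps while still stopping a run that has clearly gone off the rails.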
