[Guide] Haiku swarm codebase audit — process for surfacing improvement opportunities #60

@stackbilt-admin

What this is

This is a process guide, not a feature request. It documents the audit methodology we ran on the img-forge repo using parallel Claude Haiku subagents, and proposes running the same process on llm-providers within its own domain.

The img-forge run produced 15 organized issues covering provider expansion, image editing, video generation, and infrastructure improvements. It also surfaced a cross-repo boundary issue (the ImageProvider overlap with img-forge, now tracked as #59). That cross-repo finding only emerged at the synthesis step, after agents had independently read both codebases, and it is the kind of finding that is hard to get from a single-developer review pass.


The process

Step 1 — Scope your investigation angles

Don't use one big agent with a long prompt. Spawn 4-6 parallel agents, each scoped to a distinct, bounded slice of the codebase. They should not overlap. Each agent reads its slice and reports structured findings.

For img-forge, the slices were:

  • Gateway API structure and endpoints
  • Orchestrator / Durable Object state machine
  • MCP tools and model provider integrations
  • Image editing and video generation gaps
  • Repo metadata (existing issues, deps, wrangler configs, API keys)

The metadata agent is critical and easy to forget — it prevents duplicate issues and tells you what's already wired up.
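
A minimal sketch of how the "slices should not overlap" rule could be checked before spawning anything. The AuditSlice shape and the example paths are assumptions for illustration; they are not taken from img-forge or llm-providers.

```typescript
// Hypothetical shape for one audit slice; field names are illustrative only.
interface AuditSlice {
  name: string;
  paths: string[]; // directories or files this agent is allowed to read
}

// Returns [sliceA, sliceB, sharedPath] for every pair of slices whose paths
// overlap, where "overlap" means one path equals or contains the other.
function findOverlaps(slices: AuditSlice[]): Array<[string, string, string]> {
  // Normalize to a trailing slash so "src/api" does not match "src/api-v2".
  const asDir = (p: string): string => p.replace(/\/+$/, "") + "/";
  const overlaps: Array<[string, string, string]> = [];
  slices.forEach((first, i) => {
    slices.slice(i + 1).forEach((second) => {
      for (const a of first.paths) {
        for (const b of second.paths) {
          const da = asDir(a);
          const db = asDir(b);
          if (da.startsWith(db) || db.startsWith(da)) {
            // Report the broader (shorter) of the two paths.
            overlaps.push([first.name, second.name, da.length <= db.length ? a : b]);
          }
        }
      }
    });
  });
  return overlaps;
}

// Example plan with made-up paths; an empty result means the slices are disjoint.
const examplePlan: AuditSlice[] = [
  { name: "gateway", paths: ["src/gateway"] },
  { name: "orchestrator", paths: ["src/orchestrator"] },
  { name: "metadata", paths: ["package.json", "wrangler.toml"] },
];
const exampleOverlaps = findOverlaps(examplePlan);
```

Running the check on the slice plan before writing prompts is cheaper than discovering after the fact that two agents reported on the same files.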

Step 2 — Write agent prompts that produce structured output

Each prompt should:

  • State exactly what to read (specific directories or files, not "look around")
  • Ask for findings in a consistent format (file path + line number for every claim)
  • End with a clear deliverable: "This audit will become GitHub issues — be specific"

Vague prompts produce vague findings. Prescribe the output shape.
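
One way to keep prompts consistent is to generate them from a small template instead of writing each by hand. The sketch below is illustrative: PromptSpec, buildAuditPrompt, and the exact finding format are assumptions, not the prompts used in the img-forge run.

```typescript
// Hypothetical input for one agent prompt.
interface PromptSpec {
  sliceName: string;
  readTargets: string[]; // specific directories or files, never "look around"
}

// Builds a prompt that names what to read, prescribes the output shape,
// and ends with the deliverable.
function buildAuditPrompt(spec: PromptSpec): string {
  if (spec.readTargets.length === 0) {
    throw new Error(`Slice "${spec.sliceName}" has no read targets; name specific paths.`);
  }
  return [
    `You are auditing the "${spec.sliceName}" slice of this repository.`,
    "",
    "Read exactly these paths and nothing else:",
    ...spec.readTargets.map((target) => `- ${target}`),
    "",
    "Report every finding on its own line in this format:",
    "<file path>:<line number> | <gap or opportunity> | <one-sentence claim>",
    "",
    "This audit will become GitHub issues, so be specific.",
  ].join("\n");
}
```

Refusing to build a prompt with no read targets enforces the first bullet mechanically: an agent never gets an open-ended "look around" instruction.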

Step 3 — Synthesize before filing anything

Don't file issues as each agent returns. Wait for all agents to complete, then read across all reports looking for:

  • Cross-cutting findings that no single agent saw (e.g., two agents both touching the same abstraction from different sides)
  • Dependency order — which issues are prerequisites for which
  • Boundary questions — anything that implicates another repo in the org

The img-forge synthesis step revealed that ImageProvider in this repo was a frozen copy of img-forge internals — a finding that required reading both codebases, and produced issue #59 here and issue #64 on img-forge.
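
Part of the synthesis pass can be mechanized. As a rough sketch, assuming each agent's findings have been parsed into a hypothetical Finding record, the function below keeps only the files that two or more different agents reported on. That is one cheap signal for a cross-cutting finding, not a replacement for reading the reports together.

```typescript
// Hypothetical parsed finding; the field names are assumptions.
interface Finding {
  agent: string;
  file: string;
  line: number;
  claim: string;
}

// Groups findings by file and keeps only files touched by at least two
// distinct agents.
function crossCutting(findings: Finding[]): Map<string, Finding[]> {
  const byFile = new Map<string, Finding[]>();
  for (const finding of findings) {
    const list = byFile.get(finding.file) ?? [];
    list.push(finding);
    byFile.set(finding.file, list);
  }
  const shared = new Map<string, Finding[]>();
  byFile.forEach((list, file) => {
    const agents = new Set(list.map((f) => f.agent));
    if (agents.size >= 2) shared.set(file, list);
  });
  return shared;
}
```

Matching on file path is deliberately crude. Two agents can touch the same abstraction through different files, so a real pass would likely also match on symbol names such as ImageProvider.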

Step 4 — File issues grouped by category with explicit dependency callouts

Group issues by theme (provider coverage, infrastructure, DX, etc.). In each issue body:

  • Reference specific file paths and line numbers from agent findings
  • Call out prerequisite issues explicitly ("Depends on: #X")
  • Write acceptance criteria as checkboxes, not prose

Don't write issues that say "we should improve X." Write issues that say "change file.ts:line from A to B because C."
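
Because "Depends on: #X" needs a real issue number, prerequisites have to be filed first. The sketch below orders drafts so that each one comes after its prerequisites, then renders a body in the shape described above. IssueDraft and both function names are hypothetical, and nothing here calls the GitHub API.

```typescript
// Hypothetical draft of an issue before GitHub has assigned it a number.
interface IssueDraft {
  key: string;          // local id used to express dependencies between drafts
  location: string;     // "path/to/file.ts:line" taken from an agent finding
  change: string;       // "change A to B because C"
  dependsOn: string[];  // keys of prerequisite drafts
  acceptance: string[]; // one entry per checkbox
}

// Depth-first topological sort: every draft appears after its prerequisites.
// Throws on unknown keys and on dependency cycles.
function filingOrder(drafts: IssueDraft[]): IssueDraft[] {
  const byKey = new Map<string, IssueDraft>();
  drafts.forEach((d) => byKey.set(d.key, d));
  const state = new Map<string, "visiting" | "done">();
  const ordered: IssueDraft[] = [];

  const visit = (key: string): void => {
    const draft = byKey.get(key);
    if (!draft) throw new Error(`Unknown dependency: ${key}`);
    if (state.get(key) === "done") return;
    if (state.get(key) === "visiting") throw new Error(`Dependency cycle at: ${key}`);
    state.set(key, "visiting");
    draft.dependsOn.forEach((dep) => visit(dep));
    state.set(key, "done");
    ordered.push(draft);
  };

  drafts.forEach((d) => visit(d.key));
  return ordered;
}

// Renders an issue body once every prerequisite has a real issue number.
function renderBody(draft: IssueDraft, numbers: Map<string, number>): string {
  const deps = draft.dependsOn.map((key) => {
    const n = numbers.get(key);
    if (n === undefined) throw new Error(`No issue number yet for: ${key}`);
    return `Depends on: #${n}`;
  });
  return [
    `${draft.location}: ${draft.change}`,
    ...deps,
    "",
    "Acceptance criteria:",
    ...draft.acceptance.map((item) => `- [ ] ${item}`),
  ].join("\n");
}
```

File the drafts in the order filingOrder returns, recording each assigned number in the map before rendering the next body.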


Suggested investigation angles for llm-providers

These are the natural slices given this repo's domain:

1. Provider completeness and parity audit

  • Which providers are implemented? Which are stubs or partial?
  • For each provider: what features are missing (streaming, tool use, JSON mode, vision, caching, seed)?
  • Which models in the catalog are outdated or deprecated by the provider?
  • What new providers are missing entirely (Mistral, Cohere, DeepSeek, xAI Grok, Amazon Bedrock)?

2. Circuit breaker and reliability behavior

  • How does the circuit breaker open/close? What are the thresholds?
  • What happens when all providers in the fallback chain are open simultaneously?
  • Are there edge cases where a failed request is retried against the same provider?
  • Does the circuit breaker state persist across Worker invocations (important for CF Workers)?

3. Cost tracking and CreditLedger accuracy

  • How is token usage counted? Is it pre-call estimation or post-call from response headers?
  • Which providers return accurate token counts vs. estimates?
  • Does streaming mode correctly accumulate token counts?
  • Are there requests that bypass cost tracking (tool call results, cached responses)?

4. TypeScript API surface and ergonomics

  • Is the public API surface minimal and well-typed, or does it leak internal types?
  • Are discriminated union types used consistently for provider-specific params?
  • Is LLMProviders.fromEnv() robust across CF Workers, Node.js, and Deno?
  • Are there escape hatches typed as unknown or any that could be tightened?

5. Vision / multimodal input handling

  • Which providers support LLMImageInput in messages?
  • What image formats and sizes are accepted per provider?
  • Is there input validation / size limiting before sending to providers?
  • How does vision fall back if the selected provider doesn't support it?

6. Repo health and packaging

  • Are there test gaps (untested providers, untested circuit breaker states)?
  • Is the published dist/ clean (no accidental test fixtures or type leaks)?
  • Are peer dependencies and engine constraints documented?
  • Does the changelog accurately reflect breaking changes?

Running the audit

Spawn all 6 agents in a single message so they run in parallel. Each agent should report:

  • Findings list with file paths and line numbers
  • A "gaps" section: what's missing or broken
  • An "opportunities" section: what could be improved without being broken

Collect all 6 reports, read them together, then file issues. Budget 30-60 minutes total including issue writing.
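
If the audit is driven from a script rather than from a single chat message, the same "launch everything, then wait for everything" shape might look like the sketch below. RunAgent is a placeholder for whatever actually spawns a subagent, and the AgentReport fields simply mirror the three report sections listed above. Neither is a real API.

```typescript
interface AgentReport {
  slice: string;
  findings: string[];      // each entry should carry "path:line"
  gaps: string[];          // what's missing or broken
  opportunities: string[]; // what could be improved without being broken
}

// Hypothetical runner signature; wire this to your own agent launcher.
type RunAgent = (slice: string, prompt: string) => Promise<AgentReport>;

// Returns the findings that lack a "path:line" reference.
function unlocatedFindings(report: AgentReport): string[] {
  return report.findings.filter((f) => !/\S+:\d+/.test(f));
}

// Starts every agent at once and waits for all of them. A failed agent is
// recorded by slice name instead of aborting the whole run.
async function runAudit(
  prompts: Record<string, string>,
  runAgent: RunAgent,
): Promise<{ reports: AgentReport[]; failed: string[] }> {
  const outcomes = await Promise.all(
    Object.keys(prompts).map(async (slice) => {
      try {
        return { slice, report: await runAgent(slice, prompts[slice] ?? "") };
      } catch {
        return { slice, report: null };
      }
    }),
  );
  const reports: AgentReport[] = [];
  const failed: string[] = [];
  outcomes.forEach((o) => {
    if (o.report) reports.push(o.report);
    else failed.push(o.slice);
  });
  return { reports, failed };
}
```

Catching errors per agent matters because a bare Promise.all rejects as soon as one promise rejects, which would discard the reports that did come back. Any slice listed in failed can be rerun on its own before synthesis.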


Cross-repo concerns to watch for

Based on the img-forge ↔ llm-providers relationship, the following cross-repo concerns are worth checking during synthesis:

  • Any code in llm-providers that duplicates img-forge logic: ImageProvider was one; audit for others
  • Env/config assumptions — does LLMProviders.fromEnv() assume env var names that conflict with img-forge's wrangler bindings?
  • Versioning coupling — img-forge will import @stackbilt/llm-providers at a pinned version; any breaking changes in llm-providers now have a downstream impact on img-forge

Output from the img-forge run (for reference)

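The img-forge audit produced 15 organized issues covering provider expansion, image editing, video generation, and infrastructure improvements.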

The full issue set is at: https://github.com/Stackbilt-dev/img-forge/issues
