
Investigate: does forced tool-choice (required) inflate coder loop iterations/token usage? #222

@dratner


Type: investigation / hypothesis (not a confirmed bug; please don't treat it as a regression yet)

Context

The maestro-llms migration (PR #220, phase 4) makes the tool loop send an explicit ToolChoice=required whenever it offers tools. For OpenAI and Gemini this changes nothing, because the legacy path already forced tool use internally. Anthropic, however, defaulted to auto on the legacy path. So on the new path, the Anthropic coder/PM must now emit a tool call on every tool-loop iteration, where previously the model could return a plain-text / "think out loud" turn.
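For reference, here is a minimal sketch of how a provider-neutral "required" setting maps onto each vendor's request field. The ToolChoice enum and to_wire helper are hypothetical illustrations, not maestro-llms code. The per-vendor values follow the public APIs: Anthropic spells "must call some tool" as "any", and Gemini uses mode ANY.

```python
from enum import Enum


class ToolChoice(Enum):
    """Hypothetical provider-neutral tool-choice setting (not maestro's real type)."""
    AUTO = "auto"          # model may answer in text or call a tool
    REQUIRED = "required"  # model must call some tool


def to_wire(provider: str, choice: ToolChoice) -> dict:
    """Translate a neutral ToolChoice into the request field each vendor API expects."""
    if provider == "anthropic":
        # Anthropic Messages API: tool_choice.type "any" means "must call some tool".
        kind = {"auto": "auto", "required": "any"}[choice.value]
        return {"tool_choice": {"type": kind}}
    if provider == "openai":
        # OpenAI Chat Completions: tool_choice is the string "auto" or "required".
        return {"tool_choice": choice.value}
    if provider == "gemini":
        # Gemini REST: toolConfig.functionCallingConfig.mode is AUTO or ANY.
        mode = {"auto": "AUTO", "required": "ANY"}[choice.value]
        return {"toolConfig": {"functionCallingConfig": {"mode": mode}}}
    raise ValueError(f"unknown provider: {provider}")


print(to_wire("anthropic", ToolChoice.REQUIRED))
```

On Anthropic, omitting tool_choice while supplying tools behaves like "auto", which is the legacy behavior this issue is comparing against.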

Hypothesis

Forcing a tool call on every iteration on Anthropic may prolong tool loops and inflate both iteration count and cumulative token spend, compared with the old auto behavior, where a terminating or reasoning text turn could let the loop converge sooner (or let a state handler advance).
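To make the mechanism concrete, here is a minimal sketch of the loop's exit condition. It assumes, as is common for tool loops but not confirmed for maestro, that a text-only reply ends the loop and that under required only a designated terminal tool does. Turn, loop_should_stop, iterations_until_stop and all tool names are hypothetical, and the scripted turns illustrate the mechanism rather than reproduce data from any run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One model reply in the tool loop (hypothetical shape)."""
    text: str = ""
    tool_name: Optional[str] = None  # None means a plain-text turn


def loop_should_stop(turn: Turn, terminal_tools: set) -> bool:
    """Return True when this reply lets the tool loop exit."""
    if turn.tool_name is None:
        # Only possible under "auto": a text-only reply ends the loop.
        return True
    # Under "required" every reply is a tool call, so the loop exits only
    # when the model picks a tool the caller treats as terminal.
    return turn.tool_name in terminal_tools


def iterations_until_stop(turns: list, terminal_tools: set) -> Optional[int]:
    """1-based iteration at which the loop stops, or None if it never does."""
    for i, turn in enumerate(turns, start=1):
        if loop_should_stop(turn, terminal_tools):
            return i
    return None


# Scripted illustration: under "auto" the model can wrap up in prose; under
# "required" the wrap-up must be a tool call, and if the model reaches for
# another tool first (here a re-read), the loop keeps going.
auto_turns = [Turn(tool_name="edit_file"), Turn(tool_name="run_tests"),
              Turn(text="Tests pass; the handler is implemented.")]
required_turns = [Turn(tool_name="edit_file"), Turn(tool_name="run_tests"),
                  Turn(tool_name="read_file"), Turn(tool_name="done")]

print(iterations_until_stop(auto_turns, {"done"}))      # 3
print(iterations_until_stop(required_turns, {"done"}))  # 4
```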

Observation that prompted this

Live phase-5 run, story 3984527c ("Implement clear-chat backend handler", coder-001, Anthropic claude-opus-4-6):

  • 90 LLM calls for the single story, ~24,210 avg total tokens/call → ~2.18M cumulative tokens.
  • No context/compaction/truncation involved: the largest single request in the whole run was ~50K tokens, so the ~2.18M is purely cumulative across many iterations.
  • The coding loop hit its soft (10) and hard (12) iteration limits and re-escalated (coding_df75d7d1).
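The headline figure is plain multiplication: 90 calls × ~24,210 tokens/call ≈ 2,178,900 ≈ 2.18M. A rough model helps show why cumulative spend can balloon without any single large request. It assumes, without confirmation from the logs, that every call resends the full, growing conversation: call i carries a fixed base prompt plus i increments of history, so total tokens grow quadratically with iteration count. The function names and the base/growth numbers are illustrative only. The 90 calls also span more than one loop, so this shows the shape of the growth, not a fit to the run.

```python
def total_tokens(calls: int, avg_per_call: int) -> int:
    """Cumulative tokens across all calls, from a per-call average."""
    return calls * avg_per_call


def resend_total(iterations: int, base_prompt: int, growth_per_iter: int) -> int:
    """Cumulative tokens if call i (0-based) sends base_prompt + i * growth_per_iter.

    Summing over n calls gives n*base + growth * n*(n-1)/2, i.e. quadratic in n.
    """
    n = iterations
    return n * base_prompt + growth_per_iter * n * (n - 1) // 2


# The observed story: 90 calls at ~24,210 tokens each.
print(total_tokens(90, 24_210))  # 2178900

# Illustrative only: what the last two iterations (soft limit 10 -> hard limit 12)
# add under the resend model, with a 5K-token base and 2K tokens of growth per call.
at_soft = resend_total(10, 5_000, 2_000)  # 140000
at_hard = resend_total(12, 5_000, 2_000)  # 192000
print(f"{at_hard / at_soft - 1:.0%} more tokens for 20% more iterations")
```

Under this model the final iterations before a hard limit are the most expensive ones, so each extra forced iteration costs more than the average call.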

Caveats (why this is only a hypothesis)

  • Not provable from one log: the coding loop also hit iteration limits on a legacy-path session earlier in testing, so high iteration count is not unique to the new path.
  • No apples-to-apples baseline: the prior legacy run was a different interview/story set and its log was overwritten.
  • Story complexity, model behavior, and tool design all independently drive iteration count.

Why we're filing this rather than just relaxing it

Relaxing forced tool-choice (e.g. reverting the toolloop to auto, or making it adaptive) would be a significant behavioral change for maestro rather than a quick toggle, because many state machines assume the loop produces tool calls. So this needs deliberate investigation, not a reflexive revert.

Suggested investigation

Same-story A/B: run an identical story once with MAESTRO_USE_LLMS=1 (forced required) and once with the toolloop sending auto on Anthropic, and compare iterations-to-converge and cumulative tokens. If required measurably inflates loops on Anthropic, options include: per-state tool-choice policy (require during CODING, allow auto on finalizing/decision turns), or a "text-or-tool" terminating affordance.
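Here is a minimal sketch of two pieces such an A/B could use: a per-state tool-choice policy for the candidate arm, and the relative-change arithmetic for comparing arms. The state names, RunStats, tool_choice_for and relative_change are all hypothetical, not existing maestro code, and the run numbers are placeholders rather than measurements.

```python
from dataclasses import dataclass

# Hypothetical state names; maestro's real coder/PM states may differ.
REQUIRE_TOOLS_IN = {"CODING"}


def tool_choice_for(state: str) -> str:
    """Per-state policy: force tool calls while coding, allow text on other turns."""
    return "required" if state in REQUIRE_TOOLS_IN else "auto"


@dataclass
class RunStats:
    """Metrics for one run of the story, read off that run's logs."""
    iterations: int
    cumulative_tokens: int


def relative_change(baseline: RunStats, candidate: RunStats) -> dict:
    """Fractional change of candidate vs baseline; positive means the candidate used more."""
    return {
        "iterations": candidate.iterations / baseline.iterations - 1,
        "cumulative_tokens": candidate.cumulative_tokens / baseline.cumulative_tokens - 1,
    }


# Placeholder numbers, not measurements: auto arm as baseline, required arm as candidate.
auto_arm = RunStats(iterations=8, cumulative_tokens=900_000)
required_arm = RunStats(iterations=12, cumulative_tokens=1_350_000)
print(tool_choice_for("CODING"), tool_choice_for("DONE"))
print(relative_change(auto_arm, required_arm))
```

Because model sampling varies from run to run, a single pair of runs is weak evidence. Repeating each arm several times on the same story and comparing averages would make any measured inflation more trustworthy.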

Recorded so it isn't lost in PR review; surfaced during phase-5 live validation of the maestro-llms migration. Cross-ref: PR #220, docs/MAESTRO_LLMS_MIGRATION.md §5 OC2/G2.
