Type: investigation / hypothesis (not a confirmed bug — please don't treat as a regression yet)
Context
The maestro-llms migration (PR #220, phase 4) makes the tool loop send an explicit ToolChoice=required whenever it offers tools. For OpenAI/Gemini this is behaviorally a no-op, since the legacy path already forced tool use internally. Anthropic, however, defaulted to auto on the legacy path. So on the new path, the Anthropic coder/PM must now emit a tool call on every tool-loop iteration, where previously the model could return a plain-text / "think out loud" turn.
Hypothesis
Forcing a tool call every iteration on Anthropic may prolong tool loops and inflate iteration count + cumulative token spend, versus the old auto behavior where a terminating or reasoning text turn could let the loop converge sooner (or let a state handler advance).
Observation that prompted this
Live phase-5 run, story 3984527c ("Implement clear-chat backend handler", coder-001, Anthropic claude-opus-4-6):
- 90 LLM calls for the single story, ~24,210 avg total tokens/call → ~2.18M cumulative tokens.
- No context/compaction/truncation involved: the largest single request in the whole run was ~50K tokens. The ~2.18M total is purely cumulative across many iterations.
- The coding loop hit its soft (10) and hard (12) iteration limits and re-escalated (coding_df75d7d1).
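As a quick sanity check on the figures above, the cumulative total follows directly from the call count and the per-call average. This is a minimal sketch using only the numbers quoted in this issue; the variable names are illustrative and do not come from maestro.

```python
# Figures quoted in the observation above (story 3984527c).
llm_calls = 90
avg_total_tokens_per_call = 24_210
largest_single_request = 50_000  # approximate, as reported

# Cumulative spend is simply calls x average tokens per call.
cumulative_tokens = llm_calls * avg_total_tokens_per_call

# Share of the total contributed by the largest single request.
largest_share = largest_single_request / cumulative_tokens

print(f"{cumulative_tokens:,} tokens (~{cumulative_tokens / 1e6:.2f}M)")
print(f"largest request is ~{largest_share:.1%} of the total")
```

The largest request accounts for only a few percent of the total, which is consistent with the spend being driven by the number of iterations rather than by any single oversized context.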
Caveats (why this is only a hypothesis)
- Not provable from one log: the coding loop also hit iteration limits on a legacy-path session earlier in testing, so high iteration count is not unique to the new path.
- No apples-to-apples baseline: the prior legacy run was a different interview/story set and its log was overwritten.
- Story complexity, model behavior, and tool design all independently drive iteration count.
Why we're filing this rather than just relaxing it
Relaxing forced tool-choice (e.g. reverting the toolloop to auto, or making it adaptive) would be a significant behavioral change for maestro, not a quick toggle — many state machines assume the loop produces tool calls. So this needs deliberate investigation, not a reflexive revert.
Suggested investigation
Same-story A/B: run an identical story once with MAESTRO_USE_LLMS=1 (forced required) and once with the toolloop sending auto on Anthropic, and compare iterations-to-converge and cumulative tokens. If required measurably inflates loops on Anthropic, options include: per-state tool-choice policy (require during CODING, allow auto on finalizing/decision turns), or a "text-or-tool" terminating affordance.
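To make the "per-state tool-choice policy" option concrete, here is a hedged sketch of what such a policy function might look like. Everything in it is hypothetical: `tool_choice_for`, the `ToolChoice` enum, the provider string, and the state names other than CODING are illustrative assumptions, not maestro's actual types or API.

```python
from enum import Enum
from typing import Optional


class ToolChoice(str, Enum):
    """Hypothetical mirror of the tool-choice values discussed in this issue."""
    AUTO = "auto"
    REQUIRED = "required"


# Assumed state names: only CODING is named in this issue; the others are
# placeholders for "finalizing/decision turns".
AUTO_ALLOWED_STATES = {"FINALIZING", "DECISION"}


def tool_choice_for(provider: str, state: str, tools_offered: bool) -> Optional[ToolChoice]:
    """Pick a tool-choice value for one tool-loop iteration.

    Returns None when no tools are offered, AUTO for Anthropic on states
    where a plain-text turn is acceptable, and REQUIRED otherwise.
    """
    if not tools_offered:
        return None
    if provider == "anthropic" and state in AUTO_ALLOWED_STATES:
        return ToolChoice.AUTO
    return ToolChoice.REQUIRED


if __name__ == "__main__":
    for state in ("CODING", "FINALIZING"):
        print(state, tool_choice_for("anthropic", state, tools_offered=True))
```

Keeping the default at REQUIRED means state machines that assume a tool call on every iteration are unaffected unless a state is explicitly opted in to auto, which limits the blast radius of the experiment.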
Recorded so it isn't lost in PR review; surfaced during phase-5 live validation of the maestro-llms migration. Cross-ref: PR #220, docs/MAESTRO_LLMS_MIGRATION.md §5 OC2/G2.