Investigate: does forced tool-choice (required) inflate coder loop iterations/token usage? #222

**Type:** investigation / hypothesis (not a confirmed bug — please don't treat as a regression yet)

## Context

The maestro-llms migration (PR #220, phase 4) makes the tool loop send an explicit `ToolChoice=required` whenever it offers tools. For OpenAI and Gemini this is a no-op for behavior, because the **legacy** path already forced tool use internally. On the legacy path **Anthropic defaulted to `auto`**, so the new path is a behavior change there: the **Anthropic** coder/PM now *must* emit a tool call on every tool-loop iteration, where previously the model could return a plain-text / "think out loud" turn.

## Hypothesis

Forcing a tool call every iteration on Anthropic may **prolong tool loops** and inflate iteration count + cumulative token spend, versus the old `auto` behavior where a terminating or reasoning text turn could let the loop converge sooner (or let a state handler advance).

## Observation that prompted this

Live phase-5 run, story `3984527c` ("Implement clear-chat backend handler", coder-001, Anthropic `claude-opus-4-6`):

- **90 LLM calls** for the single story, ~24,210 avg total tokens/call → **~2.18M cumulative tokens**.
- No context/compaction/truncation involved: the largest single request in the whole run was ~50K tokens. The ~2.18M total is purely cumulative across many iterations.
- The coding loop hit its soft (10) and hard (12) iteration limits and re-escalated (`coding_df75d7d1`).
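The cumulative figure is just call count times average tokens per call. A quick check of that arithmetic, using only the two numbers reported above (variable names are illustrative):

```python
# Figures reported above for story 3984527c.
llm_calls = 90
avg_tokens_per_call = 24_210

# Cumulative spend is the sum over all calls, i.e. count * average.
cumulative_tokens = llm_calls * avg_tokens_per_call

print(f"{cumulative_tokens:,} tokens (~{cumulative_tokens / 1e6:.2f}M)")
# prints "2,178,900 tokens (~2.18M)"
```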

## Caveats (why this is only a hypothesis)

- Not provable from one log: the coding loop also hit iteration limits on a **legacy-path** session earlier in testing, so high iteration count is not unique to the new path.
- No apples-to-apples baseline: the prior legacy run was a different interview/story set and its log was overwritten.
- Story complexity, model behavior, and tool design all independently drive iteration count.

## Why we're filing this rather than just relaxing it

Relaxing forced tool-choice (e.g. reverting the toolloop to `auto`, or making it adaptive) would be a **significant behavioral change for maestro**, not a quick toggle — many state machines assume the loop produces tool calls. So this needs deliberate investigation, not a reflexive revert.
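To show where the cost actually sits, here is a minimal sketch of what an "adaptive" policy could look like. Everything in it is an assumption for illustration: `tool_choice_for`, the `ToolChoice` enum, and the state names other than CODING are hypothetical and are not maestro's API.

```python
from enum import Enum


class ToolChoice(str, Enum):
    AUTO = "auto"
    REQUIRED = "required"


# Hypothetical state names in which a plain-text turn would be tolerated.
TEXT_ALLOWED_STATES = {"FINALIZING", "DECISION"}


def tool_choice_for(state: str, provider: str) -> ToolChoice:
    """Pick the tool-choice to send for one tool-loop iteration."""
    # Only Anthropic is relaxed in this sketch; every other provider and
    # every other state keeps the forced behavior.
    if provider == "anthropic" and state in TEXT_ALLOWED_STATES:
        return ToolChoice.AUTO
    return ToolChoice.REQUIRED
```

The selector itself is small. The expensive part is on the consuming side: every state handler that receives an `auto` turn would need a defined path for a response with no tool call, which is the assumption the current state machines do not make.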

## Suggested investigation

Same-story A/B: run an identical story once with `MAESTRO_USE_LLMS=1` (forced `required`) and once with the toolloop sending `auto` on Anthropic, and compare **iterations-to-converge** and cumulative tokens. If `required` measurably inflates loops on Anthropic, options include: per-state tool-choice policy (require during CODING, allow auto on finalizing/decision turns), or a "text-or-tool" terminating affordance.
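A sketch of how the two A/B runs could be summarized once per-call token counts are extracted from each log. `RunStats` and `compare` are hypothetical helpers, not existing maestro code, and log parsing is left out because the log format is not described here.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Per-run figures for one story under one tool-choice setting."""
    label: str                  # e.g. "required" or "auto"
    iterations: int             # tool-loop iterations until the story converged
    tokens_per_call: list[int]  # total tokens for each LLM call in the run

    @property
    def cumulative_tokens(self) -> int:
        return sum(self.tokens_per_call)


def compare(required: RunStats, auto: RunStats) -> dict[str, float]:
    """Return required/auto ratios; values above 1.0 mean `required` cost more."""
    if auto.iterations == 0 or auto.cumulative_tokens == 0:
        raise ValueError("auto run has no iterations or tokens to compare against")
    return {
        "iteration_ratio": required.iterations / auto.iterations,
        "token_ratio": required.cumulative_tokens / auto.cumulative_tokens,
    }
```

A single pair of runs will be noisy, so repeating each arm a few times on the same story and comparing the spread of these ratios would give a more trustworthy signal than one ratio.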

Recorded so it isn't lost in PR review; surfaced during phase-5 live validation of the maestro-llms migration. Cross-ref: PR #220, `docs/MAESTRO_LLMS_MIGRATION.md` §5 OC2/G2.
