From dc42bf8a48ff2cce1df76e4ffbb7c0b8820cd3f0 Mon Sep 17 00:00:00 2001
From: TheFactoriousDROID
Date: Sun, 7 Jun 2026 01:59:45 +0000
Subject: [PATCH 1/3] docs: align reasoning-effort values with available models

Document the label-to-value mapping (e.g. Extra High -> xhigh, Max -> max)
as a single source of truth in settings.mdx, and point the CLI reference
and choosing-your-model pages at it plus /models instead of duplicating
per-model lists that drift. Also fix the stale 'Anthropic defaults to
off' claim, nest model/reasoningEffort under sessionDefaultSettings, and
refresh stale model names/examples.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
---
 docs/cli/configuration/settings.mdx          | 37 +++++----
 docs/cli/user-guides/choosing-your-model.mdx | 82 ++++++++------------
 docs/reference/cli-reference.mdx             | 12 +--
 3 files changed, 61 insertions(+), 70 deletions(-)

diff --git a/docs/cli/configuration/settings.mdx b/docs/cli/configuration/settings.mdx
index 40854f807..5f72d17e8 100644
--- a/docs/cli/configuration/settings.mdx
+++ b/docs/cli/configuration/settings.mdx
@@ -44,8 +44,8 @@ Local overrides merge on top of the corresponding `settings.json` at the same le
 | Setting | Options | Default | Description |
 | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------- | -------------------------------------------------------------------------- |
-| `model` | Any [available model ID](/models) | Product default | The default AI model used by droid |
-| `reasoningEffort` | `off`, `none`, `low`, `medium`, `high` (availability depends on the model) | Model-dependent default | Controls how much structured thinking the model performs.
| +| `sessionDefaultSettings.model` | Any [available model ID](/models) | Product default | The default AI model used by droid | +| `sessionDefaultSettings.reasoningEffort` | Reasoning value (model-dependent) — see [Reasoning effort](#reasoning-effort) | Model-dependent default | Controls how much structured thinking the model performs. | | `sessionDefaultSettings.interactionMode` | `auto`, `spec` | `auto` | Sets whether new sessions start in Auto or Spec Mode. | | `sessionDefaultSettings.autonomyLevel` | `off`, `low`, `medium`, `high` | `off` | Sets the default [Autonomy Level](/cli/user-guides/auto-run) for new sessions. | | `cloudSessionSync` | `true`, `false` | `true` | Mirror CLI sessions to Factory web. | @@ -65,16 +65,25 @@ Local overrides merge on top of the corresponding `settings.json` at the same le ### Model -Set `model` to a [model ID from Available Models](/models). For custom models, see [Bring Your Own Key (BYOK)](/cli/byok/overview). +Set `sessionDefaultSettings.model` to a [model ID from Available Models](/models). For custom models, see [Bring Your Own Key (BYOK)](/cli/byok/overview). + +For backward compatibility, `model`, `reasoningEffort`, `autonomyMode`, `specModeModel`, and `specModeReasoningEffort` are also accepted as top-level keys; droid migrates them into `sessionDefaultSettings` on load. New configurations should nest them under `sessionDefaultSettings`. ### Reasoning effort -`reasoningEffort` adjusts how much structured thinking the model performs before replying. Available values depend on the model, but typically include: +`sessionDefaultSettings.reasoningEffort` controls how much structured thinking the model performs before replying. The model picker and the [Available Models](/models) **Reasoning** column show these levels as display labels; use the matching value below in `settings.json`: -- **`off` / `none`** – disable structured reasoning (fastest). 
-- **`low`**, **`medium`**, **`high`** – progressively increase deliberation time for more complex reasoning. +| Label | `settings.json` value | +| ------------ | --------------------- | +| Off / None | `off` / `none` | +| Minimal | `minimal` | +| Low | `low` | +| Medium | `medium` | +| High | `high` | +| Extra High | `xhigh` | +| Max | `max` | -Anthropic models default to `off`, while GPT-5 starts on `medium`. +Higher effort increases latency and cost. Which levels are available — and which is the default — depend on the model, so check the **Reasoning** column in [Available Models](/models). (`dynamic` is an internal adaptive setting and is not selected manually.) ### Autonomy level @@ -162,7 +171,7 @@ Defaults applied when a new session starts. See also `sessionDefaultSettings.int | Setting | Type | Options | Default | Description | | ------------------------------------------------ | ------ | ---------------------------------------- | -------------- | ---------------------------------------------------------- | | `sessionDefaultSettings.specModeModel` | string | Any [available model ID](/models) | Inherits model | Override the model used when sessions start in Spec Mode. | -| `sessionDefaultSettings.specModeReasoningEffort` | string | `off`, `none`, `low`, `medium`, `high` | Model default | Reasoning effort applied to the spec model. | +| `sessionDefaultSettings.specModeReasoningEffort` | string | See [Reasoning effort](#reasoning-effort) | Model default | Reasoning effort applied to the spec model. | ## Display and UI @@ -192,13 +201,13 @@ Configure [Missions](/cli/features/missions) — multi-agent orchestration runs. 
| Setting | Type | Options | Default | Description | | ---------------------------------------------------- | ------- | ---------------------------------------- | ---------------- | ---------------------------------------------------------------------------------------- | | `missionModelSettings.workerModel` | string | Any [available model ID](/models) | Inherits | Default model used by mission worker subagents. | -| `missionModelSettings.workerReasoningEffort` | string | `off`, `none`, `low`, `medium`, `high` | Model default | Reasoning effort for mission workers. | +| `missionModelSettings.workerReasoningEffort` | string | See [Reasoning effort](#reasoning-effort) | Model default | Reasoning effort for mission workers. | | `missionModelSettings.validationWorkerModel` | string | Any [available model ID](/models) | Inherits | Model used by mission validators (scrutiny / user-testing workers). | -| `missionModelSettings.validationWorkerReasoningEffort` | string | `off`, `none`, `low`, `medium`, `high` | Model default | Reasoning effort for validation workers. | +| `missionModelSettings.validationWorkerReasoningEffort` | string | See [Reasoning effort](#reasoning-effort) | Model default | Reasoning effort for validation workers. | | `missionModelSettings.skipScrutiny` | boolean | `true`, `false` | `false` | Skip scrutiny validation milestones during missions. | | `missionModelSettings.skipUserTesting` | boolean | `true`, `false` | `false` | Skip user-testing validation milestones during missions. | | `missionOrchestratorModel` | string | Any [available model ID](/models) | Inherits | Model used by the mission orchestrator. | -| `missionOrchestratorReasoningEffort` | string | `off`, `none`, `low`, `medium`, `high` | Model default | Reasoning effort for the mission orchestrator. | +| `missionOrchestratorReasoningEffort` | string | See [Reasoning effort](#reasoning-effort) | Model default | Reasoning effort for the mission orchestrator. 
| | `keepSystemAwakeDuringMissions` | boolean | `true`, `false` | `true` | Prevent the OS from sleeping while a mission is running. | ## Context and compaction @@ -254,8 +263,10 @@ System-level settings for status line, worktrees, and request timeouts. ```json { - "model": "claude-opus-4-7", - "reasoningEffort": "low", + "sessionDefaultSettings": { + "model": "claude-opus-4-8", + "reasoningEffort": "high" + }, "diffMode": "github", "cloudSessionSync": true, "completionSound": "fx-ok01", diff --git a/docs/cli/user-guides/choosing-your-model.mdx b/docs/cli/user-guides/choosing-your-model.mdx index bb0d3bd1c..c47a2c4f5 100644 --- a/docs/cli/user-guides/choosing-your-model.mdx +++ b/docs/cli/user-guides/choosing-your-model.mdx @@ -22,32 +22,29 @@ Balance accuracy, speed, and cost by picking the right model and reasoning level Model quality evolves quickly, and we tune the CLI defaults as the ecosystem shifts. Use this guide as a snapshot of how the major options compare today, and expect to revisit it as we publish updates. -This guide was last updated on Wednesday, June 3rd 2026. +This guide was last updated in June 2026. --- -## 1 · Current stack rank (March 2026) - -| Rank | Model | Why we reach for it | -| ---- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | -| 1 | **Claude Opus 4.7** | Newest Anthropic flagship with **Max** reasoning; top pick for the hardest work. Promotional 1× multiplier through April 30 (2× afterward). | -| 2 | **Claude Opus 4.6** | Previous Anthropic flagship with **Max** reasoning; still excellent depth and safety for complex work. | -| 3 | **Claude Opus 4.6 Fast** | Opus 4.6 tuned for faster response times; 12× multiplier. | -| 4 | **Claude Opus 4.5** | Proven quality-and-safety balance; strong default for TUI and exec. 
| -| 5 | **Claude Sonnet 4.6** | **Max** reasoning at the Sonnet price point (1.2×); strong daily driver for planning and implementation. | -| 6 | **GPT-5.4** | Latest OpenAI model with 922K context, 128K output, verbosity support, and **Extra High** reasoning; excellent for large-context tasks. | -| 7 | **Claude Sonnet 4.5** | Strong daily driver with balanced cost/quality; great general-purpose choice when you don't need Opus-level depth. | -| 8 | **GPT-5.3-Codex** | Newest OpenAI coding model with **Extra High** reasoning and verbosity support; strong for implementation-heavy tasks. | -| 9 | **GPT-5.2-Codex** | Proven OpenAI coding model with **Extra High** reasoning; solid for implementation-heavy tasks. | -| 10 | **GPT-5.2** | OpenAI model with verbosity support and reasoning up to **Extra High**. | -| 11 | **Claude Haiku 4.5** | Fast, cost-efficient for routine tasks and high-volume automation. | -| 12 | **Gemini 3.1 Pro** | Newer Gemini Pro generation with strong structured outputs and mixed reasoning controls for research-heavy tasks. | -| 13 | **Gemini 3 Flash** | Fast, cheap (0.2× multiplier) with full reasoning support; great for high-volume tasks where speed matters. | -| 14 | **Droid Core (MiniMax M2.7)** | Open-source, 0.12× multiplier with reasoning support (Low/Medium/High) and image support; cheapest model available. | -| 15 | **Droid Core (GLM-5.1)** | Open-source, 0.55× multiplier, newer GLM option for bulk automation and air-gapped environments; no image support. | -| 16 | **Droid Core (GLM-5)** | Open-source, 0.4× multiplier, stable choice for bulk automation and air-gapped environments; no image support. | -| 17 | **Droid Core (Kimi K2.6)** | Open-source, 0.4× multiplier with image support and optional High reasoning; good for cost-sensitive work when you still want a thinking toggle. | -| 18 | **Droid Core (Kimi K2.5)** | Open-source, 0.25× multiplier with image support; older Kimi option for cost-sensitive work. 
| +## 1 · Current stack rank + +Use this as a qualitative starting point. For the full catalog — multipliers, reasoning levels, and defaults — see [Available Models](/models). + +| Rank | Model | Why we reach for it | +| ---- | ----- | ------------------- | +| 1 | **Claude Opus 4.8** | Newest Anthropic flagship with the deepest reasoning; top pick for the hardest architecture and refactor work. | +| 2 | **Claude Opus 4.8 Fast** | Opus 4.8 quality tuned for faster responses when you want depth with less wait. | +| 3 | **Claude Opus 4.7** | Previous Anthropic flagship; still excellent depth and safety for complex work. | +| 4 | **Claude Sonnet 4.6** | Top-tier reasoning at the Sonnet price point; strong daily driver for planning and implementation. | +| 5 | **GPT-5.5** | Latest OpenAI flagship with large context and high-effort reasoning; excellent for large-context tasks. | +| 6 | **GPT-5.5 Pro** | Highest-effort OpenAI option for the most demanding reasoning. | +| 7 | **GPT-5.4** | Strong large-context OpenAI model for big refactors and reviews. | +| 8 | **GPT-5.3-Codex** | OpenAI coding-tuned model with high-effort reasoning; strong for implementation-heavy tasks. | +| 9 | **Claude Sonnet 4.5** | Balanced cost/quality general-purpose choice when you don't need Opus-level depth. | +| 10 | **Gemini 3.1 Pro** | Strong structured outputs and mixed reasoning controls for research-heavy tasks. | +| 11 | **Claude Haiku 4.5** | Fast, cost-efficient for routine tasks and high-volume automation. | +| 12 | **Gemini 3.5 Flash / Gemini 3 Flash** | Fast and cheap with full reasoning support; great for high-volume tasks where speed matters. | +| 13 | **Droid Core (open models)** | Open-source options (GLM-5.1, Nemotron 3 Ultra, Kimi, DeepSeek V4 Pro, MiniMax) for cost-sensitive or air-gapped work. | We ship model updates regularly. When a new release overtakes the list above, we update this page and the @@ -60,18 +57,17 @@ This guide was last updated on Wednesday, June 3rd 2026. 
| Scenario | Recommended model | | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **Deep planning, architecture reviews, ambiguous product specs** | Start with **Opus 4.7** for best depth and safety (1× promotional multiplier through April 30), or fall back to **Opus 4.6** / **Opus 4.6 Fast** for faster turnaround. Use **Sonnet 4.6** or **Sonnet 4.5** when you want balanced cost/quality, or **GPT-5.4** for large-context reasoning. | -| **Full-feature development, large refactors** | **Opus 4.7** or **Opus 4.6** for depth and safety. **GPT-5.4**, **GPT-5.3-Codex**, or **GPT-5.2-Codex** when you need speed plus **Extra High** reasoning; **Sonnet 4.6** or **Sonnet 4.5** for balanced loops. | -| **Repeatable edits, summarization, boilerplate generation** | **Haiku 4.5** or **Droid Core** (including **MiniMax M2.7** at 0.12×) for speed and cost. **GPT-5.2** when you need higher quality or structured outputs. | +| **Deep planning, architecture reviews, ambiguous product specs** | Start with **Opus 4.8** for best depth and safety, or **Opus 4.7** / **Opus 4.8 Fast** for faster turnaround. Use **Sonnet 4.6** when you want balanced cost/quality, or **GPT-5.5** / **GPT-5.4** for large-context reasoning. | +| **Full-feature development, large refactors** | **Opus 4.8** or **Opus 4.7** for depth and safety. **GPT-5.5**, **GPT-5.4**, or **GPT-5.3-Codex** when you want speed plus high-effort reasoning; **Sonnet 4.6** for balanced loops. | +| **Repeatable edits, summarization, boilerplate generation** | **Haiku 4.5** or **Droid Core** for speed and cost. **GPT-5.4 Mini** or **GPT-5.2** when you need higher quality or structured outputs. 
| | **CI/CD or automation loops** | Favor **Haiku 4.5** or **Droid Core** for predictable, low-cost throughput. Use **GPT-5.3-Codex** or **GPT-5.4** when automation needs stronger reasoning. | -| **High-volume automation, frequent quick turns** | **Haiku 4.5** for speedy feedback. **Droid Core** (especially **MiniMax M2.7** at 0.12× with reasoning) when cost is critical or you need air-gapped deployment. | +| **High-volume automation, frequent quick turns** | **Haiku 4.5** or **Gemini Flash** for speedy feedback. **Droid Core** when cost is critical or you need air-gapped deployment. | - **Claude Opus 4.7** is the newest top-tier option for extremely complex architecture decisions or critical - work where you need maximum reasoning capability—and it runs at a promotional 1× multiplier through April 30 - (2× afterward). **Claude Opus 4.6** remains an excellent alternative, and **Opus 4.6 Fast** is tuned for - faster responses at a higher cost. Most tasks don't require Opus-level power—start with Sonnet 4.6 or Sonnet - 4.5 and escalate only if needed. + **Claude Opus 4.8** is the newest top-tier option for extremely complex architecture decisions or critical + work where you need maximum reasoning capability. **Opus 4.7** remains an excellent alternative, and **Opus + 4.8 Fast** is tuned for faster responses. Most tasks don't require Opus-level power—start with **Sonnet 4.6** + and escalate only if needed. Tip: you can swap models mid-session with `/model` or by toggling in the settings panel (`Shift+Tab` → **Settings**). 
@@ -89,23 +85,7 @@ Tip: you can swap models mid-session with `/model` or by toggling in the setting ## 4 · Reasoning effort settings -- **Opus 4.7**: Off / Low / Medium / High / **Max** (default: High) -- **Opus 4.6 / Opus 4.6 Fast**: Off / Low / Medium / High / **Max** (default: High) -- **Sonnet 4.6**: Off / Low / Medium / High / **Max** (default: High) -- **Opus 4.5 / Sonnet 4.5 / Haiku 4.5**: Off / Low / Medium / High (default: Off) -- **GPT-5.4**: None / Low / Medium / High / **Extra High** (default: Medium) -- **GPT-5.2**: Off / Low / Medium / High / **Extra High** (default: Low) -- **GPT-5.2-Codex**: None / Low / Medium / High / **Extra High** (default: Medium) -- **GPT-5.3-Codex**: None / Low / Medium / High / **Extra High** (default: Medium) -- **Gemini 3.1 Pro**: Low / Medium / High (default: High) -- **Gemini 3 Flash**: Minimal / Low / Medium / High (default: High) -- **Droid Core (GLM-5)**: None only (default: None; no image support) -- **Droid Core (GLM-5.1)**: None only (default: None; no image support) -- **Droid Core (Kimi K2.6)**: Off / High (default: High) -- **Droid Core (Kimi K2.5)**: None only (default: None) -- **Droid Core (MiniMax M2.7)**: Low / Medium / High (default: High) - -Reasoning effort increases latency and cost—start low for simple work and escalate as needed. **Max** is available on Claude Opus 4.7, the Opus 4.6 family (Opus 4.6 and Opus 4.6 Fast), and Sonnet 4.6. **Extra High** is available on GPT-5.4, GPT-5.2, GPT-5.2-Codex, and GPT-5.3-Codex. +Higher reasoning effort increases latency and cost—start low for simple work and escalate as needed. The levels each model supports, and its default, are listed in the **Reasoning** column of [Available Models](/models). To set reasoning effort in `settings.json`—including how display labels like **Extra High** map to values such as `xhigh`—see [Reasoning effort](/cli/configuration/settings#reasoning-effort). 
Change reasoning effort from `/model` → **Reasoning effort**, or via the settings menu. @@ -117,14 +97,14 @@ Factory ships with managed Anthropic and OpenAI access. If you prefer to run aga ### Open-source models -**Droid Core (GLM-5)**, **Droid Core (GLM-5.1)**, **Droid Core (Kimi K2.6)**, **Droid Core (Kimi K2.5)**, and **Droid Core (MiniMax M2.7)** are open-source alternatives available in the CLI. They're useful for: +Factory's **Droid Core** open models (see [Available Models](/models) for the current list) are open-source alternatives available in the CLI. They're useful for: - **Air-gapped environments** where external API calls aren't allowed - **Cost-sensitive projects** needing unlimited local inference - **Privacy requirements** where code cannot leave your infrastructure - **Experimentation** with open-source model capabilities -**Note:** GLM-5 and GLM-5.1 do not support image attachments. Kimi K2.5, Kimi K2.6, and MiniMax M2.7 do support images. Kimi K2.6 adds an Off/High reasoning toggle, while MiniMax M2.7 (the cheapest model available, with 0.12× multiplier) supports Low/Medium/High reasoning. For image-based workflows, use Claude, GPT, Kimi, or MiniMax M2.7. +**Note:** image support varies across Droid Core models—the GLM family does not accept image attachments, while Kimi and MiniMax do. For image-based workflows, prefer Claude, GPT, Gemini, Kimi, or MiniMax. To use open-source models, you'll need to configure them via BYOK with a local inference server (like Ollama) or a hosted provider. See [BYOK documentation](/cli/byok/overview) for setup instructions. 
diff --git a/docs/reference/cli-reference.mdx b/docs/reference/cli-reference.mdx index df0553520..1cde5c7ff 100644 --- a/docs/reference/cli-reference.mdx +++ b/docs/reference/cli-reference.mdx @@ -57,7 +57,7 @@ Customize droid's behavior with command-line flags: | Flag | Description | Example | | :-------------------------------- | :----------------------------------------------------------------- | :----------------------------------------------------------- | | `-f, --file ` | Read prompt from a file | `droid exec -f plan.md` | -| `-m, --model ` | Select a specific [model ID](/models) | `droid exec -m claude-opus-4-7` | +| `-m, --model ` | Select a specific [model ID](/models) | `droid exec -m claude-opus-4-8` | | `-s, --session-id ` | Continue an existing session | `droid exec -s session-abc123` | | `--auto ` | Set [autonomy level](#autonomy-levels) (`low`, `medium`, `high`) | `droid exec --auto medium "run tests"` | | `--enabled-tools ` | Force-enable specific tools (comma or space separated) | `droid exec --enabled-tools ApplyPatch,Bash` | @@ -66,8 +66,8 @@ Customize droid's behavior with command-line flags: | `-o, --output-format ` | Output format (`text`, `json`, `stream-json`, `stream-jsonrpc`) | `droid exec -o json "document API"` | | `--input-format ` | Input format (`stream-jsonrpc` for multi-turn) | `droid exec --input-format stream-jsonrpc -o stream-jsonrpc` | | `-r, --resume [sessionId]` | Resume a previous session. In interactive mode, `-r` is `--resume`; in `droid exec`, `-r` is `--reasoning-effort`. | `droid -r` | -| `-r, --reasoning-effort ` | Override reasoning effort (`off`, `none`, `low`, `medium`, `high`). In `droid exec`, `-r` maps to this flag. 
| `droid exec -r high "debug flaky test"` | -| `--spec-model ` | Use a different [model ID](/models) for specification planning | `droid exec --spec-model claude-opus-4-7` | +| `-r, --reasoning-effort ` | Override reasoning effort; valid values are model-dependent (see [Reasoning effort](/cli/configuration/settings#reasoning-effort)). In `droid exec`, `-r` maps to this flag. | `droid exec -r high "debug flaky test"` | +| `--spec-model ` | Use a different [model ID](/models) for specification planning | `droid exec --spec-model claude-opus-4-8` | | `--spec-reasoning-effort ` | Override reasoning effort for spec mode | `droid exec --use-spec --spec-reasoning-effort high` | | `--use-spec` | Start in specification mode (plan before executing) | `droid exec --use-spec "add user profiles"` | | `--skip-permissions-unsafe` | Skip all permission prompts (⚠️ use with extreme caution) | `droid exec --skip-permissions-unsafe` | @@ -78,8 +78,8 @@ Customize droid's behavior with command-line flags: | `--fork ` | Fork and resume an existing session into a new copy | `droid exec --fork session-abc123` | | `--mission` | Run `droid exec` in [mission mode](/cli/features/missions) (multi-agent orchestration) | `droid exec --mission -f mission.md` | | `--worker-model ` | Model used for mission worker agents | `droid exec --mission --worker-model claude-sonnet-4-6` | -| `--worker-reasoning-effort ` | Reasoning effort for mission worker agents (`off`, `none`, `low`, `medium`, `high`) | `droid exec --mission --worker-reasoning-effort medium` | -| `--validator-model ` | Model used for mission validator agents | `droid exec --mission --validator-model claude-opus-4-7` | +| `--worker-reasoning-effort ` | Reasoning effort for mission worker agents (model-dependent; see [Reasoning effort](/cli/configuration/settings#reasoning-effort)) | `droid exec --mission --worker-reasoning-effort medium` | +| `--validator-model ` | Model used for mission validator agents | `droid exec --mission 
--validator-model claude-opus-4-8` |
 | `--validator-reasoning-effort ` | Reasoning effort for mission validator agents | `droid exec --mission --validator-reasoning-effort high` |
 | `--append-system-prompt ` | Append custom text to the end of the system prompt | `droid --append-system-prompt "Always run tests."` |
 | `--append-system-prompt-file ` | Append the contents of a file to the end of the system prompt | `droid --append-system-prompt-file .factory/system.md` |
@@ -149,7 +149,7 @@ The interactive REPL supports a rich set of keyboard shortcuts for navigation, o
 | `Ctrl+J` | Toggle the changelog display (dismiss / restore) |
 | `Ctrl+E` | Toggle the approval details view |
 | `Ctrl+V` | Paste an image from the clipboard as an attachment |
-| `Tab` | Cycle reasoning effort (`low` → `medium` → `high` → `off`) |
+| `Tab` | Cycle reasoning effort through the levels supported by the current model |
 | `Shift+Tab` | Cycle interaction modes (Auto → Spec → Mission) |
 | `@` | File path autocomplete — typing `@` triggers fuzzy file search |
 | `Up` / `Down` | Navigate input history (cycle through previously submitted messages) |

From 7a20c75e342ca2e8fe3ad774bb703c301e88053a Mon Sep 17 00:00:00 2001
From: TheFactoriousDROID
Date: Sun, 7 Jun 2026 20:10:42 +0000
Subject: [PATCH 2/3] docs: keep choosing-your-model out of the reasoning-effort branch

Reverts docs/cli/user-guides/choosing-your-model.mdx to match main so
this branch is scoped to the factual reasoning-effort settings and CLI
reference updates. The page redesign is parked in a git stash for
separate handling.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
---
 docs/cli/user-guides/choosing-your-model.mdx | 82 ++++++++++++--------
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/docs/cli/user-guides/choosing-your-model.mdx b/docs/cli/user-guides/choosing-your-model.mdx
index c47a2c4f5..bb0d3bd1c 100644
--- a/docs/cli/user-guides/choosing-your-model.mdx
+++ b/docs/cli/user-guides/choosing-your-model.mdx
@@ -22,29 +22,32 @@ Balance accuracy, speed, and cost by picking the right model and reasoning level
 Model quality evolves quickly, and we tune the CLI defaults as the ecosystem shifts. Use this guide as a snapshot of how the major options compare today, and expect to revisit it as we publish updates.
-This guide was last updated in June 2026.
+This guide was last updated on Wednesday, June 3rd 2026.
 ---
-## 1 · Current stack rank
-
-Use this as a qualitative starting point. For the full catalog — multipliers, reasoning levels, and defaults — see [Available Models](/models).
-
-| Rank | Model | Why we reach for it |
-| ---- | ----- | ------------------- |
-| 1 | **Claude Opus 4.8** | Newest Anthropic flagship with the deepest reasoning; top pick for the hardest architecture and refactor work. |
-| 2 | **Claude Opus 4.8 Fast** | Opus 4.8 quality tuned for faster responses when you want depth with less wait. |
-| 3 | **Claude Opus 4.7** | Previous Anthropic flagship; still excellent depth and safety for complex work. |
-| 4 | **Claude Sonnet 4.6** | Top-tier reasoning at the Sonnet price point; strong daily driver for planning and implementation. |
-| 5 | **GPT-5.5** | Latest OpenAI flagship with large context and high-effort reasoning; excellent for large-context tasks. |
-| 6 | **GPT-5.5 Pro** | Highest-effort OpenAI option for the most demanding reasoning. |
-| 7 | **GPT-5.4** | Strong large-context OpenAI model for big refactors and reviews.
| -| 8 | **GPT-5.3-Codex** | OpenAI coding-tuned model with high-effort reasoning; strong for implementation-heavy tasks. | -| 9 | **Claude Sonnet 4.5** | Balanced cost/quality general-purpose choice when you don't need Opus-level depth. | -| 10 | **Gemini 3.1 Pro** | Strong structured outputs and mixed reasoning controls for research-heavy tasks. | -| 11 | **Claude Haiku 4.5** | Fast, cost-efficient for routine tasks and high-volume automation. | -| 12 | **Gemini 3.5 Flash / Gemini 3 Flash** | Fast and cheap with full reasoning support; great for high-volume tasks where speed matters. | -| 13 | **Droid Core (open models)** | Open-source options (GLM-5.1, Nemotron 3 Ultra, Kimi, DeepSeek V4 Pro, MiniMax) for cost-sensitive or air-gapped work. | +## 1 · Current stack rank (March 2026) + +| Rank | Model | Why we reach for it | +| ---- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | +| 1 | **Claude Opus 4.7** | Newest Anthropic flagship with **Max** reasoning; top pick for the hardest work. Promotional 1× multiplier through April 30 (2× afterward). | +| 2 | **Claude Opus 4.6** | Previous Anthropic flagship with **Max** reasoning; still excellent depth and safety for complex work. | +| 3 | **Claude Opus 4.6 Fast** | Opus 4.6 tuned for faster response times; 12× multiplier. | +| 4 | **Claude Opus 4.5** | Proven quality-and-safety balance; strong default for TUI and exec. | +| 5 | **Claude Sonnet 4.6** | **Max** reasoning at the Sonnet price point (1.2×); strong daily driver for planning and implementation. | +| 6 | **GPT-5.4** | Latest OpenAI model with 922K context, 128K output, verbosity support, and **Extra High** reasoning; excellent for large-context tasks. | +| 7 | **Claude Sonnet 4.5** | Strong daily driver with balanced cost/quality; great general-purpose choice when you don't need Opus-level depth. 
| +| 8 | **GPT-5.3-Codex** | Newest OpenAI coding model with **Extra High** reasoning and verbosity support; strong for implementation-heavy tasks. | +| 9 | **GPT-5.2-Codex** | Proven OpenAI coding model with **Extra High** reasoning; solid for implementation-heavy tasks. | +| 10 | **GPT-5.2** | OpenAI model with verbosity support and reasoning up to **Extra High**. | +| 11 | **Claude Haiku 4.5** | Fast, cost-efficient for routine tasks and high-volume automation. | +| 12 | **Gemini 3.1 Pro** | Newer Gemini Pro generation with strong structured outputs and mixed reasoning controls for research-heavy tasks. | +| 13 | **Gemini 3 Flash** | Fast, cheap (0.2× multiplier) with full reasoning support; great for high-volume tasks where speed matters. | +| 14 | **Droid Core (MiniMax M2.7)** | Open-source, 0.12× multiplier with reasoning support (Low/Medium/High) and image support; cheapest model available. | +| 15 | **Droid Core (GLM-5.1)** | Open-source, 0.55× multiplier, newer GLM option for bulk automation and air-gapped environments; no image support. | +| 16 | **Droid Core (GLM-5)** | Open-source, 0.4× multiplier, stable choice for bulk automation and air-gapped environments; no image support. | +| 17 | **Droid Core (Kimi K2.6)** | Open-source, 0.4× multiplier with image support and optional High reasoning; good for cost-sensitive work when you still want a thinking toggle. | +| 18 | **Droid Core (Kimi K2.5)** | Open-source, 0.25× multiplier with image support; older Kimi option for cost-sensitive work. | We ship model updates regularly. When a new release overtakes the list above, we update this page and the @@ -57,17 +60,18 @@ Use this as a qualitative starting point. 
For the full catalog — multipliers, | Scenario | Recommended model | | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **Deep planning, architecture reviews, ambiguous product specs** | Start with **Opus 4.8** for best depth and safety, or **Opus 4.7** / **Opus 4.8 Fast** for faster turnaround. Use **Sonnet 4.6** when you want balanced cost/quality, or **GPT-5.5** / **GPT-5.4** for large-context reasoning. | -| **Full-feature development, large refactors** | **Opus 4.8** or **Opus 4.7** for depth and safety. **GPT-5.5**, **GPT-5.4**, or **GPT-5.3-Codex** when you want speed plus high-effort reasoning; **Sonnet 4.6** for balanced loops. | -| **Repeatable edits, summarization, boilerplate generation** | **Haiku 4.5** or **Droid Core** for speed and cost. **GPT-5.4 Mini** or **GPT-5.2** when you need higher quality or structured outputs. | +| **Deep planning, architecture reviews, ambiguous product specs** | Start with **Opus 4.7** for best depth and safety (1× promotional multiplier through April 30), or fall back to **Opus 4.6** / **Opus 4.6 Fast** for faster turnaround. Use **Sonnet 4.6** or **Sonnet 4.5** when you want balanced cost/quality, or **GPT-5.4** for large-context reasoning. | +| **Full-feature development, large refactors** | **Opus 4.7** or **Opus 4.6** for depth and safety. **GPT-5.4**, **GPT-5.3-Codex**, or **GPT-5.2-Codex** when you need speed plus **Extra High** reasoning; **Sonnet 4.6** or **Sonnet 4.5** for balanced loops. | +| **Repeatable edits, summarization, boilerplate generation** | **Haiku 4.5** or **Droid Core** (including **MiniMax M2.7** at 0.12×) for speed and cost. **GPT-5.2** when you need higher quality or structured outputs. 
| | **CI/CD or automation loops** | Favor **Haiku 4.5** or **Droid Core** for predictable, low-cost throughput. Use **GPT-5.3-Codex** or **GPT-5.4** when automation needs stronger reasoning. | -| **High-volume automation, frequent quick turns** | **Haiku 4.5** or **Gemini Flash** for speedy feedback. **Droid Core** when cost is critical or you need air-gapped deployment. | +| **High-volume automation, frequent quick turns** | **Haiku 4.5** for speedy feedback. **Droid Core** (especially **MiniMax M2.7** at 0.12× with reasoning) when cost is critical or you need air-gapped deployment. | - **Claude Opus 4.8** is the newest top-tier option for extremely complex architecture decisions or critical - work where you need maximum reasoning capability. **Opus 4.7** remains an excellent alternative, and **Opus - 4.8 Fast** is tuned for faster responses. Most tasks don't require Opus-level power—start with **Sonnet 4.6** - and escalate only if needed. + **Claude Opus 4.7** is the newest top-tier option for extremely complex architecture decisions or critical + work where you need maximum reasoning capability—and it runs at a promotional 1× multiplier through April 30 + (2× afterward). **Claude Opus 4.6** remains an excellent alternative, and **Opus 4.6 Fast** is tuned for + faster responses at a higher cost. Most tasks don't require Opus-level power—start with Sonnet 4.6 or Sonnet + 4.5 and escalate only if needed. Tip: you can swap models mid-session with `/model` or by toggling in the settings panel (`Shift+Tab` → **Settings**). @@ -85,7 +89,23 @@ Tip: you can swap models mid-session with `/model` or by toggling in the setting ## 4 · Reasoning effort settings -Higher reasoning effort increases latency and cost—start low for simple work and escalate as needed. The levels each model supports, and its default, are listed in the **Reasoning** column of [Available Models](/models). 
To set reasoning effort in `settings.json`—including how display labels like **Extra High** map to values such as `xhigh`—see [Reasoning effort](/cli/configuration/settings#reasoning-effort). +- **Opus 4.7**: Off / Low / Medium / High / **Max** (default: High) +- **Opus 4.6 / Opus 4.6 Fast**: Off / Low / Medium / High / **Max** (default: High) +- **Sonnet 4.6**: Off / Low / Medium / High / **Max** (default: High) +- **Opus 4.5 / Sonnet 4.5 / Haiku 4.5**: Off / Low / Medium / High (default: Off) +- **GPT-5.4**: None / Low / Medium / High / **Extra High** (default: Medium) +- **GPT-5.2**: Off / Low / Medium / High / **Extra High** (default: Low) +- **GPT-5.2-Codex**: None / Low / Medium / High / **Extra High** (default: Medium) +- **GPT-5.3-Codex**: None / Low / Medium / High / **Extra High** (default: Medium) +- **Gemini 3.1 Pro**: Low / Medium / High (default: High) +- **Gemini 3 Flash**: Minimal / Low / Medium / High (default: High) +- **Droid Core (GLM-5)**: None only (default: None; no image support) +- **Droid Core (GLM-5.1)**: None only (default: None; no image support) +- **Droid Core (Kimi K2.6)**: Off / High (default: High) +- **Droid Core (Kimi K2.5)**: None only (default: None) +- **Droid Core (MiniMax M2.7)**: Low / Medium / High (default: High) + +Reasoning effort increases latency and cost—start low for simple work and escalate as needed. **Max** is available on Claude Opus 4.7, the Opus 4.6 family (Opus 4.6 and Opus 4.6 Fast), and Sonnet 4.6. **Extra High** is available on GPT-5.4, GPT-5.2, GPT-5.2-Codex, and GPT-5.3-Codex. Change reasoning effort from `/model` → **Reasoning effort**, or via the settings menu. @@ -97,14 +117,14 @@ Factory ships with managed Anthropic and OpenAI access. If you prefer to run aga ### Open-source models -Factory's **Droid Core** open models (see [Available Models](/models) for the current list) are open-source alternatives available in the CLI. 
They're useful for: +**Droid Core (GLM-5)**, **Droid Core (GLM-5.1)**, **Droid Core (Kimi K2.6)**, **Droid Core (Kimi K2.5)**, and **Droid Core (MiniMax M2.7)** are open-source alternatives available in the CLI. They're useful for: - **Air-gapped environments** where external API calls aren't allowed - **Cost-sensitive projects** needing unlimited local inference - **Privacy requirements** where code cannot leave your infrastructure - **Experimentation** with open-source model capabilities -**Note:** image support varies across Droid Core models—the GLM family does not accept image attachments, while Kimi and MiniMax do. For image-based workflows, prefer Claude, GPT, Gemini, Kimi, or MiniMax. +**Note:** GLM-5 and GLM-5.1 do not support image attachments. Kimi K2.5, Kimi K2.6, and MiniMax M2.7 do support images. Kimi K2.6 adds an Off/High reasoning toggle, while MiniMax M2.7 (the cheapest model available, with 0.12× multiplier) supports Low/Medium/High reasoning. For image-based workflows, use Claude, GPT, Kimi, or MiniMax M2.7. To use open-source models, you'll need to configure them via BYOK with a local inference server (like Ollama) or a hosted provider. See [BYOK documentation](/cli/byok/overview) for setup instructions. From 7f2091031efebcd937b15aa741925100621f0313 Mon Sep 17 00:00:00 2001 From: TheFactoriousDROID Date: Sun, 7 Jun 2026 20:29:19 +0000 Subject: [PATCH 3/3] docs: correct reasoning-effort notes on settings page Drops the inaccurate claim that `dynamic` is an internal, non-selectable level (no CLI-available model exposes it, so CLI users never encounter it) and notes that the `max` label can render as Max or Maximum depending on surface. 
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
---
 docs/cli/configuration/settings.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/cli/configuration/settings.mdx b/docs/cli/configuration/settings.mdx
index 5f72d17e8..f5d43a8a0 100644
--- a/docs/cli/configuration/settings.mdx
+++ b/docs/cli/configuration/settings.mdx
@@ -83,7 +83,7 @@ Set `sessionDefaultSettings.model` to a [model ID from Available Models](/models
 | Extra High | `xhigh` |
 | Max | `max` |
 
-Higher effort increases latency and cost. Which levels are available — and which is the default — depend on the model, so check the **Reasoning** column in [Available Models](/models). (`dynamic` is an internal adaptive setting and is not selected manually.)
+Higher effort increases latency and cost. Which levels are available — and which is the default — depend on the model, so check the **Reasoning** column in [Available Models](/models). Display labels can vary slightly by surface (for example, `max` may appear as **Max** or **Maximum**).
 
 ### Autonomy level