garrytan · datashaman · Mar 24, 2026 · Mar 24, 2026 · Mar 24, 2026
diff --git a/autoplan/SKILL.md b/autoplan/SKILL.md
@@ -546,7 +546,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   What alternatives were dismissed too quickly? What competitive or market risks are
   unaddressed? What scope decisions will look foolish in 6 months? Be adversarial.
   No compliments. Just the strategic blind spots.
-  File: <plan_path>" -s read-only --enable web_search_cached`
+  File: <plan_path>" -s read-only --search`
   Timeout: 10 minutes
 
   **Claude CEO subagent** (via Agent tool):
@@ -657,7 +657,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   accessibility requirements (keyboard nav, contrast, touch targets) specified or
   aspirational? Does the plan describe specific UI decisions or generic patterns?
   What design decisions will haunt the implementer if left ambiguous?
-  Be opinionated. No hedging." -s read-only --enable web_search_cached`
+  Be opinionated. No hedging." -s read-only --search`
   Timeout: 10 minutes
 
   **Claude design subagent** (via Agent tool):
@@ -722,7 +722,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   CEO: <insert CEO consensus table summary — key concerns, DISAGREEs>
   Design: <insert Design consensus table summary, or 'skipped, no UI scope'>
 
-  File: <plan_path>" -s read-only --enable web_search_cached`
+  File: <plan_path>" -s read-only --search`
   Timeout: 10 minutes
 
   **Claude eng subagent** (via Agent tool):
@@ -737,7 +737,6 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   NO prior-phase context — subagent must be truly independent.
 
   Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
-
 - Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION.
 - Evals: always include all relevant suites (P1)
 - Test plan: generate artifact at `~/.gstack/projects/$SLUG/{user}-{branch}-test-plan-{datetime}.md`

diff --git a/autoplan/SKILL.md.tmpl b/autoplan/SKILL.md.tmpl
@@ -203,7 +203,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   What alternatives were dismissed too quickly? What competitive or market risks are
   unaddressed? What scope decisions will look foolish in 6 months? Be adversarial.
   No compliments. Just the strategic blind spots.
-  File: <plan_path>" -s read-only --enable web_search_cached`
+  File: <plan_path>" -s read-only --search`
   Timeout: 10 minutes
 
   **Claude CEO subagent** (via Agent tool):
@@ -314,7 +314,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   accessibility requirements (keyboard nav, contrast, touch targets) specified or
   aspirational? Does the plan describe specific UI decisions or generic patterns?
   What design decisions will haunt the implementer if left ambiguous?
-  Be opinionated. No hedging." -s read-only --enable web_search_cached`
+  Be opinionated. No hedging." -s read-only --search`
   Timeout: 10 minutes
 
   **Claude design subagent** (via Agent tool):
@@ -379,7 +379,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   CEO: <insert CEO consensus table summary — key concerns, DISAGREEs>
   Design: <insert Design consensus table summary, or 'skipped, no UI scope'>
 
-  File: <plan_path>" -s read-only --enable web_search_cached`
+  File: <plan_path>" -s read-only --search`
   Timeout: 10 minutes
 
   **Claude eng subagent** (via Agent tool):
@@ -394,7 +394,6 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   NO prior-phase context — subagent must be truly independent.
 
   Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
-
 - Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION.
 - Evals: always include all relevant suites (P1)
 - Test plan: generate artifact at `~/.gstack/projects/$SLUG/{user}-{branch}-test-plan-{datetime}.md`

diff --git a/bun.lockb b/bun.lockb
diff --git a/codex/SKILL.md b/codex/SKILL.md
@@ -373,13 +373,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
 
 2. Run the review (5-minute timeout):
 ```bash
-codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
+codex exec review --base <base> -c 'model_reasoning_effort="high"' --json 2>"$TMPERR"
 ```
 
 Use `timeout: 300000` on the Bash call. If the user provided custom instructions
 (e.g., `/codex review focus on security`), pass them as the prompt argument:
 ```bash
-codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
+codex exec review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --json 2>"$TMPERR"
 ```
 
 3. Capture the output. Then parse cost from stderr:
@@ -517,7 +517,7 @@ With focus (e.g., "security"):
 
 2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout):
 ```bash
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | python3 -c "
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search --json 2>/dev/null | python3 -c "
 import sys, json
 for line in sys.stdin:
     line = line.strip()
@@ -602,7 +602,7 @@ THE PLAN:
 
 For a **new session:**
 ```bash
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search --json 2>"$TMPERR" | python3 -c "
 import sys, json
 for line in sys.stdin:
     line = line.strip()
@@ -635,7 +635,7 @@ for line in sys.stdin:
 
 For a **resumed session** (user chose "Continue"):
 ```bash
-codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search --json 2>"$TMPERR" | python3 -c "
 <same python streaming parser as above>
 "
 ```
@@ -671,10 +671,10 @@ Session saved — run /codex again to continue this conversation.
 agentic coding model). This means as OpenAI ships newer models, /codex automatically
 uses them. If the user wants a specific model, pass `-m` through to codex.
 
-**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
+**Reasoning effort:** All modes use `high` — the maximum reasoning power available. Valid values are `minimal`, `low`, `medium`, `high`. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
 
-**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
-docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
+**Web search:** Challenge and consult modes use `--search` so Codex can look up
+docs and APIs. Review mode (`codex exec review`) does not need `--search` — it has its own built-in review prompt.
 
 If the user specifies a model (e.g., `/codex review -m gpt-5.1-codex-max`
 or `/codex challenge -m gpt-5.2`), pass the `-m` flag through to codex.

diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl
@@ -79,13 +79,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
 
 2. Run the review (5-minute timeout):
 ```bash
-codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
+codex exec review --base <base> -c 'model_reasoning_effort="high"' --json 2>"$TMPERR"
 ```
 
 Use `timeout: 300000` on the Bash call. If the user provided custom instructions
 (e.g., `/codex review focus on security`), pass them as the prompt argument:
 ```bash
-codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
+codex exec review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --json 2>"$TMPERR"
 ```
 
 3. Capture the output. Then parse cost from stderr:
@@ -158,7 +158,7 @@ With focus (e.g., "security"):
 
 2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout):
 ```bash
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | python3 -c "
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search --json 2>/dev/null | python3 -c "
 import sys, json
 for line in sys.stdin:
     line = line.strip()
@@ -243,7 +243,7 @@ THE PLAN:
 
 For a **new session:**
 ```bash
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search --json 2>"$TMPERR" | python3 -c "
 import sys, json
 for line in sys.stdin:
     line = line.strip()
@@ -276,7 +276,7 @@ for line in sys.stdin:
 
 For a **resumed session** (user chose "Continue"):
 ```bash
-codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search --json 2>"$TMPERR" | python3 -c "
 <same python streaming parser as above>
 "
 ```
@@ -312,10 +312,10 @@ Session saved — run /codex again to continue this conversation.
 agentic coding model). This means as OpenAI ships newer models, /codex automatically
 uses them. If the user wants a specific model, pass `-m` through to codex.
 
-**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
+**Reasoning effort:** All modes use `high` — the maximum reasoning power available. Valid values are `minimal`, `low`, `medium`, `high`. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
 
-**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
-docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
+**Web search:** Challenge and consult modes use `--search` so Codex can look up
+docs and APIs. Review mode (`codex exec review`) does not need `--search` — it has its own built-in review prompt.
 
 If the user specifies a model (e.g., `/codex review -m gpt-5.1-codex-max`
 or `/codex challenge -m gpt-5.2`), pass the `-m` flag through to codex.

diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md
@@ -453,7 +453,7 @@ codex exec "Given this product context, propose a complete design direction:
 - Differentiation: 2 deliberate departures from category norms
 - Anti-slop: no purple gradients, no 3-column icon grids, no centered everything, no decorative blobs
 
-Be opinionated. Be specific. Do not hedge. This is YOUR design direction — own it." -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_DESIGN"
+Be opinionated. Be specific. Do not hedge. This is YOUR design direction — own it." -s read-only -c 'model_reasoning_effort="medium"' --search 2>"$TMPERR_DESIGN"
 ```
 Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
 ```bash

diff --git a/design-review/SKILL.md b/design-review/SKILL.md
@@ -993,7 +993,7 @@ HARD REJECTION — flag if ANY apply:
 6. Carousel with no narrative purpose
 7. App UI made of stacked cards instead of layout
 
-Be specific. Reference file:line for every finding." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DESIGN"
+Be specific. Reference file:line for every finding." -s read-only -c 'model_reasoning_effort="high"' --search 2>"$TMPERR_DESIGN"
 ```
 Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
 ```bash

diff --git a/docs/skills.md b/docs/skills.md
@@ -716,9 +716,9 @@ When `/review` catches bugs from Claude's perspective, `/codex` brings a complet
 
 ### Three modes
 
-**Review** — run `codex review` against the current diff. Codex reads every changed file, classifies findings by severity (P1 critical, P2 high, P3 medium), and returns a PASS/FAIL verdict. Any P1 finding = FAIL. The review is fully independent — Codex doesn't see Claude's review.
+**Review** — run `/codex review` against the current diff. Codex reads every changed file, classifies findings by severity (P1 critical, P2 high, P3 medium), and returns a PASS/FAIL verdict. Any P1 finding = FAIL. The review is fully independent — Codex doesn't see Claude's review.
 
-**Challenge** — adversarial mode. Codex actively tries to break your code. It looks for edge cases, race conditions, security holes, and assumptions that would fail under load. Uses maximum reasoning effort (`xhigh`). Think of it as a penetration test for your logic.
+**Challenge** — adversarial mode. Codex actively tries to break your code. It looks for edge cases, race conditions, security holes, and assumptions that would fail under load. Uses maximum reasoning effort (`high`). Think of it as a penetration test for your logic.
 
 **Consult** — open conversation with session continuity. Ask Codex anything about the codebase. Follow-up questions reuse the same session, so context carries over. Great for "am I thinking about this correctly?" moments.
 

diff --git a/office-hours/SKILL.md b/office-hours/SKILL.md
@@ -687,7 +687,7 @@ Write the full prompt (context block + instructions) to this file. Use the mode-
 
 ```bash
 TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
-codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH"
+codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="high"' --search 2>"$TMPERR_OH"
 ```
 
 Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
@@ -838,7 +838,7 @@ If user chooses A, launch both voices simultaneously:
 1. **Codex** (via Bash, `model_reasoning_effort="medium"`):
 ```bash
 TMPERR_SKETCH=$(mktemp /tmp/codex-sketch-XXXXXXXX)
-codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
+codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -s read-only -c 'model_reasoning_effort="medium"' --search 2>"$TMPERR_SKETCH"
 ```
 Use a 5-minute timeout (`timeout: 300000`). After completion: `cat "$TMPERR_SKETCH" && rm -f "$TMPERR_SKETCH"`
 

diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md
@@ -1044,7 +1044,7 @@ THE PLAN:
 
 ```bash
 TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV"
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search 2>"$TMPERR_PV"
 ```
 
 Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:

diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md
@@ -467,7 +467,7 @@ HARD RULES — first classify as MARKETING/LANDING PAGE vs APP UI vs HYBRID, the
 - APP UI: Calm surface hierarchy, dense but readable, utility language, minimal chrome
 - UNIVERSAL: CSS variables for colors, no default font stacks, one job per section, cards earn existence
 
-For each finding: what's wrong, what will happen if it ships unresolved, and the specific fix. Be opinionated. No hedging." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DESIGN"
+For each finding: what's wrong, what will happen if it ships unresolved, and the specific fix. Be opinionated. No hedging." -s read-only -c 'model_reasoning_effort="high"' --search 2>"$TMPERR_DESIGN"
 ```
 Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
 ```bash

diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md
@@ -722,7 +722,7 @@ THE PLAN:
 
 ```bash
 TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV"
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --search 2>"$TMPERR_PV"
 ```
 
 Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr: