Add first-class codex executor mode#350
Conversation
Plan for promoting CodexExecutor from external-review-only to a peer of ClaudeExecutor across task, review, and finalize phases. Adds --codex and --pass-claude-md flags, role-named Executors struct (Task/Review/External/Custom), codex-flavored review prompts, and additive -c flag overrides that preserve user's ~/.codex config. 13 tasks. Survived smells pre-check, plan-review pass, and an interactive revdiff round that pivoted away from the original CODEX_HOME temp-directory approach.
📝 WalkthroughWalkthroughAdds a first-class Codex executor: CLI flags/config fields for executor selection and CLAUDE.md passthrough, codex-specific review prompts and spawn_agent expansion, Codex executor idle/session handling, runner refactor to role-based executors, extensive tests, and documentation updates. ChangesFirst-class Codex Executor Mode
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
Adds first-class --codex executor mode that drives task, internal review, and finalize phases directly through the codex CLI (skipping external review since same-model self-review has weak signal). Motivated by the June 2026 Anthropic billing split that moves unattended ralphex runs off the Claude Max plan; gives OpenAI/codex-plan users a single flag to keep the full pipeline on their existing subscription. Also introduces --pass-claude-md, codex-specific review prompts, role-named executor wiring, and idle/session timeout coverage for codex.
Changes:
- New
--codexand--pass-claude-mdCLI flags plus matchingexecutor/pass_claude_mdconfig keys, with mutex validation against--external-only,--codex-only, and--external-review-tool != none. - Refactor
Executorsfrom executor-named (Claude/ReviewClaude/Codex) to role-named (Task/Review/External/Custom); add codex agent-expansion viaspawn_agent, additive-coverrides, idle/session-timeout support forCodexExecutor, executor-neutral internal-review section labels. - Docs/prompts: codex-mode README section, new
review_first_codex.txt/review_second_codex.txtdefaults, CLAUDE.md/llms.txt/custom-providers/plan doc updates.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Documents --codex, --pass-claude-md, config keys, mutex rules, prompt-customization split, codex 0.130.0 requirement. |
| llms.txt | Mirrors README codex-mode summary for LLM consumers. |
| CLAUDE.md | Adds project-doc notes about codex executor mode. |
| docs/custom-providers.md | Calls out wrapper deprecation in favor of --codex. |
| docs/plans/20260514-codex-first-class-mode.md | Implementation plan/spec for the feature. |
| cmd/ralphex/main.go | Adds flags, codex dep check selection, post-merge applyCodexOverrides with mutex/auto-disable handling. |
| cmd/ralphex/main_test.go | Tests for new flag/config mutex and override paths. |
| pkg/processor/runner.go | Role-named executors, codex executor builders, codex-aware finalize routing, runWithSessionTimeout rework. |
| pkg/processor/prompts.go | Codex-shape agent expansion via spawn_agent, escapeCodexSingleQuoted. |
| pkg/processor/prompts_test.go | Covers codex agent expansion + escaping. |
| pkg/processor/export_test.go | Test exports for new internals. |
| pkg/processor/signals_test.go | Regression tests confirming codex-style free-form output still parses. |
| pkg/executor/codex.go | Adds MultiAgent, PassClaudeMd, IdleTimeout, additive -c args, idle-timeout machinery, touchReader, finalError extraction. |
| pkg/executor/codex_test.go | Tests idle timeout fires/resets, args splicing, finalError surfaces. |
| pkg/executor/executor_test.go | Minor exec test updates for new fields. |
| pkg/config/config.go | New Executor and PassClaudeMd config keys plus constants. |
| pkg/config/values.go | *Set sentinels for new fields. |
| pkg/config/config_test.go, values_test.go, defaults_test.go | Coverage for parsing/merge. |
| pkg/config/prompts.go | ReviewFirstCodexPrompt / ReviewSecondCodexPrompt loader plumbing. |
| pkg/config/defaults/config | Default config additions. |
| pkg/config/defaults/prompts/review_first_codex.txt, review_second_codex.txt | New codex-specific review prompts. |
| pkg/status/section.go | NewInternalReviewSection with executor-neutral label to avoid phaseFromSection misrouting. |
| pkg/status/section_test.go | Asserts label contract and absence of codex in label. |
| pkg/web/session_progress_test.go | Regression for internal-review label routing under codex. |
| pkg/web/server_test.go | Removes stray blank line. |
| pkg/git/service_test.go | Trivial trailing-newline cleanup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if cfg.isCodexExecutor() { | ||
| e.PassClaudeMd = cfg.AppConfig.PassClaudeMd | ||
| } |
| // formatAgentExpansionCodex builds the spawn_agent block used by codex executor. | ||
| // frontmatter Model/AgentType overrides do not apply: codex registers a single | ||
| // reviewer agent globally; per-call behavior is driven by the inlined task body. | ||
| // the agent body is escaped so apostrophes (don't, what's) and backslashes inside | ||
| // it do not terminate the surrounding single-quoted task='...' literal. the agent | ||
| // name is shared with pkg/executor.CodexReviewerAgentName so the spawn_agent call | ||
| // here stays in sync with the codex -c agents.<name>.description registration. | ||
| func (r *Runner) formatAgentExpansionCodex(prompt string) string { | ||
| return fmt.Sprintf(`spawn_agent(agent='%s', task='%s') | ||
|
|
||
| Report findings only - no positive observations.`, executor.CodexReviewerAgentName, r.escapeCodexSingleQuoted(prompt)) | ||
| } |
| // escapeCodexSingleQuoted escapes backslashes, single quotes, and newlines so | ||
| // the result is safe to splice into a Python-style single-quoted string literal | ||
| // (which is what spawn_agent(task='...') expects). backslash must be escaped | ||
| // first so the subsequent escapes do not get re-escaped. newlines and carriage | ||
| // returns are escaped because multi-line agent bodies would otherwise embed raw | ||
| // newlines into the task='...' literal and codex's string parser treats those | ||
| // as terminators of the single-quoted form. | ||
| func (r *Runner) escapeCodexSingleQuoted(s string) string { | ||
| s = strings.ReplaceAll(s, `\`, `\\`) | ||
| s = strings.ReplaceAll(s, `'`, `\'`) | ||
| s = strings.ReplaceAll(s, "\r", `\r`) | ||
| s = strings.ReplaceAll(s, "\n", `\n`) | ||
| return s | ||
| } |
| if o.ExternalOnly { | ||
| return errors.New("--external-only is incompatible with codex executor (external review is skipped in codex mode)") | ||
| } | ||
| if o.CodexOnly { | ||
| return errors.New("--codex-only is incompatible with codex executor (external review is skipped in codex mode)") | ||
| } | ||
| if o.externalReviewToolSet && o.ExternalReviewTool != "none" { | ||
| return errors.New("--external-review-tool is incompatible with codex executor (external review is skipped)") | ||
| } |
| e.IdleTimeout = cfg.AppConfig.IdleTimeout | ||
| if cfg.isCodexExecutor() { |
| // under --codex, finalize is single-agent (no multi_agent overrides) and runs through | ||
| // the task-phase codex instance. under claude, the review executor is fine — it shares | ||
| // the claude binary and may carry a different model selection for review-style work. | ||
| execName := r.executorName() | ||
| finalizeRun := r.review.Run | ||
| if r.isCodexExecutor() { | ||
| finalizeRun = r.task.Run | ||
| } | ||
| result := r.runWithLimitRetry(ctx, finalizeRun, prompt, execName) |
Deploying ralphex with
|
| Latest commit: |
64d51f3
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://970fdbdd.ralphex.pages.dev |
| Branch Preview URL: | https://codex-first-class-mode.ralphex.pages.dev |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
CLAUDE.md (1)
74-74: 💤 Low valueConsider breaking up the very long paragraph for improved readability.
The single-paragraph explanation of
--codexmode (line 74) packs 8+ distinct concepts into one paragraph: executor switch semantics, external review forcing, pipeline mode preservation, Executors struct fields, agent registration,-cflag overrides, config preservation, PassClaudeMd behavior, and user-level CLAUDE.md hints. While technically accurate, breaking this into 2-3 focused paragraphs would improve scannability.Suggested structure:
- What
--codexdoes (executor switch, external review skip, mode preservation)- How multi-agent reviews are configured (
-coverrides, config layering)- PassClaudeMd behavior (project vs user-level)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CLAUDE.md` at line 74, Split the long single paragraph describing the --codex behavior into 2–3 focused paragraphs: first explain the executor switch semantics (the --codex flag sets cfg.Executor = config.ExecutorCodex, forces cfg.ExternalReviewTool = "none", preserves the pipeline Mode enum, and maps Executors struct role fields Task/Review/External/Custom to the underlying CodexExecutor for each phase); second describe multi-agent configuration and agent registration (explain that codexConfigOpts.cliArgs() in pkg/executor/codex.go adds additive -c overrides like features.multi_agent and agents.reviewer.description on top of ~/.codex/config.toml so user model/sandbox/MCP settings are preserved); third cover PassClaudeMd behavior and CLAUDE.md hints (document that --pass-claude-md sets CodexExecutor.PassClaudeMd = true to pass project CLAUDE.md via -c project_doc_fallback_filenames and that the runner prints a one-time hint for user-level CLAUDE.md symlink without writing to ~/.codex).cmd/ralphex/main_test.go (1)
961-1152: 🏗️ Heavy liftRefactor repetitive codex override tests into table-driven cases.
TestApplyCodexOverrides_PostMergeMutexChecksandTestCodexFlag_ApplyCLIOverridesrepeat setup/assert flows heavily; converting these to table-driven cases with shared setup helpers will reduce duplication and make new matrix cases safer to add.As per coding guidelines, "
**/*_test.go: Table-driven tests with testify".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/ralphex/main_test.go` around lines 961 - 1152, Replace the many duplicated t.Run blocks in TestApplyCodexOverrides_PostMergeMutexChecks and TestCodexFlag_ApplyCLIOverrides with table-driven loops: define a slice of test cases (fields: name, cliArgs []string, initial cfg *config.Config, wantErr bool, wantErrContains []string, wantWarnContains []string, wantCfgChecks funcs or expected field values) and iterate with t.Run(name, func(t *testing.T){ setup common vars (o := parseTestOpts(t, tc.cliArgs...), var warnBuf bytes.Buffer), call applyCodexOverrides or applyCLIOverrides as appropriate, then assert expecteds (require.Error/NoError, strings.Contains on err.Error, warnBuf.String(), and cfg fields like Executor, ExternalReviewTool, PassClaudeMd) to eliminate copy/paste; keep helper calls parseTestOpts, applyCodexOverrides, applyCLIOverrides, and warnBuf usage from the existing tests to preserve behavior and warning checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/ralphex/main_test.go`:
- Around line 1101-1103: Update the stale test comment to reflect that
validateFlags now accepts these flag combinations in TestValidateFlags rather
than rejecting them; clarify that applyCLIOverrides is still called after
validateFlags and the guard exists to cover the no-warning branch when a user
explicitly set the flag (which used to be a hard error pre-applyCLIOverrides).
Replace the existing text referencing rejection by validateFlags with a line
stating that TestValidateFlags intentionally accepts the combo at CLI stage and
that applyCLIOverrides ensures the no-warning branch when the flag was
explicitly provided.
In `@cmd/ralphex/main.go`:
- Around line 1380-1382: The warning logic is inverted: it warns when
external_review_tool is the embedded default (cfg.ExternalReviewTool set but
o.externalReviewToolSet false). Change the condition so the message is emitted
only when the user explicitly set external_review_tool and it is being
overridden; i.e., replace the current check (cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet) with a check that
requires o.externalReviewToolSet to be true (for example:
o.externalReviewToolSet && cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none") and keep the same fmt.Fprintf(warnW, ...) call
using cfg.ExternalReviewTool.
In `@pkg/processor/runner_test.go`:
- Around line 4374-4379: setIsolatedHome currently only sets HOME; on Windows
os.UserHomeDir() will read USERPROFILE (or HOMEDRIVE+HOMEPATH), so update
setIsolatedHome to also set USERPROFILE to the same t.TempDir() and populate
HOMEDRIVE and HOMEPATH so UserHomeDir resolves to the temp dir on Windows; keep
the existing call to processor.ResetClaudeMdHintOnceForTest(). Ensure you use
t.Setenv for USERPROFILE, HOMEDRIVE and HOMEPATH so tests remain isolated across
platforms.
In `@pkg/status/section_test.go`:
- Around line 9-24: Convert the current hardcoded subtests in
TestReviewSections_TypeAndLabelContract into a table-driven test: create a slice
of test cases with fields (name, constructor args, expected Type, expected
Iteration, expected Label and any NotContains checks), iterate over them with
t.Run(tc.name, func(t *testing.T) { ... }), and for each case call the
appropriate constructor (NewClaudeReviewSection or NewInternalReviewSection) and
assert expectations against section.Type, section.Iteration and section.Label
(including the NotContains check for "codex"); keep the test name
TestReviewSections_TypeAndLabelContract and reuse the symbols
NewClaudeReviewSection, NewInternalReviewSection and SectionClaudeReview so
assertions remain the same.
---
Nitpick comments:
In `@CLAUDE.md`:
- Line 74: Split the long single paragraph describing the --codex behavior into
2–3 focused paragraphs: first explain the executor switch semantics (the --codex
flag sets cfg.Executor = config.ExecutorCodex, forces cfg.ExternalReviewTool =
"none", preserves the pipeline Mode enum, and maps Executors struct role fields
Task/Review/External/Custom to the underlying CodexExecutor for each phase);
second describe multi-agent configuration and agent registration (explain that
codexConfigOpts.cliArgs() in pkg/executor/codex.go adds additive -c overrides
like features.multi_agent and agents.reviewer.description on top of
~/.codex/config.toml so user model/sandbox/MCP settings are preserved); third
cover PassClaudeMd behavior and CLAUDE.md hints (document that --pass-claude-md
sets CodexExecutor.PassClaudeMd = true to pass project CLAUDE.md via -c
project_doc_fallback_filenames and that the runner prints a one-time hint for
user-level CLAUDE.md symlink without writing to ~/.codex).
In `@cmd/ralphex/main_test.go`:
- Around line 961-1152: Replace the many duplicated t.Run blocks in
TestApplyCodexOverrides_PostMergeMutexChecks and TestCodexFlag_ApplyCLIOverrides
with table-driven loops: define a slice of test cases (fields: name, cliArgs
[]string, initial cfg *config.Config, wantErr bool, wantErrContains []string,
wantWarnContains []string, wantCfgChecks funcs or expected field values) and
iterate with t.Run(name, func(t *testing.T){ setup common vars (o :=
parseTestOpts(t, tc.cliArgs...), var warnBuf bytes.Buffer), call
applyCodexOverrides or applyCLIOverrides as appropriate, then assert expecteds
(require.Error/NoError, strings.Contains on err.Error, warnBuf.String(), and cfg
fields like Executor, ExternalReviewTool, PassClaudeMd) to eliminate copy/paste;
keep helper calls parseTestOpts, applyCodexOverrides, applyCLIOverrides, and
warnBuf usage from the existing tests to preserve behavior and warning checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c0f7f1d1-b825-4663-9572-cf551d721c72
📒 Files selected for processing (30)
CLAUDE.mdREADME.mdcmd/ralphex/main.gocmd/ralphex/main_test.godocs/custom-providers.mddocs/plans/20260514-codex-first-class-mode.mdllms.txtpkg/config/config.gopkg/config/config_test.gopkg/config/defaults/configpkg/config/defaults/prompts/review_first_codex.txtpkg/config/defaults/prompts/review_second_codex.txtpkg/config/defaults_test.gopkg/config/prompts.gopkg/config/values.gopkg/config/values_test.gopkg/executor/codex.gopkg/executor/codex_test.gopkg/executor/executor_test.gopkg/git/service_test.gopkg/processor/export_test.gopkg/processor/prompts.gopkg/processor/prompts_test.gopkg/processor/runner.gopkg/processor/runner_test.gopkg/processor/signals_test.gopkg/status/section.gopkg/status/section_test.gopkg/web/server_test.gopkg/web/session_progress_test.go
💤 Files with no reviewable changes (2)
- pkg/web/server_test.go
- pkg/git/service_test.go
| // CLI validation would reject this combo via validateFlags, but applyCLIOverrides | ||
| // is called after validateFlags; this guards the no-warning branch when user | ||
| // explicitly set the flag (which would have been a hard error pre-applyCLIOverrides). |
There was a problem hiding this comment.
Update stale behavior comment in this test block.
Line 1101 says this combo is rejected by validateFlags, but the new TestValidateFlags cases intentionally accept these combinations at CLI stage. This comment now misstates actual behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/ralphex/main_test.go` around lines 1101 - 1103, Update the stale test
comment to reflect that validateFlags now accepts these flag combinations in
TestValidateFlags rather than rejecting them; clarify that applyCLIOverrides is
still called after validateFlags and the guard exists to cover the no-warning
branch when a user explicitly set the flag (which used to be a hard error
pre-applyCLIOverrides). Replace the existing text referencing rejection by
validateFlags with a line stating that TestValidateFlags intentionally accepts
the combo at CLI stage and that applyCLIOverrides ensures the no-warning branch
when the flag was explicitly provided.
| if cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet { | ||
| fmt.Fprintf(warnW, "warning: config-file external_review_tool=%q overridden to \"none\" because executor=codex\n", cfg.ExternalReviewTool) | ||
| } |
There was a problem hiding this comment.
Suppress misleading codex-mode warning on defaults.
At Line 1380, this warning fires for the embedded default external_review_tool = "codex", so codex mode can print a false “config-file overridden” warning even when the user didn’t set a conflicting value.
💡 Suggested fix
- if cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet {
+ if cfg.ExternalReviewTool == "custom" && !o.externalReviewToolSet {
fmt.Fprintf(warnW, "warning: config-file external_review_tool=%q overridden to \"none\" because executor=codex\n", cfg.ExternalReviewTool)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/ralphex/main.go` around lines 1380 - 1382, The warning logic is inverted:
it warns when external_review_tool is the embedded default
(cfg.ExternalReviewTool set but o.externalReviewToolSet false). Change the
condition so the message is emitted only when the user explicitly set
external_review_tool and it is being overridden; i.e., replace the current check
(cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" &&
!o.externalReviewToolSet) with a check that requires o.externalReviewToolSet to
be true (for example: o.externalReviewToolSet && cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none") and keep the same fmt.Fprintf(warnW, ...) call
using cfg.ExternalReviewTool.
| func setIsolatedHome(t *testing.T) string { | ||
| t.Helper() | ||
| home := t.TempDir() | ||
| t.Setenv("HOME", home) | ||
| processor.ResetClaudeMdHintOnceForTest() | ||
| return home |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Go's os.UserHomeDirdocumentation, which environment variables are used on Windows, and does setting onlyHOME isolate the returned home directory there?
💡 Result:
According to the official Go documentation, the os.UserHomeDir function returns the %USERPROFILE% environment variable when running on Windows [1][2]. Setting only the HOME environment variable on Windows does not isolate or influence the directory returned by os.UserHomeDir, as the function specifically looks for %USERPROFILE% on that platform [1][2]. While some older or third-party implementations of home directory detection historically combined %HOMEDRIVE% and %HOMEPATH% or checked other variables, the standard library's os.UserHomeDir implementation is explicitly documented to rely on %USERPROFILE% [1][2].
Citations:
🏁 Script executed:
#!/bin/bash
# Search for os.UserHomeDir calls in processor code
echo "=== Searching for UserHomeDir calls ==="
rg "UserHomeDir|user\.HomeDir|homedir" pkg/processor --type go
# Search for environment variable usage in home/path resolution
echo ""
echo "=== Searching for USERPROFILE/HOMEDRIVE/HOMEPATH usage ==="
rg "USERPROFILE|HOMEDRIVE|HOMEPATH" pkg/processor --type go
# Look for hint path resolution logic
echo ""
echo "=== Searching for CLAUDE.md or hint path logic ==="
rg "CLAUDE\.md|hint.*path|hint.*home" pkg/processor --type go -iRepository: umputun/ralphex
Length of output: 2181
🏁 Script executed:
#!/bin/bash
# Check what the setIsolatedHome helper is actually used for
echo "=== Usages of setIsolatedHome ==="
rg "setIsolatedHome" pkg/processor/runner_test.go -A 2 -B 2
# Look at the hint tests that use this helper
echo ""
echo "=== Context around hint tests (lines 4382-4410) ==="
sed -n '4382,4410p' pkg/processor/runner_test.goRepository: umputun/ralphex
Length of output: 3079
🏁 Script executed:
#!/bin/bash
# Get context around the UserHomeDir call
echo "=== Context of os.UserHomeDir() call ==="
rg "os\.UserHomeDir" pkg/processor/runner.go -B 5 -A 5
# Check the maybeEmitClaudeMdSetupHint function signature
echo ""
echo "=== maybeEmitClaudeMdSetupHint function ==="
rg "func.*maybeEmitClaudeMdSetupHint" pkg/processor/runner.go -A 15Repository: umputun/ralphex
Length of output: 1133
🏁 Script executed:
#!/bin/bash
# Verify this function is called during test execution
echo "=== Searching for maybeEmitClaudeMdSetupHint calls ==="
rg "maybeEmitClaudeMdSetupHint" pkg/processor --type go
# Check if processor.New() calls it
echo ""
echo "=== Context of processor.New() ==="
rg "func \(r \*Runner\) New\(" pkg/processor/runner.go -A 20 | head -40Repository: umputun/ralphex
Length of output: 449
🏁 Script executed:
#!/bin/bash
# Find where maybeEmitClaudeMdSetupHint is called in runner.go
echo "=== Calls to maybeEmitClaudeMdSetupHint in runner.go ==="
rg "maybeEmitClaudeMdSetupHint" pkg/processor/runner.go -B 10 -A 2
# Check if it's called during New() or similar initialization
echo ""
echo "=== Check New() function for hint call ==="
rg "func \(.*\) New\(" pkg/processor/runner.go -A 40 | head -60Repository: umputun/ralphex
Length of output: 1375
Isolate home-directory lookups on Windows too.
Line 4377 only overrides HOME. On Windows, os.UserHomeDir() returns %USERPROFILE%, not HOME, so these tests will access the real user home directory instead of t.TempDir() on Windows, violating the guideline that tests MUST use t.TempDir() and NEVER touch ~/.config/ralphex/. Set the Windows environment variables to ensure consistent test isolation across platforms.
🔧 Minimal hardening
func setIsolatedHome(t *testing.T) string {
t.Helper()
home := t.TempDir()
t.Setenv("HOME", home)
+ t.Setenv("USERPROFILE", home)
+ t.Setenv("HOMEDRIVE", "")
+ t.Setenv("HOMEPATH", "")
processor.ResetClaudeMdHintOnceForTest()
return home
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/processor/runner_test.go` around lines 4374 - 4379, setIsolatedHome
currently only sets HOME; on Windows os.UserHomeDir() will read USERPROFILE (or
HOMEDRIVE+HOMEPATH), so update setIsolatedHome to also set USERPROFILE to the
same t.TempDir() and populate HOMEDRIVE and HOMEPATH so UserHomeDir resolves to
the temp dir on Windows; keep the existing call to
processor.ResetClaudeMdHintOnceForTest(). Ensure you use t.Setenv for
USERPROFILE, HOMEDRIVE and HOMEPATH so tests remain isolated across platforms.
| func TestReviewSections_TypeAndLabelContract(t *testing.T) { | ||
| t.Run("claude first pass review", func(t *testing.T) { | ||
| section := NewClaudeReviewSection(0, ": all findings") | ||
| assert.Equal(t, SectionClaudeReview, section.Type) | ||
| assert.Equal(t, 0, section.Iteration) | ||
| assert.Equal(t, "claude review 0: all findings", section.Label) | ||
| }) | ||
|
|
||
| t.Run("internal codex review", func(t *testing.T) { | ||
| section := NewInternalReviewSection(3, ": critical/major") | ||
| assert.Equal(t, SectionClaudeReview, section.Type) | ||
| assert.Equal(t, 3, section.Iteration) | ||
| assert.Equal(t, "review 3: critical/major", section.Label) | ||
| assert.NotContains(t, section.Label, "codex") | ||
| }) | ||
| } |
There was a problem hiding this comment.
Convert this test to a table-driven form.
Line 9 currently uses hardcoded subtests; project test conventions require table-driven structure for consistency and easier extension.
♻️ Proposed refactor
func TestReviewSections_TypeAndLabelContract(t *testing.T) {
- t.Run("claude first pass review", func(t *testing.T) {
- section := NewClaudeReviewSection(0, ": all findings")
- assert.Equal(t, SectionClaudeReview, section.Type)
- assert.Equal(t, 0, section.Iteration)
- assert.Equal(t, "claude review 0: all findings", section.Label)
- })
-
- t.Run("internal codex review", func(t *testing.T) {
- section := NewInternalReviewSection(3, ": critical/major")
- assert.Equal(t, SectionClaudeReview, section.Type)
- assert.Equal(t, 3, section.Iteration)
- assert.Equal(t, "review 3: critical/major", section.Label)
- assert.NotContains(t, section.Label, "codex")
- })
+ tests := []struct {
+ name string
+ section Section
+ wantType SectionType
+ wantIteration int
+ wantLabel string
+ noCodex bool
+ }{
+ {
+ name: "claude first pass review",
+ section: NewClaudeReviewSection(0, ": all findings"),
+ wantType: SectionClaudeReview,
+ wantIteration: 0,
+ wantLabel: "claude review 0: all findings",
+ },
+ {
+ name: "internal codex review",
+ section: NewInternalReviewSection(3, ": critical/major"),
+ wantType: SectionClaudeReview,
+ wantIteration: 3,
+ wantLabel: "review 3: critical/major",
+ noCodex: true,
+ },
+ }
+
+ for _, tc := range tests {
+ t.Run(tc.name, func(t *testing.T) {
+ assert.Equal(t, tc.wantType, tc.section.Type)
+ assert.Equal(t, tc.wantIteration, tc.section.Iteration)
+ assert.Equal(t, tc.wantLabel, tc.section.Label)
+ if tc.noCodex {
+ assert.NotContains(t, tc.section.Label, "codex")
+ }
+ })
+ }
}As per coding guidelines "**/*_test.go: Table-driven tests with testify".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestReviewSections_TypeAndLabelContract(t *testing.T) { | |
| t.Run("claude first pass review", func(t *testing.T) { | |
| section := NewClaudeReviewSection(0, ": all findings") | |
| assert.Equal(t, SectionClaudeReview, section.Type) | |
| assert.Equal(t, 0, section.Iteration) | |
| assert.Equal(t, "claude review 0: all findings", section.Label) | |
| }) | |
| t.Run("internal codex review", func(t *testing.T) { | |
| section := NewInternalReviewSection(3, ": critical/major") | |
| assert.Equal(t, SectionClaudeReview, section.Type) | |
| assert.Equal(t, 3, section.Iteration) | |
| assert.Equal(t, "review 3: critical/major", section.Label) | |
| assert.NotContains(t, section.Label, "codex") | |
| }) | |
| } | |
| func TestReviewSections_TypeAndLabelContract(t *testing.T) { | |
| tests := []struct { | |
| name string | |
| section Section | |
| wantType SectionType | |
| wantIteration int | |
| wantLabel string | |
| noCodex bool | |
| }{ | |
| { | |
| name: "claude first pass review", | |
| section: NewClaudeReviewSection(0, ": all findings"), | |
| wantType: SectionClaudeReview, | |
| wantIteration: 0, | |
| wantLabel: "claude review 0: all findings", | |
| }, | |
| { | |
| name: "internal codex review", | |
| section: NewInternalReviewSection(3, ": critical/major"), | |
| wantType: SectionClaudeReview, | |
| wantIteration: 3, | |
| wantLabel: "review 3: critical/major", | |
| noCodex: true, | |
| }, | |
| } | |
| for _, tc := range tests { | |
| t.Run(tc.name, func(t *testing.T) { | |
| assert.Equal(t, tc.wantType, tc.section.Type) | |
| assert.Equal(t, tc.wantIteration, tc.section.Iteration) | |
| assert.Equal(t, tc.wantLabel, tc.section.Label) | |
| if tc.noCodex { | |
| assert.NotContains(t, tc.section.Label, "codex") | |
| } | |
| }) | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/status/section_test.go` around lines 9 - 24, Convert the current
hardcoded subtests in TestReviewSections_TypeAndLabelContract into a
table-driven test: create a slice of test cases with fields (name, constructor
args, expected Type, expected Iteration, expected Label and any NotContains
checks), iterate over them with t.Run(tc.name, func(t *testing.T) { ... }), and
for each case call the appropriate constructor (NewClaudeReviewSection or
NewInternalReviewSection) and assert expectations against section.Type,
section.Iteration and section.Label (including the NotContains check for
"codex"); keep the test name TestReviewSections_TypeAndLabelContract and reuse
the symbols NewClaudeReviewSection, NewInternalReviewSection and
SectionClaudeReview so assertions remain the same.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/plans/20260514-codex-first-class-mode.md`:
- Line 220: Add a language identifier to the fenced code block opened with
triple backticks so MD040 warnings stop; locate the anonymous code fence (the
``` block shown in the diff) and change the opening fence from ``` to ```text
(or another appropriate language) so the block becomes ```text ... ``` as in the
proposed fix.
In `@llms.txt`:
- Around line 164-165: Update the prompt filename list section in llms.txt to
include the codex-specific filenames so it matches the "codex split" docs: add
review_first_codex.txt and review_second_codex.txt alongside the existing
review_first.txt/review_second.txt entries and keep the stated loading priority
(local → global → embedded) intact; ensure the explanatory text clarifies that
customizing review_first.txt does not affect --codex, and users must create the
review_first_codex.txt/review_second_codex.txt variants for codex.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 79edc3e5-d145-49b1-a905-91148cc99cf7
📒 Files selected for processing (16)
CLAUDE.mdREADME.mdcmd/ralphex/main.godocs/custom-providers.mddocs/plans/20260514-codex-first-class-mode.mdllms.txtpkg/config/config.gopkg/config/config_test.gopkg/config/defaults/agents/simplification.txtpkg/config/defaults/configpkg/config/values.gopkg/config/values_test.gopkg/executor/codex.gopkg/executor/codex_test.gopkg/processor/runner.gopkg/processor/runner_test.go
✅ Files skipped from review due to trivial changes (2)
- pkg/config/defaults/agents/simplification.txt
- README.md
🚧 Files skipped from review as they are similar to previous changes (10)
- CLAUDE.md
- docs/custom-providers.md
- pkg/config/values_test.go
- pkg/config/defaults/config
- pkg/config/config_test.go
- pkg/executor/codex_test.go
- cmd/ralphex/main.go
- pkg/executor/codex.go
- pkg/processor/runner.go
- pkg/processor/runner_test.go
|
|
||
| **Codex `-c` flag args** (built per invocation by `codexConfigOpts.cliArgs()`): | ||
|
|
||
| ``` |
There was a problem hiding this comment.
Add a language identifier to the fenced code block.
Line 220 opens a fenced block without a language, which triggers MD040 warnings.
📝 Proposed fix
-```
+```text
review phases: -c features.multi_agent=true \
-c agents.reviewer.description="general code review specialist; behavior driven by the task argument"
+ when --pass-claude-md: -c project_doc_fallback_filenames=["CLAUDE.md"]
task / finalize phases: (no overrides — single-agent, default config)
+ when --pass-claude-md: -c project_doc_fallback_filenames=["CLAUDE.md"]</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 220-220: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @docs/plans/20260514-codex-first-class-mode.md at line 220, Add a language
identifier to the fenced code block opened with triple backticks so MD040
warnings stop; locate the anonymous code fence (the block shown in the diff) and change the opening fence from to text (or another appropriate language) so the block becomes text ... ``` as in the proposed fix.
</details>
<!-- fingerprinting:phantom:poseidon:hawk -->
<!-- This is an auto-generated comment by CodeRabbit -->
| Prompt customization split: customizing `~/.config/ralphex/prompts/review_first.txt` does NOT apply under `--codex` because the codex path reads `review_first_codex.txt`. Users who want a customized codex review prompt need to create `review_first_codex.txt` (and `review_second_codex.txt`) alongside the claude variants. Same loading priority: local → global → embedded. | ||
|
|
There was a problem hiding this comment.
Keep the prompt filename list consistent with the codex split docs.
The new split explanation is good, but the prompt list on Line 117 should also include review_first_codex.txt and review_second_codex.txt to avoid conflicting guidance.
📝 Proposed fix (outside this hunk)
-**Prompt files** (`~/.config/ralphex/prompts/`): `task.txt`, `review_first.txt`, `review_second.txt`, `codex.txt`, `codex_review.txt`, `custom_review.txt`, `custom_eval.txt`, `make_plan.txt`, `finalize.txt`
+**Prompt files** (`~/.config/ralphex/prompts/`): `task.txt`, `review_first.txt`, `review_second.txt`, `review_first_codex.txt`, `review_second_codex.txt`, `codex.txt`, `codex_review.txt`, `custom_review.txt`, `custom_eval.txt`, `make_plan.txt`, `finalize.txt`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@llms.txt` around lines 164 - 165, Update the prompt filename list section in
llms.txt to include the codex-specific filenames so it matches the "codex split"
docs: add review_first_codex.txt and review_second_codex.txt alongside the
existing review_first.txt/review_second.txt entries and keep the stated loading
priority (local → global → embedded) intact; ensure the explanatory text
clarifies that customizing review_first.txt does not affect --codex, and users
must create the review_first_codex.txt/review_second_codex.txt variants for
codex.
- idle-timeout silent review skip: external review loop now retries on r.lastSessionTimedOut instead of treating partial/empty output as a clean exit; --idle-timeout no longer bypasses external codex review - preserve user codex model and reasoning effort: emit -c model= / -c model_reasoning_effort= only when ralphex config explicitly sets them, so ~/.codex/config.toml choices are honored when ralphex does not override (matches the docs claim) - spurious external_review_tool warning: add ExternalReviewToolSet sentinel, gate the codex-override warning on user-explicit setting so vanilla --codex runs no longer warn about the embedded default - finalize-codex routing regression test: pin that finalize calls task.Run, not review.Run, under --codex mode - remove dead code under --codex: New() no longer builds unused claude/external executors when isCodexExecutor() is true - escapeCodexSingleQuoted: add \t escape; document supported set - revert unrelated simplification.txt wording change
- main_test.go:1103: replace stale comment about validateFlags rejecting --codex+--external-review-tool combo (TestValidateFlags now accepts it at the CLI stage) - runner_test.go: setIsolatedHome also sets USERPROFILE, HOMEDRIVE, HOMEPATH so os.UserHomeDir() on Windows resolves to the temp dir (test-safety rule: never touch ~/.config/ralphex/) - section_test.go: convert to table-driven form per project convention - plan file:220: add text fence language identifier (MD040) - llms.txt: add review_first_codex.txt / review_second_codex.txt to the prompt filename list with a note about the codex split - prompts.go: warn once per agent name when codex mode discards frontmatter Model/AgentType overrides, so users do not silently lose per-agent settings under --codex
| sessionIDCh := make(chan string, 1) | ||
| stderrDone := make(chan stderrResult, 1) | ||
| go func() { | ||
| stderrDone <- e.processStderr(ctx, streams.Stderr) | ||
| stderrDone <- e.processStderr(execCtx, streams.Stderr, idleTouch, sessionIDCh, firstRun) | ||
| }() | ||
|
|
||
| // read stdout entirely as final response | ||
| stdoutContent, stdoutErr := e.readStdout(streams.Stdout) | ||
| tailCancel, tailDone := e.startRolloutTail(execCtx, sessionIDCh, idleTouch) | ||
|
|
||
| // read stdout entirely as final response; wrap with touch-on-read so reads | ||
| // keep the idle timer alive even while stderr is quiet. | ||
| stdoutReader := streams.Stdout | ||
| if e.IdleTimeout > 0 { | ||
| stdoutReader = &touchReader{r: streams.Stdout, touch: idleTouch} | ||
| } | ||
| stdoutContent, stdoutErr := e.readStdout(stdoutReader) | ||
|
|
||
| // wait for stderr processing to complete | ||
| stderrRes := <-stderrDone | ||
|
|
||
| // codex has exited; signal tailer to drain remaining file content and stop | ||
| tailCancel() | ||
| <-tailDone |
| // idleTimeoutResult builds the Result returned when the idle-timeout timer | ||
| // canceled the derived execution context (parent ctx still alive). limit and | ||
| // logDroppedIdleErrors surfaces concurrent stream/wait errors that would otherwise | ||
| // be discarded by the idle-timeout completion path. operators need this to | ||
| // distinguish "agent went silent" from "stream broke" before retrying. | ||
| func (e *CodexExecutor) logDroppedIdleErrors(stdoutErr, waitErr error) { |
| // firstLine returns the first newline-delimited segment of s, or s when it | ||
| // contains no newline. used to keep rollout event display lines compact. | ||
| func (e *CodexExecutor) firstLine(s string) string { | ||
| head, _, _ := strings.Cut(s, "\n") | ||
| return head | ||
| } |
| type rolloutPayload struct { | ||
| Type string `json:"type"` | ||
| Role string `json:"role"` | ||
| Name string `json:"name"` | ||
| Arguments string `json:"arguments"` | ||
| Content []struct { | ||
| Type string `json:"type"` | ||
| Text string `json:"text"` | ||
| } `json:"content"` | ||
| } |
| return errors.New("first review pass timed out") | ||
| } | ||
|
|
||
| if !isReviewDone(result.Signal) { | ||
| r.log.Print("warning: first review pass did not complete cleanly, continuing...") |
| // writeFile and writeStdout each take l.writeMu so callers that emit a | ||
| // timestamped log line via writeFile + writeStdout (back-to-back inside the | ||
| // same public method) do not interleave on either sink under concurrent | ||
| // producers — codex stderr processor and rollout-file tailer both call | ||
| // OutputHandler -> Print on the same Logger. | ||
| func (l *Logger) writeFile(format string, args ...any) { | ||
| l.writeMu.Lock() | ||
| defer l.writeMu.Unlock() | ||
| if l.file != nil { | ||
| fmt.Fprintf(l.file, format, args...) | ||
| } | ||
| } | ||
|
|
||
| func (l *Logger) writeStdout(format string, args ...any) { | ||
| l.writeMu.Lock() | ||
| defer l.writeMu.Unlock() | ||
| fmt.Fprintf(l.stdout, format, args...) | ||
| } |
Adds --codex CLI flag and matching `executor = codex` config option that
runs the full pipeline (task, internal review, finalize) through the codex
CLI instead of Claude Code. Motivated by the June 2026 Anthropic billing
split that moves unattended ralphex runs off the Claude Max plan; gives
OpenAI/codex-plan users a single flag to keep the full pipeline on their
existing subscription.
Highlights:
- New `--codex` and `--pass-claude-md` CLI flags plus matching `executor`
and `pass_claude_md` config keys, with mutex validation against
`--external-only`, `--codex-only`, and `--external-review-tool != none`.
- Role-named executor wiring: `Executors` struct switches from
`Claude`/`ReviewClaude`/`Codex` to `Task`/`Review`/`External`/`Custom`.
Under `--codex` the constructor wires the same shared `CodexExecutor`
instance into both `Task` and `Review` so any phase can use
`{{agent:...}}` expansions.
- `CodexExecutor` gains `MultiAgent`, `PassClaudeMd`, `IdleTimeout`,
`Sandbox`-with-mode-aware-default, additive `-c` flag overrides, an
idle-timeout machinery via `touchReader` + `time.AfterFunc`, and
`finalError` extraction so non-zero exits surface a useful diagnostic.
- Shared review prompts: `{{agent:<name>}}` expander reads
`cfg.AppConfig.Executor` and emits the executor-appropriate invocation
(Task tool for claude, `spawn_agent(agent='reviewer', task='...')`
for codex). No separate codex-only prompt files are needed.
- Mode-aware sandbox: `read-only` for external codex review (default
claude mode); `danger-full-access` for first-class `executor = codex`
(task/review/finalize need to write git metadata and commit) via the
new `(*config.Config).CodexExecutorSandbox()` accessor.
- `--pass-claude-md` adds `-c project_doc_fallback_filenames=["CLAUDE.md"]`
so codex's native AGENTS.md walk picks up project-level `./CLAUDE.md`.
Valid only under the codex executor; rejected for claude mode.
- Codex error/limit patterns extended with stderr live scanning gated on
`error:`/`fatal:`/`panic:` prefix so progress chatter cannot trip
false positives; stdout-first / stderr-second priority within each
severity class.
- Documentation: README codex-mode section, llms.txt mirror, CLAUDE.md
project-doc notes, `docs/custom-providers.md` deprecation callouts,
implementation plan at `docs/plans/completed/20260514-codex-first-class-mode.md`.
Internal-review section: `NewInternalReviewSection` returns
`SectionInternalReview` with an executor-neutral label to avoid
`phaseFromSection` misrouting under codex.
…on banner Three coordinated changes make --codex executor mode produce rich per-line output instead of just codex's reasoning-summary headers. Rollout tailer (pkg/executor/codex.go): - Codex has no stream-json equivalent for claude's structured output, so the model's assistant message text only lands in stdout at end-of-session and was never streamed. - Run() now extracts the session id from stderr's header banner via extractSessionID + a buffered sessionIDCh, then spawns tailRolloutFile to follow ~/.codex/sessions/<y>/<m>/<d>/rollout-*-<session-id>.jsonl like tail -f. formatRolloutEvent forwards assistant message text only; function_call records (exec_command, spawn_agent) and reasoning records are dropped as tool-machinery noise the narrative already covers. - tailCtx is canceled after stdout EOF so the tailer drains remaining buffered events once more and exits without blocking shutdown. Banner trim with one-time leak (pkg/executor/codex.go): - shouldDisplay no longer forwards codex's 8-line startup header (workdir/provider/approval/session-id/etc.) per task/review/finalize phase, matching claude's empty-banner UX. - On the very first Run() across the executor's lifetime (headerEmitted atomic.Bool via CompareAndSwap) it whitelists model:, sandbox:, and reasoning effort: from the header block so users see what codex resolved from ~/.codex/config.toml. Subsequent invocations get full suppression. Session id detection runs independently of display so the rollout tailer keeps working either way. No-fork-context hint (pkg/processor/prompts.go): - formatAgentExpansionCodex appends "Pass only agent and task; do not set fork_context." to the spawn_agent example. Codex was auto-pairing fork_context=true with explicit agent_type and getting rejected, then retrying without — costing ~5 wasted spawn calls per review iteration. Startup banner (cmd/ralphex/main.go): - printExecutorInfo emits sandbox: under "executor: codex" using the resolved CodexExecutorSandbox() value, plus model: / reasoning effort: when --task-model explicitly set them. Gives a clear executor summary before any codex output. Tests cover rollout tailing initial-drain + late-append, formatRolloutEvent JSONL parsing, firstRun lifecycle across multiple Run() calls, session id regex, and the updated suppress-by-default shouldDisplay contract.
…w prompts
Adds prependCodexReviewGuidance + codexReviewGuidance that prepends a
"=== Codex orchestration directives ===" block to ReviewFirstPrompt and
ReviewSecondPrompt at build time when cfg.isCodexExecutor() is true. Covers
two codex multi_agent quirks observed end-to-end on the toy project:
- spawn_agent: pass only agent and task arguments. codex rejects
fork_context=true paired with explicit agent_type, costing ~5 wasted
retry round-trips per review iteration.
- wait_agent: when timed_out=true with empty status (sub-agent died
mid-tool-call), re-spawn the missing agents ONCE and proceed with
partial results. Without this directive codex sits through four 10-min
wait windows (~40 min) before deciding to retry on its own.
Section-level injection works for users on embedded defaults AND users
with customized review prompts that hard-code agent lists inline (bypassing
the {{agent:NAME}} expander). Wired at three review-prompt build sites in
pkg/processor/runner.go: full pipeline first review, review-only mode first
review, and the review-second loop.
formatAgentExpansionCodex no longer emits a per-agent fork_context hint
since the section-level block is now the single source of truth.
Also fixes a -race test failure from commit 582f75c. The tailer goroutine
appends to a test-side []string concurrently with the main goroutine
reading len(); now guarded with sync.Mutex + snapshot helper. Production
OutputHandler writes through file descriptors only (no shared Go data
structures), so this affected only the test.
Measured: same toy project, --codex --pass-claude-md, review 1 dropped
from 41m30s to 1m22s; total wall-clock from ~49 min to ~7 min.
Code-review-plus + codex GPT-5 + Copilot identified 17 places where the
first-class --codex work silently altered behavior for users running without
--codex. Each change here scopes new behavior to the opt-in --codex path and
preserves master semantics for the default-claude pipeline.
Critical
- runReview no longer aborts the run on first-review session/idle timeout
for default-claude users (master logged a soft warning and continued).
Gated on cfg.isCodexExecutor() — only --codex treats the timeout as fatal
(load-bearing under --codex because external review is auto-skipped).
Phase label parameterized so the warning/error text is accurate if the
helper is reused.
- progress.Logger writes are now serialized through l.writeMu held across
each public method's full file + stdout pair (writeTimestamped, PrintRaw,
PrintSection, PrintAligned per-line, LogQuestion, LogDraftReview, Close).
processStderr and the new tailRolloutFile both call OutputHandler from
separate goroutines; without the mutex a concurrent producer could slip
between a method's two sinks and split related output (file gets producer
A while stdout shows B, or LogQuestion's QUESTION line gets separated
from its companion OPTIONS line). writeFile/writeStdout renamed to
*Locked to make the contract explicit.
High (backward-compat regressions)
- execCodexRunner.childEnv now scopes ANTHROPIC_API_KEY stripping to
first-class --codex via a stripAnthropicKey flag wired to MultiAgent.
External codex review in default-claude mode preserves the host env so
custom codex wrappers proxying through Anthropic (scripts/codex-as-claude.sh)
keep authenticating. CLAUDECODE is always stripped.
- codex_model = gpt-5.5 and codex_reasoning_effort = xhigh restored as
active embedded defaults; commenting out the lines is the user opt-in
for inheriting from ~/.codex/config.toml. Previous commented-out defaults
silently changed review quality/cost for new installs.
- IdleTimeout removed from newBaseCodexExecutor; buildCodexExecutor (first-
class --codex) wires it. External codex review in claude mode no longer
inherits the claude idle_timeout, preserving master semantics where
codex review sessions were not idle-killed.
Medium
- --dangerously-bypass-approvals-and-sandbox gated on e.MultiAgent so
external codex review in Docker mode (sandbox forced to danger-full-access)
does not silently skip approval prompts that used to surface.
- review_first.txt / review_second.txt restored claude-specific guardrails
("all agent invocations in a single message", "Do NOT use
run_in_background", "do NOT paste diffs into agent prompts"). Phrased
inclusively so both executors get useful guidance.
Low / cleanup
- Deleted dead firstLine method and its test (no production callers).
- Swapped misattributed godoc on logDroppedIdleErrors / idleTimeoutResult.
- Dropped unread rolloutPayload Name and Arguments fields (formatRolloutEvent
only renders assistant messages).
- runReview phase label parameterized (no more hard-coded "first review
pass" in error string).
- Renamed SectionClaudeReview to SectionInternalReview (covers both claude
and codex internal review iterations).
Docs / banner
- Startup banner under --codex now reads cfg.AppConfig.CodexModel and
CodexReasoningEffort directly (what the executor actually receives),
not TaskModel (which only feeds claude). Fix applied at both executePlan
and runPlanMode banner sites.
- docs/custom-providers.md no longer references the removed
review_first_codex.txt / review_second_codex.txt files (those were
removed during the earlier prompt-duality collapse); replaced with
description of the shared review prompts + prependCodexReviewGuidance
section-level injection.
- llms.txt + README + CLAUDE.md document the new mode-specific scoping
for codex_sandbox, codex_model, idle_timeout, and ANTHROPIC_API_KEY.
- Dropped self-contradictory "silently … printing a warning" phrasing in
README + llms.txt.
Tests pass under `go test -race ./...`; linter clean.
37f031b to
29da334
Compare
ExecutorClaude = "" was the convention the surrounding code relied on (unset config field → claude pipeline), but at a glance the empty literal looked like a placeholder. Added a comment block explaining that an unset executor field resolves to claude by design and ExecutorCodex is the opt-in first-class --codex path. Revdiff annotation on master diff (pkg/config/config.go:32) flagged this as confusing — fixing in place.
gofmt-only cleanup; one blank line left over from the firstLine method deletion was not normalized away during the rebase. No behavior change.
Adds first-class
--codexsupport as an executor switch for task, review, and finalize phases.ralphex could already run codex through compatibility wrappers, but that path still pretended codex was Claude and kept Claude-flavored prompts. This PR calls codex directly, wires codex-specific review prompts, and keeps user codex config layered through additive
-coverrides.This is for the Anthropic June 15 billing change: ralphex drives Claude Code with
-p, and that usage is split away from the Max subscription. Users with an OpenAI/Codex plan get a direct way to run the whole pipeline on that plan.In codex mode ralphex skips external review, uses codex-specific review prompts with
spawn_agent, and keeps the default Claude behavior intact.Also adds
--pass-claude-md, config keys, docs, and timeout/sandbox handling for codex execution.Summary by CodeRabbit
New Features
Documentation
Tests