Add first-class codex executor mode by umputun · Pull Request #350 · umputun/ralphex

umputun · 2026-05-17T21:52:50Z

Adds first-class --codex support as an executor switch for task, review, and finalize phases.

ralphex could already run codex through compatibility wrappers, but that path still pretended codex was Claude and kept Claude-flavored prompts. This PR calls codex directly, wires codex-specific review prompts, and keeps user codex config layered through additive -c overrides.

This is for the Anthropic June 15 billing change: ralphex drives Claude Code with -p, and that usage is split away from the Max subscription. Users with an OpenAI/Codex plan get a direct way to run the whole pipeline on that plan.

In codex mode ralphex skips external review, uses codex-specific review prompts with spawn_agent, and keeps the default Claude behavior intact.

Also adds --pass-claude-md, config keys, docs, and timeout/sandbox handling for codex execution.

Summary by CodeRabbit

New Features
- Added a first-class "codex" executor mode with CLI flags (--codex, --pass-claude-md) to run task/review/finalize via Codex and optionally pass project CLAUDE.md into Codex; startup banners now show executor and passthrough status.
Documentation
- Expanded docs and usage tables covering Codex mode, passthrough behavior, mutual-exclusion rules, and revised session/idle timeout semantics.
Tests
- Extensive test coverage added for Codex flows, prompts, timeouts, startup messaging, and agent expansion.

Plan for promoting CodexExecutor from external-review-only to a peer of ClaudeExecutor across task, review, and finalize phases. Adds --codex and --pass-claude-md flags, role-named Executors struct (Task/Review/External/Custom), codex-flavored review prompts, and additive -c flag overrides that preserve user's ~/.codex config. 13 tasks. Survived smells pre-check, plan-review pass, and an interactive revdiff round that pivoted away from the original CODEX_HOME temp-directory approach.

coderabbitai · 2026-05-17T21:53:35Z

📝 Walkthrough

Walkthrough

Adds a first-class Codex executor: CLI flags/config fields for executor selection and CLAUDE.md passthrough, codex-specific review prompts and spawn_agent expansion, Codex executor idle/session handling, runner refactor to role-based executors, extensive tests, and documentation updates.

Changes

First-class Codex Executor Mode

Layer / File(s)	Summary
CLI flags and configuration foundation `cmd/ralphex/main.go`, `pkg/config/config.go`, `pkg/config/values.go`, `pkg/config/defaults/config`, `pkg/config/*_test.go`	Adds `--codex` and `--pass-claude-md` flags; `Executor` / `PassClaudeMd` config/values support with `*Set` sentinels; new default keys and embedded prompt filename additions.
CLI override logic and codex executor validation `cmd/ralphex/main.go`, `cmd/ralphex/main_test.go`	Implements `applyCodexOverrides` enforcing post-merge mutex rules, warning/overriding `external_review_tool` to `"none"` when needed, validating prerequisites, and checking `codex` binary availability; startup banners updated to display executor/pass status.
Codex-specific review prompts `pkg/config/prompts.go`, `pkg/config/defaults/prompts/review_first_codex.txt`, `pkg/config/defaults/prompts/review_second_codex.txt`, `pkg/config/defaults_test.go`	Adds codex review prompt templates and wiring into Prompts with local→global→embedded fallback; updates installer tests to include new files.
Codex-style agent expansion and escaping `pkg/processor/prompts.go`, `pkg/processor/prompts_test.go`	Dispatches agent expansion between Claude and Codex syntaxes; implements `formatAgentExpansionCodex` and `escapeCodexSingleQuoted`; extensive tests validate expansion, escaping, and multiline-safety.
Codex executor multi-agent and idle timeout `pkg/executor/codex.go`, `pkg/executor/codex_test.go`	Extends `CodexExecutor` with `MultiAgent`, `PassClaudeMd`, and `IdleTimeout`; implements additive `-c` overrides and idle-timeout via `touchReader`; centralizes final error handling; tests cover arg generation and idle semantics.
Runner refactoring to role-based executor structure `pkg/processor/runner.go`, `pkg/processor/export_test.go`	Refactors Runner to role-named `Executors` (Task/Review/External/Custom), resolves prompt variants at init, emits one-time CLAUDE.md hint, short-circuits external review in codex mode when disabled, and updates session timeout gating and finalize routing.
Runner tests and new behavioral coverage `pkg/processor/runner_test.go`	Rewires runner tests to new Executors shape; adds tests for codex external-review skipping, session-timeout gating, idle diagnostics preservation, CLAUDE.md hint gating, PassClaudeMd propagation, and review prompt selection.
Internal review labeling and codex signal parsing `pkg/status/section.go`, `pkg/status/section_test.go`, `pkg/executor/executor_test.go`, `pkg/processor/signals_test.go`	Adds `NewInternalReviewSection()` for executor-agnostic review labels; extends detectSignal tests to cover markers in backticked blocks; adds regressions for codex-style payload parsing and phase routing.
Comprehensive documentation and planning `CLAUDE.md`, `README.md`, `docs/custom-providers.md`, `docs/plans/20260514-codex-first-class-mode.md`, `llms.txt`	Documents Codex executor mode, CLI/config precedence and mutual-exclusion rules, timeout scoping differences (legacy vs codex), CLAUDE.md passthrough behavior and one-time hint, and provides a detailed implementation plan.
Minor cleanup and test-accessor changes `pkg/git/service_test.go`, `pkg/processor/export_test.go`	Whitespace cleanup and addition of test-only Runner accessor/hint helpers for CLAUDE.md testing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A codex path emerges bright,
Agents spawn and parse the night,
Executors dance in ordered queues,
Prompts and hints light up the news,
Pipelines hum — the rabbit applauds with delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.55% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add first-class codex executor mode' directly and clearly summarizes the primary change: introducing Codex as a first-class executor alongside Claude, replacing wrapper-based approaches with direct Codex execution.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex-first-class-mode

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

Adds first-class --codex executor mode that drives task, internal review, and finalize phases directly through the codex CLI (skipping external review since same-model self-review has weak signal). Motivated by the June 2026 Anthropic billing split that moves unattended ralphex runs off the Claude Max plan; gives OpenAI/codex-plan users a single flag to keep the full pipeline on their existing subscription. Also introduces --pass-claude-md, codex-specific review prompts, role-named executor wiring, and idle/session timeout coverage for codex.

Changes:

New --codex and --pass-claude-md CLI flags plus matching executor/pass_claude_md config keys, with mutex validation against --external-only, --codex-only, and --external-review-tool != none.
Refactor Executors from executor-named (Claude/ReviewClaude/Codex) to role-named (Task/Review/External/Custom); add codex agent-expansion via spawn_agent, additive -c overrides, idle/session-timeout support for CodexExecutor, executor-neutral internal-review section labels.
Docs/prompts: codex-mode README section, new review_first_codex.txt / review_second_codex.txt defaults, CLAUDE.md/llms.txt/custom-providers/plan doc updates.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
README.md	Documents `--codex`, `--pass-claude-md`, config keys, mutex rules, prompt-customization split, codex 0.130.0 requirement.
llms.txt	Mirrors README codex-mode summary for LLM consumers.
CLAUDE.md	Adds project-doc notes about codex executor mode.
docs/custom-providers.md	Calls out wrapper deprecation in favor of `--codex`.
docs/plans/20260514-codex-first-class-mode.md	Implementation plan/spec for the feature.
cmd/ralphex/main.go	Adds flags, codex dep check selection, post-merge `applyCodexOverrides` with mutex/auto-disable handling.
cmd/ralphex/main_test.go	Tests for new flag/config mutex and override paths.
pkg/processor/runner.go	Role-named executors, codex executor builders, codex-aware finalize routing, `runWithSessionTimeout` rework.
pkg/processor/prompts.go	Codex-shape agent expansion via `spawn_agent`, `escapeCodexSingleQuoted`.
pkg/processor/prompts_test.go	Covers codex agent expansion + escaping.
pkg/processor/export_test.go	Test exports for new internals.
pkg/processor/signals_test.go	Regression tests confirming codex-style free-form output still parses.
pkg/executor/codex.go	Adds `MultiAgent`, `PassClaudeMd`, `IdleTimeout`, additive `-c` args, idle-timeout machinery, `touchReader`, `finalError` extraction.
pkg/executor/codex_test.go	Tests idle timeout fires/resets, args splicing, finalError surfaces.
pkg/executor/executor_test.go	Minor exec test updates for new fields.
pkg/config/config.go	New `Executor` and `PassClaudeMd` config keys plus constants.
pkg/config/values.go	`*Set` sentinels for new fields.
pkg/config/config_test.go, values_test.go, defaults_test.go	Coverage for parsing/merge.
pkg/config/prompts.go	`ReviewFirstCodexPrompt` / `ReviewSecondCodexPrompt` loader plumbing.
pkg/config/defaults/config	Default config additions.
pkg/config/defaults/prompts/review_first_codex.txt, review_second_codex.txt	New codex-specific review prompts.
pkg/status/section.go	`NewInternalReviewSection` with executor-neutral label to avoid `phaseFromSection` misrouting.
pkg/status/section_test.go	Asserts label contract and absence of `codex` in label.
pkg/web/session_progress_test.go	Regression for internal-review label routing under codex.
pkg/web/server_test.go	Removes stray blank line.
pkg/git/service_test.go	Trivial trailing-newline cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+	if cfg.isCodexExecutor() {
+		e.PassClaudeMd = cfg.AppConfig.PassClaudeMd
+	}


+// formatAgentExpansionCodex builds the spawn_agent block used by codex executor.
+// frontmatter Model/AgentType overrides do not apply: codex registers a single
+// reviewer agent globally; per-call behavior is driven by the inlined task body.
+// the agent body is escaped so apostrophes (don't, what's) and backslashes inside
+// it do not terminate the surrounding single-quoted task='...' literal. the agent
+// name is shared with pkg/executor.CodexReviewerAgentName so the spawn_agent call
+// here stays in sync with the codex -c agents.<name>.description registration.
+func (r *Runner) formatAgentExpansionCodex(prompt string) string {
+	return fmt.Sprintf(`spawn_agent(agent='%s', task='%s')
+
+Report findings only - no positive observations.`, executor.CodexReviewerAgentName, r.escapeCodexSingleQuoted(prompt))
+}


+// escapeCodexSingleQuoted escapes backslashes, single quotes, and newlines so
+// the result is safe to splice into a Python-style single-quoted string literal
+// (which is what spawn_agent(task='...') expects). backslash must be escaped
+// first so the subsequent escapes do not get re-escaped. newlines and carriage
+// returns are escaped because multi-line agent bodies would otherwise embed raw
+// newlines into the task='...' literal and codex's string parser treats those
+// as terminators of the single-quoted form.
+func (r *Runner) escapeCodexSingleQuoted(s string) string {
+	s = strings.ReplaceAll(s, `\`, `\\`)
+	s = strings.ReplaceAll(s, `'`, `\'`)
+	s = strings.ReplaceAll(s, "\r", `\r`)
+	s = strings.ReplaceAll(s, "\n", `\n`)
+	return s
+}


+	if o.ExternalOnly {
+		return errors.New("--external-only is incompatible with codex executor (external review is skipped in codex mode)")
+	}
+	if o.CodexOnly {
+		return errors.New("--codex-only is incompatible with codex executor (external review is skipped in codex mode)")
+	}
+	if o.externalReviewToolSet && o.ExternalReviewTool != "none" {
+		return errors.New("--external-review-tool is incompatible with codex executor (external review is skipped)")
+	}


+	e.IdleTimeout = cfg.AppConfig.IdleTimeout
+	if cfg.isCodexExecutor() {


+	// under --codex, finalize is single-agent (no multi_agent overrides) and runs through
+	// the task-phase codex instance. under claude, the review executor is fine — it shares
+	// the claude binary and may carry a different model selection for review-style work.
+	execName := r.executorName()
+	finalizeRun := r.review.Run
+	if r.isCodexExecutor() {
+		finalizeRun = r.task.Run
+	}
+	result := r.runWithLimitRetry(ctx, finalizeRun, prompt, execName)


cloudflare-workers-and-pages · 2026-05-17T21:59:55Z

Deploying ralphex with Cloudflare Pages

Latest commit:	`64d51f3`
Status:	✅ Deploy successful!
Preview URL:	https://970fdbdd.ralphex.pages.dev
Branch Preview URL:	https://codex-first-class-mode.ralphex.pages.dev

View logs

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

CLAUDE.md (1)
74-74: 💤 Low value

Consider breaking up the very long paragraph for improved readability.

The single-paragraph explanation of --codex mode (line 74) packs 8+ distinct concepts into one paragraph: executor switch semantics, external review forcing, pipeline mode preservation, Executors struct fields, agent registration, -c flag overrides, config preservation, PassClaudeMd behavior, and user-level CLAUDE.md hints. While technically accurate, breaking this into 2-3 focused paragraphs would improve scannability.

Suggested structure:

What --codex does (executor switch, external review skip, mode preservation)

How multi-agent reviews are configured (-c overrides, config layering)

PassClaudeMd behavior (project vs user-level)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 74, Split the long single paragraph describing the --codex
behavior into 2–3 focused paragraphs: first explain the executor switch
semantics (the --codex flag sets cfg.Executor = config.ExecutorCodex, forces
cfg.ExternalReviewTool = "none", preserves the pipeline Mode enum, and maps
Executors struct role fields Task/Review/External/Custom to the underlying
CodexExecutor for each phase); second describe multi-agent configuration and
agent registration (explain that codexConfigOpts.cliArgs() in
pkg/executor/codex.go adds additive -c overrides like features.multi_agent and
agents.reviewer.description on top of ~/.codex/config.toml so user
model/sandbox/MCP settings are preserved); third cover PassClaudeMd behavior and
CLAUDE.md hints (document that --pass-claude-md sets CodexExecutor.PassClaudeMd
= true to pass project CLAUDE.md via -c project_doc_fallback_filenames and that
the runner prints a one-time hint for user-level CLAUDE.md symlink without
writing to ~/.codex).
cmd/ralphex/main_test.go (1)
961-1152: 🏗️ Heavy lift

Refactor repetitive codex override tests into table-driven cases.

TestApplyCodexOverrides_PostMergeMutexChecks and TestCodexFlag_ApplyCLIOverrides repeat setup/assert flows heavily; converting these to table-driven cases with shared setup helpers will reduce duplication and make new matrix cases safer to add.

As per coding guidelines, "**/*_test.go: Table-driven tests with testify".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ralphex/main_test.go` around lines 961 - 1152, Replace the many
duplicated t.Run blocks in TestApplyCodexOverrides_PostMergeMutexChecks and
TestCodexFlag_ApplyCLIOverrides with table-driven loops: define a slice of test
cases (fields: name, cliArgs []string, initial cfg *config.Config, wantErr bool,
wantErrContains []string, wantWarnContains []string, wantCfgChecks funcs or
expected field values) and iterate with t.Run(name, func(t *testing.T){ setup
common vars (o := parseTestOpts(t, tc.cliArgs...), var warnBuf bytes.Buffer),
call applyCodexOverrides or applyCLIOverrides as appropriate, then assert
expecteds (require.Error/NoError, strings.Contains on err.Error,
warnBuf.String(), and cfg fields like Executor, ExternalReviewTool,
PassClaudeMd) to eliminate copy/paste; keep helper calls parseTestOpts,
applyCodexOverrides, applyCLIOverrides, and warnBuf usage from the existing
tests to preserve behavior and warning checks.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/ralphex/main_test.go`:
- Around line 1101-1103: Update the stale test comment to reflect that
validateFlags now accepts these flag combinations in TestValidateFlags rather
than rejecting them; clarify that applyCLIOverrides is still called after
validateFlags and the guard exists to cover the no-warning branch when a user
explicitly set the flag (which used to be a hard error pre-applyCLIOverrides).
Replace the existing text referencing rejection by validateFlags with a line
stating that TestValidateFlags intentionally accepts the combo at CLI stage and
that applyCLIOverrides ensures the no-warning branch when the flag was
explicitly provided.

In `@cmd/ralphex/main.go`:
- Around line 1380-1382: The warning logic is inverted: it warns when
external_review_tool is the embedded default (cfg.ExternalReviewTool set but
o.externalReviewToolSet false). Change the condition so the message is emitted
only when the user explicitly set external_review_tool and it is being
overridden; i.e., replace the current check (cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet) with a check that
requires o.externalReviewToolSet to be true (for example:
o.externalReviewToolSet && cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none") and keep the same fmt.Fprintf(warnW, ...) call
using cfg.ExternalReviewTool.

In `@pkg/processor/runner_test.go`:
- Around line 4374-4379: setIsolatedHome currently only sets HOME; on Windows
os.UserHomeDir() will read USERPROFILE (or HOMEDRIVE+HOMEPATH), so update
setIsolatedHome to also set USERPROFILE to the same t.TempDir() and populate
HOMEDRIVE and HOMEPATH so UserHomeDir resolves to the temp dir on Windows; keep
the existing call to processor.ResetClaudeMdHintOnceForTest(). Ensure you use
t.Setenv for USERPROFILE, HOMEDRIVE and HOMEPATH so tests remain isolated across
platforms.

In `@pkg/status/section_test.go`:
- Around line 9-24: Convert the current hardcoded subtests in
TestReviewSections_TypeAndLabelContract into a table-driven test: create a slice
of test cases with fields (name, constructor args, expected Type, expected
Iteration, expected Label and any NotContains checks), iterate over them with
t.Run(tc.name, func(t *testing.T) { ... }), and for each case call the
appropriate constructor (NewClaudeReviewSection or NewInternalReviewSection) and
assert expectations against section.Type, section.Iteration and section.Label
(including the NotContains check for "codex"); keep the test name
TestReviewSections_TypeAndLabelContract and reuse the symbols
NewClaudeReviewSection, NewInternalReviewSection and SectionClaudeReview so
assertions remain the same.

---

Nitpick comments:
In `@CLAUDE.md`:
- Line 74: Split the long single paragraph describing the --codex behavior into
2–3 focused paragraphs: first explain the executor switch semantics (the --codex
flag sets cfg.Executor = config.ExecutorCodex, forces cfg.ExternalReviewTool =
"none", preserves the pipeline Mode enum, and maps Executors struct role fields
Task/Review/External/Custom to the underlying CodexExecutor for each phase);
second describe multi-agent configuration and agent registration (explain that
codexConfigOpts.cliArgs() in pkg/executor/codex.go adds additive -c overrides
like features.multi_agent and agents.reviewer.description on top of
~/.codex/config.toml so user model/sandbox/MCP settings are preserved); third
cover PassClaudeMd behavior and CLAUDE.md hints (document that --pass-claude-md
sets CodexExecutor.PassClaudeMd = true to pass project CLAUDE.md via -c
project_doc_fallback_filenames and that the runner prints a one-time hint for
user-level CLAUDE.md symlink without writing to ~/.codex).

In `@cmd/ralphex/main_test.go`:
- Around line 961-1152: Replace the many duplicated t.Run blocks in
TestApplyCodexOverrides_PostMergeMutexChecks and TestCodexFlag_ApplyCLIOverrides
with table-driven loops: define a slice of test cases (fields: name, cliArgs
[]string, initial cfg *config.Config, wantErr bool, wantErrContains []string,
wantWarnContains []string, wantCfgChecks funcs or expected field values) and
iterate with t.Run(name, func(t *testing.T){ setup common vars (o :=
parseTestOpts(t, tc.cliArgs...), var warnBuf bytes.Buffer), call
applyCodexOverrides or applyCLIOverrides as appropriate, then assert expecteds
(require.Error/NoError, strings.Contains on err.Error, warnBuf.String(), and cfg
fields like Executor, ExternalReviewTool, PassClaudeMd) to eliminate copy/paste;
keep helper calls parseTestOpts, applyCodexOverrides, applyCLIOverrides, and
warnBuf usage from the existing tests to preserve behavior and warning checks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c0f7f1d1-b825-4663-9572-cf551d721c72

📥 Commits

Reviewing files that changed from the base of the PR and between 5d07e2b and d9f6539.

📒 Files selected for processing (30)

CLAUDE.md
README.md
cmd/ralphex/main.go
cmd/ralphex/main_test.go
docs/custom-providers.md
docs/plans/20260514-codex-first-class-mode.md
llms.txt
pkg/config/config.go
pkg/config/config_test.go
pkg/config/defaults/config
pkg/config/defaults/prompts/review_first_codex.txt
pkg/config/defaults/prompts/review_second_codex.txt
pkg/config/defaults_test.go
pkg/config/prompts.go
pkg/config/values.go
pkg/config/values_test.go
pkg/executor/codex.go
pkg/executor/codex_test.go
pkg/executor/executor_test.go
pkg/git/service_test.go
pkg/processor/export_test.go
pkg/processor/prompts.go
pkg/processor/prompts_test.go
pkg/processor/runner.go
pkg/processor/runner_test.go
pkg/processor/signals_test.go
pkg/status/section.go
pkg/status/section_test.go
pkg/web/server_test.go
pkg/web/session_progress_test.go

💤 Files with no reviewable changes (2)

pkg/web/server_test.go
pkg/git/service_test.go

coderabbitai · 2026-05-17T21:59:56Z

+		// CLI validation would reject this combo via validateFlags, but applyCLIOverrides
+		// is called after validateFlags; this guards the no-warning branch when user
+		// explicitly set the flag (which would have been a hard error pre-applyCLIOverrides).


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale behavior comment in this test block.

Line 1101 says this combo is rejected by validateFlags, but the new TestValidateFlags cases intentionally accept these combinations at CLI stage. This comment now misstates actual behavior.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/ralphex/main_test.go` around lines 1101 - 1103, Update the stale test comment to reflect that validateFlags now accepts these flag combinations in TestValidateFlags rather than rejecting them; clarify that applyCLIOverrides is still called after validateFlags and the guard exists to cover the no-warning branch when a user explicitly set the flag (which used to be a hard error pre-applyCLIOverrides). Replace the existing text referencing rejection by validateFlags with a line stating that TestValidateFlags intentionally accepts the combo at CLI stage and that applyCLIOverrides ensures the no-warning branch when the flag was explicitly provided.

coderabbitai · 2026-05-17T21:59:56Z

+	if cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet {
+		fmt.Fprintf(warnW, "warning: config-file external_review_tool=%q overridden to \"none\" because executor=codex\n", cfg.ExternalReviewTool)
+	}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Suppress misleading codex-mode warning on defaults.

At Line 1380, this warning fires for the embedded default external_review_tool = "codex", so codex mode can print a false “config-file overridden” warning even when the user didn’t set a conflicting value.

💡 Suggested fix

- if cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet { + if cfg.ExternalReviewTool == "custom" && !o.externalReviewToolSet { fmt.Fprintf(warnW, "warning: config-file external_review_tool=%q overridden to \"none\" because executor=codex\n", cfg.ExternalReviewTool) }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/ralphex/main.go` around lines 1380 - 1382, The warning logic is inverted: it warns when external_review_tool is the embedded default (cfg.ExternalReviewTool set but o.externalReviewToolSet false). Change the condition so the message is emitted only when the user explicitly set external_review_tool and it is being overridden; i.e., replace the current check (cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet) with a check that requires o.externalReviewToolSet to be true (for example: o.externalReviewToolSet && cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none") and keep the same fmt.Fprintf(warnW, ...) call using cfg.ExternalReviewTool.

coderabbitai · 2026-05-17T21:59:56Z

+func setIsolatedHome(t *testing.T) string {
+	t.Helper()
+	home := t.TempDir()
+	t.Setenv("HOME", home)
+	processor.ResetClaudeMdHintOnceForTest()
+	return home


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Go's os.UserHomeDirdocumentation, which environment variables are used on Windows, and does setting onlyHOME isolate the returned home directory there?

💡 Result:

According to the official Go documentation, the os.UserHomeDir function returns the %USERPROFILE% environment variable when running on Windows [1][2]. Setting only the HOME environment variable on Windows does not isolate or influence the directory returned by os.UserHomeDir, as the function specifically looks for %USERPROFILE% on that platform [1][2]. While some older or third-party implementations of home directory detection historically combined %HOMEDRIVE% and %HOMEPATH% or checked other variables, the standard library's os.UserHomeDir implementation is explicitly documented to rely on %USERPROFILE% [1][2].

Citations:

1: https://pkg.go.dev/os

2: https://pkg.go.dev/github.com/gcc-mirror/gcc/libgo/go/os

🏁 Script executed:

#!/bin/bash # Search for os.UserHomeDir calls in processor code echo "=== Searching for UserHomeDir calls ===" rg "UserHomeDir|user\.HomeDir|homedir" pkg/processor --type go # Search for environment variable usage in home/path resolution echo "" echo "=== Searching for USERPROFILE/HOMEDRIVE/HOMEPATH usage ===" rg "USERPROFILE|HOMEDRIVE|HOMEPATH" pkg/processor --type go # Look for hint path resolution logic echo "" echo "=== Searching for CLAUDE.md or hint path logic ===" rg "CLAUDE\.md|hint.*path|hint.*home" pkg/processor --type go -i

Repository: umputun/ralphex

Length of output: 2181

🏁 Script executed:

#!/bin/bash # Check what the setIsolatedHome helper is actually used for echo "=== Usages of setIsolatedHome ===" rg "setIsolatedHome" pkg/processor/runner_test.go -A 2 -B 2 # Look at the hint tests that use this helper echo "" echo "=== Context around hint tests (lines 4382-4410) ===" sed -n '4382,4410p' pkg/processor/runner_test.go

Repository: umputun/ralphex

Length of output: 3079

🏁 Script executed:

#!/bin/bash # Get context around the UserHomeDir call echo "=== Context of os.UserHomeDir() call ===" rg "os\.UserHomeDir" pkg/processor/runner.go -B 5 -A 5 # Check the maybeEmitClaudeMdSetupHint function signature echo "" echo "=== maybeEmitClaudeMdSetupHint function ===" rg "func.*maybeEmitClaudeMdSetupHint" pkg/processor/runner.go -A 15

Repository: umputun/ralphex

Length of output: 1133

🏁 Script executed:

#!/bin/bash # Verify this function is called during test execution echo "=== Searching for maybeEmitClaudeMdSetupHint calls ===" rg "maybeEmitClaudeMdSetupHint" pkg/processor --type go # Check if processor.New() calls it echo "" echo "=== Context of processor.New() ===" rg "func \(r \*Runner\) New\(" pkg/processor/runner.go -A 20 | head -40

Repository: umputun/ralphex

Length of output: 449

🏁 Script executed:

#!/bin/bash # Find where maybeEmitClaudeMdSetupHint is called in runner.go echo "=== Calls to maybeEmitClaudeMdSetupHint in runner.go ===" rg "maybeEmitClaudeMdSetupHint" pkg/processor/runner.go -B 10 -A 2 # Check if it's called during New() or similar initialization echo "" echo "=== Check New() function for hint call ===" rg "func \(.*\) New\(" pkg/processor/runner.go -A 40 | head -60

Repository: umputun/ralphex

Length of output: 1375

Isolate home-directory lookups on Windows too.

Line 4377 only overrides HOME. On Windows, os.UserHomeDir() returns %USERPROFILE%, not HOME, so these tests will access the real user home directory instead of t.TempDir() on Windows, violating the guideline that tests MUST use t.TempDir() and NEVER touch ~/.config/ralphex/. Set the Windows environment variables to ensure consistent test isolation across platforms.

🔧 Minimal hardening

func setIsolatedHome(t *testing.T) string { t.Helper() home := t.TempDir() t.Setenv("HOME", home) + t.Setenv("USERPROFILE", home) + t.Setenv("HOMEDRIVE", "") + t.Setenv("HOMEPATH", "") processor.ResetClaudeMdHintOnceForTest() return home }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/processor/runner_test.go` around lines 4374 - 4379, setIsolatedHome currently only sets HOME; on Windows os.UserHomeDir() will read USERPROFILE (or HOMEDRIVE+HOMEPATH), so update setIsolatedHome to also set USERPROFILE to the same t.TempDir() and populate HOMEDRIVE and HOMEPATH so UserHomeDir resolves to the temp dir on Windows; keep the existing call to processor.ResetClaudeMdHintOnceForTest(). Ensure you use t.Setenv for USERPROFILE, HOMEDRIVE and HOMEPATH so tests remain isolated across platforms.

coderabbitai · 2026-05-17T21:59:56Z

+func TestReviewSections_TypeAndLabelContract(t *testing.T) {
+	t.Run("claude first pass review", func(t *testing.T) {
+		section := NewClaudeReviewSection(0, ": all findings")
+		assert.Equal(t, SectionClaudeReview, section.Type)
+		assert.Equal(t, 0, section.Iteration)
+		assert.Equal(t, "claude review 0: all findings", section.Label)
+	})
+
+	t.Run("internal codex review", func(t *testing.T) {
+		section := NewInternalReviewSection(3, ": critical/major")
+		assert.Equal(t, SectionClaudeReview, section.Type)
+		assert.Equal(t, 3, section.Iteration)
+		assert.Equal(t, "review 3: critical/major", section.Label)
+		assert.NotContains(t, section.Label, "codex")
+	})
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Convert this test to a table-driven form.

Line 9 currently uses hardcoded subtests; project test conventions require table-driven structure for consistency and easier extension.

♻️ Proposed refactor

func TestReviewSections_TypeAndLabelContract(t *testing.T) { - t.Run("claude first pass review", func(t *testing.T) { - section := NewClaudeReviewSection(0, ": all findings") - assert.Equal(t, SectionClaudeReview, section.Type) - assert.Equal(t, 0, section.Iteration) - assert.Equal(t, "claude review 0: all findings", section.Label) - }) - - t.Run("internal codex review", func(t *testing.T) { - section := NewInternalReviewSection(3, ": critical/major") - assert.Equal(t, SectionClaudeReview, section.Type) - assert.Equal(t, 3, section.Iteration) - assert.Equal(t, "review 3: critical/major", section.Label) - assert.NotContains(t, section.Label, "codex") - }) + tests := []struct { + name string + section Section + wantType SectionType + wantIteration int + wantLabel string + noCodex bool + }{ + { + name: "claude first pass review", + section: NewClaudeReviewSection(0, ": all findings"), + wantType: SectionClaudeReview, + wantIteration: 0, + wantLabel: "claude review 0: all findings", + }, + { + name: "internal codex review", + section: NewInternalReviewSection(3, ": critical/major"), + wantType: SectionClaudeReview, + wantIteration: 3, + wantLabel: "review 3: critical/major", + noCodex: true, + }, + } + + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + assert.Equal(t, tc.wantType, tc.section.Type) + assert.Equal(t, tc.wantIteration, tc.section.Iteration) + assert.Equal(t, tc.wantLabel, tc.section.Label) + if tc.noCodex { + assert.NotContains(t, tc.section.Label, "codex") + } + }) + } }

As per coding guidelines "**/*_test.go: Table-driven tests with testify".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func TestReviewSections_TypeAndLabelContract(t *testing.T) {

t.Run("claude first pass review", func(t *testing.T) {

section := NewClaudeReviewSection(0, ": all findings")

assert.Equal(t, SectionClaudeReview, section.Type)

assert.Equal(t, 0, section.Iteration)

assert.Equal(t, "claude review 0: all findings", section.Label)

})

t.Run("internal codex review", func(t *testing.T) {

section := NewInternalReviewSection(3, ": critical/major")

assert.Equal(t, SectionClaudeReview, section.Type)

assert.Equal(t, 3, section.Iteration)

assert.Equal(t, "review 3: critical/major", section.Label)

assert.NotContains(t, section.Label, "codex")

})

}

func TestReviewSections_TypeAndLabelContract(t *testing.T) {

tests := []struct {

name string

section Section

wantType SectionType

wantIteration int

wantLabel string

noCodex bool

}{

{

name: "claude first pass review",

section: NewClaudeReviewSection(0, ": all findings"),

wantType: SectionClaudeReview,

wantIteration: 0,

wantLabel: "claude review 0: all findings",

},

{

name: "internal codex review",

section: NewInternalReviewSection(3, ": critical/major"),

wantType: SectionClaudeReview,

wantIteration: 3,

wantLabel: "review 3: critical/major",

noCodex: true,

},

}

for _, tc := range tests {

t.Run(tc.name, func(t *testing.T) {

assert.Equal(t, tc.wantType, tc.section.Type)

assert.Equal(t, tc.wantIteration, tc.section.Iteration)

assert.Equal(t, tc.wantLabel, tc.section.Label)

if tc.noCodex {

assert.NotContains(t, tc.section.Label, "codex")

}

})

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/status/section_test.go` around lines 9 - 24, Convert the current hardcoded subtests in TestReviewSections_TypeAndLabelContract into a table-driven test: create a slice of test cases with fields (name, constructor args, expected Type, expected Iteration, expected Label and any NotContains checks), iterate over them with t.Run(tc.name, func(t *testing.T) { ... }), and for each case call the appropriate constructor (NewClaudeReviewSection or NewInternalReviewSection) and assert expectations against section.Type, section.Iteration and section.Label (including the NotContains check for "codex"); keep the test name TestReviewSections_TypeAndLabelContract and reuse the symbols NewClaudeReviewSection, NewInternalReviewSection and SectionClaudeReview so assertions remain the same.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/plans/20260514-codex-first-class-mode.md`:
- Line 220: Add a language identifier to the fenced code block opened with
triple backticks so MD040 warnings stop; locate the anonymous code fence (the
``` block shown in the diff) and change the opening fence from ``` to ```text
(or another appropriate language) so the block becomes ```text ... ``` as in the
proposed fix.

In `@llms.txt`:
- Around line 164-165: Update the prompt filename list section in llms.txt to
include the codex-specific filenames so it matches the "codex split" docs: add
review_first_codex.txt and review_second_codex.txt alongside the existing
review_first.txt/review_second.txt entries and keep the stated loading priority
(local → global → embedded) intact; ensure the explanatory text clarifies that
customizing review_first.txt does not affect --codex, and users must create the
review_first_codex.txt/review_second_codex.txt variants for codex.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 79edc3e5-d145-49b1-a905-91148cc99cf7

📥 Commits

Reviewing files that changed from the base of the PR and between 3812540 and 853382b.

📒 Files selected for processing (16)

CLAUDE.md
README.md
cmd/ralphex/main.go
docs/custom-providers.md
docs/plans/20260514-codex-first-class-mode.md
llms.txt
pkg/config/config.go
pkg/config/config_test.go
pkg/config/defaults/agents/simplification.txt
pkg/config/defaults/config
pkg/config/values.go
pkg/config/values_test.go
pkg/executor/codex.go
pkg/executor/codex_test.go
pkg/processor/runner.go
pkg/processor/runner_test.go

✅ Files skipped from review due to trivial changes (2)

pkg/config/defaults/agents/simplification.txt
README.md

🚧 Files skipped from review as they are similar to previous changes (10)

CLAUDE.md
docs/custom-providers.md
pkg/config/values_test.go
pkg/config/defaults/config
pkg/config/config_test.go
pkg/executor/codex_test.go
cmd/ralphex/main.go
pkg/executor/codex.go
pkg/processor/runner.go
pkg/processor/runner_test.go

coderabbitai · 2026-05-17T22:18:51Z

+
+**Codex `-c` flag args** (built per invocation by `codexConfigOpts.cliArgs()`):
+
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

Line 220 opens a fenced block without a language, which triggers MD040 warnings.

📝 Proposed fix

-``` +```text review phases: -c features.multi_agent=true \ -c agents.reviewer.description="general code review specialist; behavior driven by the task argument" + when --pass-claude-md: -c project_doc_fallback_filenames=["CLAUDE.md"] task / finalize phases: (no overrides — single-agent, default config) + when --pass-claude-md: -c project_doc_fallback_filenames=["CLAUDE.md"]

</details> <details> <summary>🧰 Tools</summary> <details> <summary>🪛 markdownlint-cli2 (0.22.1)</summary> [warning] 220-220: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> <details> <summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/plans/20260514-codex-first-class-mode.md at line 220, Add a language
identifier to the fenced code block opened with triple backticks so MD040
warnings stop; locate the anonymous code fence (the block shown in the diff) and change the opening fence from to text (or another appropriate language) so the block becomes text ... ``` as in the proposed fix.

</details>  

coderabbitai · 2026-05-17T22:18:51Z

+Prompt customization split: customizing `~/.config/ralphex/prompts/review_first.txt` does NOT apply under `--codex` because the codex path reads `review_first_codex.txt`. Users who want a customized codex review prompt need to create `review_first_codex.txt` (and `review_second_codex.txt`) alongside the claude variants. Same loading priority: local → global → embedded.
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the prompt filename list consistent with the codex split docs.

The new split explanation is good, but the prompt list on Line 117 should also include review_first_codex.txt and review_second_codex.txt to avoid conflicting guidance.

📝 Proposed fix (outside this hunk)

-**Prompt files** (`~/.config/ralphex/prompts/`): `task.txt`, `review_first.txt`, `review_second.txt`, `codex.txt`, `codex_review.txt`, `custom_review.txt`, `custom_eval.txt`, `make_plan.txt`, `finalize.txt` +**Prompt files** (`~/.config/ralphex/prompts/`): `task.txt`, `review_first.txt`, `review_second.txt`, `review_first_codex.txt`, `review_second_codex.txt`, `codex.txt`, `codex_review.txt`, `custom_review.txt`, `custom_eval.txt`, `make_plan.txt`, `finalize.txt`

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@llms.txt` around lines 164 - 165, Update the prompt filename list section in llms.txt to include the codex-specific filenames so it matches the "codex split" docs: add review_first_codex.txt and review_second_codex.txt alongside the existing review_first.txt/review_second.txt entries and keep the stated loading priority (local → global → embedded) intact; ensure the explanatory text clarifies that customizing review_first.txt does not affect --codex, and users must create the review_first_codex.txt/review_second_codex.txt variants for codex.

- idle-timeout silent review skip: external review loop now retries on r.lastSessionTimedOut instead of treating partial/empty output as a clean exit; --idle-timeout no longer bypasses external codex review - preserve user codex model and reasoning effort: emit -c model= / -c model_reasoning_effort= only when ralphex config explicitly sets them, so ~/.codex/config.toml choices are honored when ralphex does not override (matches the docs claim) - spurious external_review_tool warning: add ExternalReviewToolSet sentinel, gate the codex-override warning on user-explicit setting so vanilla --codex runs no longer warn about the embedded default - finalize-codex routing regression test: pin that finalize calls task.Run, not review.Run, under --codex mode - remove dead code under --codex: New() no longer builds unused claude/external executors when isCodexExecutor() is true - escapeCodexSingleQuoted: add \t escape; document supported set - revert unrelated simplification.txt wording change

- main_test.go:1103: replace stale comment about validateFlags rejecting --codex+--external-review-tool combo (TestValidateFlags now accepts it at the CLI stage) - runner_test.go: setIsolatedHome also sets USERPROFILE, HOMEDRIVE, HOMEPATH so os.UserHomeDir() on Windows resolves to the temp dir (test-safety rule: never touch ~/.config/ralphex/) - section_test.go: convert to table-driven form per project convention - plan file:220: add text fence language identifier (MD040) - llms.txt: add review_first_codex.txt / review_second_codex.txt to the prompt filename list with a note about the codex split - prompts.go: warn once per agent name when codex mode discards frontmatter Model/AgentType overrides, so users do not silently lose per-agent settings under --codex

Copilot

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 5 comments.

+	sessionIDCh := make(chan string, 1)
 	stderrDone := make(chan stderrResult, 1)
 	go func() {
-		stderrDone <- e.processStderr(ctx, streams.Stderr)
+		stderrDone <- e.processStderr(execCtx, streams.Stderr, idleTouch, sessionIDCh, firstRun)
 	}()

-	// read stdout entirely as final response
-	stdoutContent, stdoutErr := e.readStdout(streams.Stdout)
+	tailCancel, tailDone := e.startRolloutTail(execCtx, sessionIDCh, idleTouch)
+
+	// read stdout entirely as final response; wrap with touch-on-read so reads
+	// keep the idle timer alive even while stderr is quiet.
+	stdoutReader := streams.Stdout
+	if e.IdleTimeout > 0 {
+		stdoutReader = &touchReader{r: streams.Stdout, touch: idleTouch}
+	}
+	stdoutContent, stdoutErr := e.readStdout(stdoutReader)

 	// wait for stderr processing to complete
 	stderrRes := <-stderrDone

+	// codex has exited; signal tailer to drain remaining file content and stop
+	tailCancel()
+	<-tailDone


+// idleTimeoutResult builds the Result returned when the idle-timeout timer
+// canceled the derived execution context (parent ctx still alive). limit and
+// logDroppedIdleErrors surfaces concurrent stream/wait errors that would otherwise
+// be discarded by the idle-timeout completion path. operators need this to
+// distinguish "agent went silent" from "stream broke" before retrying.
+func (e *CodexExecutor) logDroppedIdleErrors(stdoutErr, waitErr error) {


+// firstLine returns the first newline-delimited segment of s, or s when it
+// contains no newline. used to keep rollout event display lines compact.
+func (e *CodexExecutor) firstLine(s string) string {
+	head, _, _ := strings.Cut(s, "\n")
+	return head
+}


+type rolloutPayload struct {
+	Type      string `json:"type"`
+	Role      string `json:"role"`
+	Name      string `json:"name"`
+	Arguments string `json:"arguments"`
+	Content   []struct {
+		Type string `json:"type"`
+		Text string `json:"text"`
+	} `json:"content"`
+}


+		return errors.New("first review pass timed out")
+	}
+
 	if !isReviewDone(result.Signal) {
 		r.log.Print("warning: first review pass did not complete cleanly, continuing...")


Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

+// writeFile and writeStdout each take l.writeMu so callers that emit a
+// timestamped log line via writeFile + writeStdout (back-to-back inside the
+// same public method) do not interleave on either sink under concurrent
+// producers — codex stderr processor and rollout-file tailer both call
+// OutputHandler -> Print on the same Logger.
 func (l *Logger) writeFile(format string, args ...any) {
+	l.writeMu.Lock()
+	defer l.writeMu.Unlock()
 	if l.file != nil {
 		fmt.Fprintf(l.file, format, args...)
 	}
 }

 func (l *Logger) writeStdout(format string, args ...any) {
+	l.writeMu.Lock()
+	defer l.writeMu.Unlock()
 	fmt.Fprintf(l.stdout, format, args...)
 }


Adds --codex CLI flag and matching `executor = codex` config option that runs the full pipeline (task, internal review, finalize) through the codex CLI instead of Claude Code. Motivated by the June 2026 Anthropic billing split that moves unattended ralphex runs off the Claude Max plan; gives OpenAI/codex-plan users a single flag to keep the full pipeline on their existing subscription. Highlights: - New `--codex` and `--pass-claude-md` CLI flags plus matching `executor` and `pass_claude_md` config keys, with mutex validation against `--external-only`, `--codex-only`, and `--external-review-tool != none`. - Role-named executor wiring: `Executors` struct switches from `Claude`/`ReviewClaude`/`Codex` to `Task`/`Review`/`External`/`Custom`. Under `--codex` the constructor wires the same shared `CodexExecutor` instance into both `Task` and `Review` so any phase can use `{{agent:...}}` expansions. - `CodexExecutor` gains `MultiAgent`, `PassClaudeMd`, `IdleTimeout`, `Sandbox`-with-mode-aware-default, additive `-c` flag overrides, an idle-timeout machinery via `touchReader` + `time.AfterFunc`, and `finalError` extraction so non-zero exits surface a useful diagnostic. - Shared review prompts: `{{agent:<name>}}` expander reads `cfg.AppConfig.Executor` and emits the executor-appropriate invocation (Task tool for claude, `spawn_agent(agent='reviewer', task='...')` for codex). No separate codex-only prompt files are needed. - Mode-aware sandbox: `read-only` for external codex review (default claude mode); `danger-full-access` for first-class `executor = codex` (task/review/finalize need to write git metadata and commit) via the new `(*config.Config).CodexExecutorSandbox()` accessor. - `--pass-claude-md` adds `-c project_doc_fallback_filenames=["CLAUDE.md"]` so codex's native AGENTS.md walk picks up project-level `./CLAUDE.md`. Valid only under the codex executor; rejected for claude mode. - Codex error/limit patterns extended with stderr live scanning gated on `error:`/`fatal:`/`panic:` prefix so progress chatter cannot trip false positives; stdout-first / stderr-second priority within each severity class. - Documentation: README codex-mode section, llms.txt mirror, CLAUDE.md project-doc notes, `docs/custom-providers.md` deprecation callouts, implementation plan at `docs/plans/completed/20260514-codex-first-class-mode.md`. Internal-review section: `NewInternalReviewSection` returns `SectionInternalReview` with an executor-neutral label to avoid `phaseFromSection` misrouting under codex.

…on banner Three coordinated changes make --codex executor mode produce rich per-line output instead of just codex's reasoning-summary headers. Rollout tailer (pkg/executor/codex.go): - Codex has no stream-json equivalent for claude's structured output, so the model's assistant message text only lands in stdout at end-of-session and was never streamed. - Run() now extracts the session id from stderr's header banner via extractSessionID + a buffered sessionIDCh, then spawns tailRolloutFile to follow ~/.codex/sessions/<y>/<m>/<d>/rollout-*-<session-id>.jsonl like tail -f. formatRolloutEvent forwards assistant message text only; function_call records (exec_command, spawn_agent) and reasoning records are dropped as tool-machinery noise the narrative already covers. - tailCtx is canceled after stdout EOF so the tailer drains remaining buffered events once more and exits without blocking shutdown. Banner trim with one-time leak (pkg/executor/codex.go): - shouldDisplay no longer forwards codex's 8-line startup header (workdir/provider/approval/session-id/etc.) per task/review/finalize phase, matching claude's empty-banner UX. - On the very first Run() across the executor's lifetime (headerEmitted atomic.Bool via CompareAndSwap) it whitelists model:, sandbox:, and reasoning effort: from the header block so users see what codex resolved from ~/.codex/config.toml. Subsequent invocations get full suppression. Session id detection runs independently of display so the rollout tailer keeps working either way. No-fork-context hint (pkg/processor/prompts.go): - formatAgentExpansionCodex appends "Pass only agent and task; do not set fork_context." to the spawn_agent example. Codex was auto-pairing fork_context=true with explicit agent_type and getting rejected, then retrying without — costing ~5 wasted spawn calls per review iteration. Startup banner (cmd/ralphex/main.go): - printExecutorInfo emits sandbox: under "executor: codex" using the resolved CodexExecutorSandbox() value, plus model: / reasoning effort: when --task-model explicitly set them. Gives a clear executor summary before any codex output. Tests cover rollout tailing initial-drain + late-append, formatRolloutEvent JSONL parsing, firstRun lifecycle across multiple Run() calls, session id regex, and the updated suppress-by-default shouldDisplay contract.

…w prompts Adds prependCodexReviewGuidance + codexReviewGuidance that prepends a "=== Codex orchestration directives ===" block to ReviewFirstPrompt and ReviewSecondPrompt at build time when cfg.isCodexExecutor() is true. Covers two codex multi_agent quirks observed end-to-end on the toy project: - spawn_agent: pass only agent and task arguments. codex rejects fork_context=true paired with explicit agent_type, costing ~5 wasted retry round-trips per review iteration. - wait_agent: when timed_out=true with empty status (sub-agent died mid-tool-call), re-spawn the missing agents ONCE and proceed with partial results. Without this directive codex sits through four 10-min wait windows (~40 min) before deciding to retry on its own. Section-level injection works for users on embedded defaults AND users with customized review prompts that hard-code agent lists inline (bypassing the {{agent:NAME}} expander). Wired at three review-prompt build sites in pkg/processor/runner.go: full pipeline first review, review-only mode first review, and the review-second loop. formatAgentExpansionCodex no longer emits a per-agent fork_context hint since the section-level block is now the single source of truth. Also fixes a -race test failure from commit 582f75c. The tailer goroutine appends to a test-side []string concurrently with the main goroutine reading len(); now guarded with sync.Mutex + snapshot helper. Production OutputHandler writes through file descriptors only (no shared Go data structures), so this affected only the test. Measured: same toy project, --codex --pass-claude-md, review 1 dropped from 41m30s to 1m22s; total wall-clock from ~49 min to ~7 min.

Code-review-plus + codex GPT-5 + Copilot identified 17 places where the first-class --codex work silently altered behavior for users running without --codex. Each change here scopes new behavior to the opt-in --codex path and preserves master semantics for the default-claude pipeline. Critical - runReview no longer aborts the run on first-review session/idle timeout for default-claude users (master logged a soft warning and continued). Gated on cfg.isCodexExecutor() — only --codex treats the timeout as fatal (load-bearing under --codex because external review is auto-skipped). Phase label parameterized so the warning/error text is accurate if the helper is reused. - progress.Logger writes are now serialized through l.writeMu held across each public method's full file + stdout pair (writeTimestamped, PrintRaw, PrintSection, PrintAligned per-line, LogQuestion, LogDraftReview, Close). processStderr and the new tailRolloutFile both call OutputHandler from separate goroutines; without the mutex a concurrent producer could slip between a method's two sinks and split related output (file gets producer A while stdout shows B, or LogQuestion's QUESTION line gets separated from its companion OPTIONS line). writeFile/writeStdout renamed to *Locked to make the contract explicit. High (backward-compat regressions) - execCodexRunner.childEnv now scopes ANTHROPIC_API_KEY stripping to first-class --codex via a stripAnthropicKey flag wired to MultiAgent. External codex review in default-claude mode preserves the host env so custom codex wrappers proxying through Anthropic (scripts/codex-as-claude.sh) keep authenticating. CLAUDECODE is always stripped. - codex_model = gpt-5.5 and codex_reasoning_effort = xhigh restored as active embedded defaults; commenting out the lines is the user opt-in for inheriting from ~/.codex/config.toml. Previous commented-out defaults silently changed review quality/cost for new installs. - IdleTimeout removed from newBaseCodexExecutor; buildCodexExecutor (first- class --codex) wires it. External codex review in claude mode no longer inherits the claude idle_timeout, preserving master semantics where codex review sessions were not idle-killed. Medium - --dangerously-bypass-approvals-and-sandbox gated on e.MultiAgent so external codex review in Docker mode (sandbox forced to danger-full-access) does not silently skip approval prompts that used to surface. - review_first.txt / review_second.txt restored claude-specific guardrails ("all agent invocations in a single message", "Do NOT use run_in_background", "do NOT paste diffs into agent prompts"). Phrased inclusively so both executors get useful guidance. Low / cleanup - Deleted dead firstLine method and its test (no production callers). - Swapped misattributed godoc on logDroppedIdleErrors / idleTimeoutResult. - Dropped unread rolloutPayload Name and Arguments fields (formatRolloutEvent only renders assistant messages). - runReview phase label parameterized (no more hard-coded "first review pass" in error string). - Renamed SectionClaudeReview to SectionInternalReview (covers both claude and codex internal review iterations). Docs / banner - Startup banner under --codex now reads cfg.AppConfig.CodexModel and CodexReasoningEffort directly (what the executor actually receives), not TaskModel (which only feeds claude). Fix applied at both executePlan and runPlanMode banner sites. - docs/custom-providers.md no longer references the removed review_first_codex.txt / review_second_codex.txt files (those were removed during the earlier prompt-duality collapse); replaced with description of the shared review prompts + prependCodexReviewGuidance section-level injection. - llms.txt + README + CLAUDE.md document the new mode-specific scoping for codex_sandbox, codex_model, idle_timeout, and ANTHROPIC_API_KEY. - Dropped self-contradictory "silently … printing a warning" phrasing in README + llms.txt. Tests pass under `go test -race ./...`; linter clean.

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.

ExecutorClaude = "" was the convention the surrounding code relied on (unset config field → claude pipeline), but at a glance the empty literal looked like a placeholder. Added a comment block explaining that an unset executor field resolves to claude by design and ExecutorCodex is the opt-in first-class --codex path. Revdiff annotation on master diff (pkg/config/config.go:32) flagged this as confusing — fixing in place.

gofmt-only cleanup; one blank line left over from the firstLine method deletion was not normalized away during the rebase. No behavior change.

Copilot AI review requested due to automatic review settings May 17, 2026 21:52

Copilot started reviewing on behalf of umputun May 17, 2026 21:52 View session

Copilot AI reviewed May 17, 2026

View reviewed changes

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

umputun requested a review from Copilot May 18, 2026 18:49

Copilot started reviewing on behalf of umputun May 18, 2026 18:50 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

umputun requested a review from Copilot May 18, 2026 22:00

Copilot started reviewing on behalf of umputun May 18, 2026 22:01 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

umputun added 4 commits May 18, 2026 17:22

umputun force-pushed the codex-first-class-mode branch from 37f031b to 29da334 Compare May 18, 2026 22:25

umputun requested a review from Copilot May 18, 2026 22:25

Copilot started reviewing on behalf of umputun May 18, 2026 22:26 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

umputun added 2 commits May 18, 2026 17:54

chore(codex): drop trailing blank line in codex_test.go

64d51f3

gofmt-only cleanup; one blank line left over from the firstLine method deletion was not normalized away during the rebase. No behavior change.

		e.IdleTimeout = cfg.AppConfig.IdleTimeout
		if cfg.isCodexExecutor() {

-func TestReviewSections_TypeAndLabelContract(t *testing.T) {
-	t.Run("claude first pass review", func(t *testing.T) {
-		section := NewClaudeReviewSection(0, ": all findings")
-		assert.Equal(t, SectionClaudeReview, section.Type)
-		assert.Equal(t, 0, section.Iteration)
-		assert.Equal(t, "claude review 0: all findings", section.Label)
-	})
-	t.Run("internal codex review", func(t *testing.T) {
-		section := NewInternalReviewSection(3, ": critical/major")
-		assert.Equal(t, SectionClaudeReview, section.Type)
-		assert.Equal(t, 3, section.Iteration)
-		assert.Equal(t, "review 3: critical/major", section.Label)
-		assert.NotContains(t, section.Label, "codex")
-	})
-}
+func TestReviewSections_TypeAndLabelContract(t *testing.T) {
+	tests := []struct {
+		name          string
+		section       Section
+		wantType      SectionType
+		wantIteration int
+		wantLabel     string
+		noCodex       bool
+	}{
+		{
+			name:          "claude first pass review",
+			section:       NewClaudeReviewSection(0, ": all findings"),
+			wantType:      SectionClaudeReview,
+			wantIteration: 0,
+			wantLabel:     "claude review 0: all findings",
+		},
+		{
+			name:          "internal codex review",
+			section:       NewInternalReviewSection(3, ": critical/major"),
+			wantType:      SectionClaudeReview,
+			wantIteration: 3,
+			wantLabel:     "review 3: critical/major",
+			noCodex:       true,
+		},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			assert.Equal(t, tc.wantType, tc.section.Type)
+			assert.Equal(t, tc.wantIteration, tc.section.Iteration)
+			assert.Equal(t, tc.wantLabel, tc.section.Label)
+			if tc.noCodex {
+				assert.NotContains(t, tc.section.Label, "codex")
+			}
+		})
+	}
+}


		Codex `-c` flag args (built per invocation by `codexConfigOpts.cliArgs()`):

		```

		Prompt customization split: customizing `~/.config/ralphex/prompts/review_first.txt` does NOT apply under `--codex` because the codex path reads `review_first_codex.txt`. Users who want a customized codex review prompt need to create `review_first_codex.txt` (and `review_second_codex.txt`) alongside the claude variants. Same loading priority: local → global → embedded.

Uh oh!

Conversation

umputun commented May 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

cloudflare-workers-and-pages Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying ralphex with Cloudflare Pages

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

umputun commented May 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 17, 2026 •

edited

Loading

cloudflare-workers-and-pages Bot commented May 17, 2026 •

edited

Loading