Skip to content

Add first-class codex executor mode#350

Open
umputun wants to merge 7 commits into
masterfrom
codex-first-class-mode
Open

Add first-class codex executor mode#350
umputun wants to merge 7 commits into
masterfrom
codex-first-class-mode

Conversation

@umputun
Copy link
Copy Markdown
Owner

@umputun umputun commented May 17, 2026

Adds first-class --codex support as an executor switch for task, review, and finalize phases.

ralphex could already run codex through compatibility wrappers, but that path still pretended codex was Claude and kept Claude-flavored prompts. This PR calls codex directly, wires codex-specific review prompts, and keeps user codex config layered through additive -c overrides.

This is for the Anthropic June 15 billing change: ralphex drives Claude Code with -p, and that usage is split away from the Max subscription. Users with an OpenAI/Codex plan get a direct way to run the whole pipeline on that plan.

In codex mode ralphex skips external review, uses codex-specific review prompts with spawn_agent, and keeps the default Claude behavior intact.

Also adds --pass-claude-md, config keys, docs, and timeout/sandbox handling for codex execution.

Summary by CodeRabbit

  • New Features

    • Added a first-class "codex" executor mode with CLI flags (--codex, --pass-claude-md) to run task/review/finalize via Codex and optionally pass project CLAUDE.md into Codex; startup banners now show executor and passthrough status.
  • Documentation

    • Expanded docs and usage tables covering Codex mode, passthrough behavior, mutual-exclusion rules, and revised session/idle timeout semantics.
  • Tests

    • Extensive test coverage added for Codex flows, prompts, timeouts, startup messaging, and agent expansion.

Review Change Stack

Plan for promoting CodexExecutor from external-review-only to a
peer of ClaudeExecutor across task, review, and finalize phases.
Adds --codex and --pass-claude-md flags, role-named Executors
struct (Task/Review/External/Custom), codex-flavored review prompts,
and additive -c flag overrides that preserve user's ~/.codex config.

13 tasks. Survived smells pre-check, plan-review pass, and an
interactive revdiff round that pivoted away from the original
CODEX_HOME temp-directory approach.
Copilot AI review requested due to automatic review settings May 17, 2026 21:52
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 17, 2026

📝 Walkthrough

Walkthrough

Adds a first-class Codex executor: CLI flags/config fields for executor selection and CLAUDE.md passthrough, codex-specific review prompts and spawn_agent expansion, Codex executor idle/session handling, runner refactor to role-based executors, extensive tests, and documentation updates.

Changes

First-class Codex Executor Mode

Layer / File(s) Summary
CLI flags and configuration foundation
cmd/ralphex/main.go, pkg/config/config.go, pkg/config/values.go, pkg/config/defaults/config, pkg/config/*_test.go
Adds --codex and --pass-claude-md flags; Executor / PassClaudeMd config/values support with *Set sentinels; new default keys and embedded prompt filename additions.
CLI override logic and codex executor validation
cmd/ralphex/main.go, cmd/ralphex/main_test.go
Implements applyCodexOverrides enforcing post-merge mutex rules, warning/overriding external_review_tool to "none" when needed, validating prerequisites, and checking codex binary availability; startup banners updated to display executor/pass status.
Codex-specific review prompts
pkg/config/prompts.go, pkg/config/defaults/prompts/review_first_codex.txt, pkg/config/defaults/prompts/review_second_codex.txt, pkg/config/defaults_test.go
Adds codex review prompt templates and wiring into Prompts with local→global→embedded fallback; updates installer tests to include new files.
Codex-style agent expansion and escaping
pkg/processor/prompts.go, pkg/processor/prompts_test.go
Dispatches agent expansion between Claude and Codex syntaxes; implements formatAgentExpansionCodex and escapeCodexSingleQuoted; extensive tests validate expansion, escaping, and multiline-safety.
Codex executor multi-agent and idle timeout
pkg/executor/codex.go, pkg/executor/codex_test.go
Extends CodexExecutor with MultiAgent, PassClaudeMd, and IdleTimeout; implements additive -c overrides and idle-timeout via touchReader; centralizes final error handling; tests cover arg generation and idle semantics.
Runner refactoring to role-based executor structure
pkg/processor/runner.go, pkg/processor/export_test.go
Refactors Runner to role-named Executors (Task/Review/External/Custom), resolves prompt variants at init, emits one-time CLAUDE.md hint, short-circuits external review in codex mode when disabled, and updates session timeout gating and finalize routing.
Runner tests and new behavioral coverage
pkg/processor/runner_test.go
Rewires runner tests to new Executors shape; adds tests for codex external-review skipping, session-timeout gating, idle diagnostics preservation, CLAUDE.md hint gating, PassClaudeMd propagation, and review prompt selection.
Internal review labeling and codex signal parsing
pkg/status/section.go, pkg/status/section_test.go, pkg/executor/executor_test.go, pkg/processor/signals_test.go
Adds NewInternalReviewSection() for executor-agnostic review labels; extends detectSignal tests to cover markers in backticked blocks; adds regressions for codex-style payload parsing and phase routing.
Comprehensive documentation and planning
CLAUDE.md, README.md, docs/custom-providers.md, docs/plans/20260514-codex-first-class-mode.md, llms.txt
Documents Codex executor mode, CLI/config precedence and mutual-exclusion rules, timeout scoping differences (legacy vs codex), CLAUDE.md passthrough behavior and one-time hint, and provides a detailed implementation plan.
Minor cleanup and test-accessor changes
pkg/git/service_test.go, pkg/processor/export_test.go
Whitespace cleanup and addition of test-only Runner accessor/hint helpers for CLAUDE.md testing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A codex path emerges bright,
Agents spawn and parse the night,
Executors dance in ordered queues,
Prompts and hints light up the news,
Pipelines hum — the rabbit applauds with delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add first-class codex executor mode' directly and clearly summarizes the primary change: introducing Codex as a first-class executor alongside Claude, replacing wrapper-based approaches with direct Codex execution.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex-first-class-mode

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds first-class --codex executor mode that drives task, internal review, and finalize phases directly through the codex CLI (skipping external review since same-model self-review has weak signal). Motivated by the June 2026 Anthropic billing split that moves unattended ralphex runs off the Claude Max plan; gives OpenAI/codex-plan users a single flag to keep the full pipeline on their existing subscription. Also introduces --pass-claude-md, codex-specific review prompts, role-named executor wiring, and idle/session timeout coverage for codex.

Changes:

  • New --codex and --pass-claude-md CLI flags plus matching executor/pass_claude_md config keys, with mutex validation against --external-only, --codex-only, and --external-review-tool != none.
  • Refactor Executors from executor-named (Claude/ReviewClaude/Codex) to role-named (Task/Review/External/Custom); add codex agent-expansion via spawn_agent, additive -c overrides, idle/session-timeout support for CodexExecutor, executor-neutral internal-review section labels.
  • Docs/prompts: codex-mode README section, new review_first_codex.txt / review_second_codex.txt defaults, CLAUDE.md/llms.txt/custom-providers/plan doc updates.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
README.md Documents --codex, --pass-claude-md, config keys, mutex rules, prompt-customization split, codex 0.130.0 requirement.
llms.txt Mirrors README codex-mode summary for LLM consumers.
CLAUDE.md Adds project-doc notes about codex executor mode.
docs/custom-providers.md Calls out wrapper deprecation in favor of --codex.
docs/plans/20260514-codex-first-class-mode.md Implementation plan/spec for the feature.
cmd/ralphex/main.go Adds flags, codex dep check selection, post-merge applyCodexOverrides with mutex/auto-disable handling.
cmd/ralphex/main_test.go Tests for new flag/config mutex and override paths.
pkg/processor/runner.go Role-named executors, codex executor builders, codex-aware finalize routing, runWithSessionTimeout rework.
pkg/processor/prompts.go Codex-shape agent expansion via spawn_agent, escapeCodexSingleQuoted.
pkg/processor/prompts_test.go Covers codex agent expansion + escaping.
pkg/processor/export_test.go Test exports for new internals.
pkg/processor/signals_test.go Regression tests confirming codex-style free-form output still parses.
pkg/executor/codex.go Adds MultiAgent, PassClaudeMd, IdleTimeout, additive -c args, idle-timeout machinery, touchReader, finalError extraction.
pkg/executor/codex_test.go Tests idle timeout fires/resets, args splicing, finalError surfaces.
pkg/executor/executor_test.go Minor exec test updates for new fields.
pkg/config/config.go New Executor and PassClaudeMd config keys plus constants.
pkg/config/values.go *Set sentinels for new fields.
pkg/config/config_test.go, values_test.go, defaults_test.go Coverage for parsing/merge.
pkg/config/prompts.go ReviewFirstCodexPrompt / ReviewSecondCodexPrompt loader plumbing.
pkg/config/defaults/config Default config additions.
pkg/config/defaults/prompts/review_first_codex.txt, review_second_codex.txt New codex-specific review prompts.
pkg/status/section.go NewInternalReviewSection with executor-neutral label to avoid phaseFromSection misrouting.
pkg/status/section_test.go Asserts label contract and absence of codex in label.
pkg/web/session_progress_test.go Regression for internal-review label routing under codex.
pkg/web/server_test.go Removes stray blank line.
pkg/git/service_test.go Trivial trailing-newline cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/processor/runner.go Outdated
Comment on lines +248 to +250
if cfg.isCodexExecutor() {
e.PassClaudeMd = cfg.AppConfig.PassClaudeMd
}
Comment thread pkg/processor/prompts.go
Comment on lines +184 to +195
// formatAgentExpansionCodex builds the spawn_agent block used by codex executor.
// frontmatter Model/AgentType overrides do not apply: codex registers a single
// reviewer agent globally; per-call behavior is driven by the inlined task body.
// the agent body is escaped so apostrophes (don't, what's) and backslashes inside
// it do not terminate the surrounding single-quoted task='...' literal. the agent
// name is shared with pkg/executor.CodexReviewerAgentName so the spawn_agent call
// here stays in sync with the codex -c agents.<name>.description registration.
func (r *Runner) formatAgentExpansionCodex(prompt string) string {
return fmt.Sprintf(`spawn_agent(agent='%s', task='%s')

Report findings only - no positive observations.`, executor.CodexReviewerAgentName, r.escapeCodexSingleQuoted(prompt))
}
Comment thread pkg/processor/prompts.go Outdated
Comment on lines +197 to +210
// escapeCodexSingleQuoted escapes backslashes, single quotes, and newlines so
// the result is safe to splice into a Python-style single-quoted string literal
// (which is what spawn_agent(task='...') expects). backslash must be escaped
// first so the subsequent escapes do not get re-escaped. newlines and carriage
// returns are escaped because multi-line agent bodies would otherwise embed raw
// newlines into the task='...' literal and codex's string parser treats those
// as terminators of the single-quoted form.
func (r *Runner) escapeCodexSingleQuoted(s string) string {
s = strings.ReplaceAll(s, `\`, `\\`)
s = strings.ReplaceAll(s, `'`, `\'`)
s = strings.ReplaceAll(s, "\r", `\r`)
s = strings.ReplaceAll(s, "\n", `\n`)
return s
}
Comment thread cmd/ralphex/main.go
Comment on lines +1371 to +1379
if o.ExternalOnly {
return errors.New("--external-only is incompatible with codex executor (external review is skipped in codex mode)")
}
if o.CodexOnly {
return errors.New("--codex-only is incompatible with codex executor (external review is skipped in codex mode)")
}
if o.externalReviewToolSet && o.ExternalReviewTool != "none" {
return errors.New("--external-review-tool is incompatible with codex executor (external review is skipped)")
}
Comment thread pkg/processor/runner.go Outdated
Comment on lines +247 to +248
e.IdleTimeout = cfg.AppConfig.IdleTimeout
if cfg.isCodexExecutor() {
Comment thread pkg/processor/runner.go Outdated
Comment on lines +1502 to +1510
// under --codex, finalize is single-agent (no multi_agent overrides) and runs through
// the task-phase codex instance. under claude, the review executor is fine — it shares
// the claude binary and may carry a different model selection for review-style work.
execName := r.executorName()
finalizeRun := r.review.Run
if r.isCodexExecutor() {
finalizeRun = r.task.Run
}
result := r.runWithLimitRetry(ctx, finalizeRun, prompt, execName)
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 17, 2026

Deploying ralphex with  Cloudflare Pages  Cloudflare Pages

Latest commit: 64d51f3
Status: ✅  Deploy successful!
Preview URL: https://970fdbdd.ralphex.pages.dev
Branch Preview URL: https://codex-first-class-mode.ralphex.pages.dev

View logs

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
CLAUDE.md (1)

74-74: 💤 Low value

Consider breaking up the very long paragraph for improved readability.

The single-paragraph explanation of --codex mode (line 74) packs 8+ distinct concepts into one paragraph: executor switch semantics, external review forcing, pipeline mode preservation, Executors struct fields, agent registration, -c flag overrides, config preservation, PassClaudeMd behavior, and user-level CLAUDE.md hints. While technically accurate, breaking this into 2-3 focused paragraphs would improve scannability.

Suggested structure:

  1. What --codex does (executor switch, external review skip, mode preservation)
  2. How multi-agent reviews are configured (-c overrides, config layering)
  3. PassClaudeMd behavior (project vs user-level)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 74, Split the long single paragraph describing the --codex
behavior into 2–3 focused paragraphs: first explain the executor switch
semantics (the --codex flag sets cfg.Executor = config.ExecutorCodex, forces
cfg.ExternalReviewTool = "none", preserves the pipeline Mode enum, and maps
Executors struct role fields Task/Review/External/Custom to the underlying
CodexExecutor for each phase); second describe multi-agent configuration and
agent registration (explain that codexConfigOpts.cliArgs() in
pkg/executor/codex.go adds additive -c overrides like features.multi_agent and
agents.reviewer.description on top of ~/.codex/config.toml so user
model/sandbox/MCP settings are preserved); third cover PassClaudeMd behavior and
CLAUDE.md hints (document that --pass-claude-md sets CodexExecutor.PassClaudeMd
= true to pass project CLAUDE.md via -c project_doc_fallback_filenames and that
the runner prints a one-time hint for user-level CLAUDE.md symlink without
writing to ~/.codex).
cmd/ralphex/main_test.go (1)

961-1152: 🏗️ Heavy lift

Refactor repetitive codex override tests into table-driven cases.

TestApplyCodexOverrides_PostMergeMutexChecks and TestCodexFlag_ApplyCLIOverrides repeat setup/assert flows heavily; converting these to table-driven cases with shared setup helpers will reduce duplication and make new matrix cases safer to add.

As per coding guidelines, "**/*_test.go: Table-driven tests with testify".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ralphex/main_test.go` around lines 961 - 1152, Replace the many
duplicated t.Run blocks in TestApplyCodexOverrides_PostMergeMutexChecks and
TestCodexFlag_ApplyCLIOverrides with table-driven loops: define a slice of test
cases (fields: name, cliArgs []string, initial cfg *config.Config, wantErr bool,
wantErrContains []string, wantWarnContains []string, wantCfgChecks funcs or
expected field values) and iterate with t.Run(name, func(t *testing.T){ setup
common vars (o := parseTestOpts(t, tc.cliArgs...), var warnBuf bytes.Buffer),
call applyCodexOverrides or applyCLIOverrides as appropriate, then assert
expecteds (require.Error/NoError, strings.Contains on err.Error,
warnBuf.String(), and cfg fields like Executor, ExternalReviewTool,
PassClaudeMd) to eliminate copy/paste; keep helper calls parseTestOpts,
applyCodexOverrides, applyCLIOverrides, and warnBuf usage from the existing
tests to preserve behavior and warning checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/ralphex/main_test.go`:
- Around line 1101-1103: Update the stale test comment to reflect that
validateFlags now accepts these flag combinations in TestValidateFlags rather
than rejecting them; clarify that applyCLIOverrides is still called after
validateFlags and the guard exists to cover the no-warning branch when a user
explicitly set the flag (which used to be a hard error pre-applyCLIOverrides).
Replace the existing text referencing rejection by validateFlags with a line
stating that TestValidateFlags intentionally accepts the combo at CLI stage and
that applyCLIOverrides ensures the no-warning branch when the flag was
explicitly provided.

In `@cmd/ralphex/main.go`:
- Around line 1380-1382: The warning logic is inverted: it warns when
external_review_tool is the embedded default (cfg.ExternalReviewTool set but
o.externalReviewToolSet false). Change the condition so the message is emitted
only when the user explicitly set external_review_tool and it is being
overridden; i.e., replace the current check (cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet) with a check that
requires o.externalReviewToolSet to be true (for example:
o.externalReviewToolSet && cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none") and keep the same fmt.Fprintf(warnW, ...) call
using cfg.ExternalReviewTool.

In `@pkg/processor/runner_test.go`:
- Around line 4374-4379: setIsolatedHome currently only sets HOME; on Windows
os.UserHomeDir() will read USERPROFILE (or HOMEDRIVE+HOMEPATH), so update
setIsolatedHome to also set USERPROFILE to the same t.TempDir() and populate
HOMEDRIVE and HOMEPATH so UserHomeDir resolves to the temp dir on Windows; keep
the existing call to processor.ResetClaudeMdHintOnceForTest(). Ensure you use
t.Setenv for USERPROFILE, HOMEDRIVE and HOMEPATH so tests remain isolated across
platforms.

In `@pkg/status/section_test.go`:
- Around line 9-24: Convert the current hardcoded subtests in
TestReviewSections_TypeAndLabelContract into a table-driven test: create a slice
of test cases with fields (name, constructor args, expected Type, expected
Iteration, expected Label and any NotContains checks), iterate over them with
t.Run(tc.name, func(t *testing.T) { ... }), and for each case call the
appropriate constructor (NewClaudeReviewSection or NewInternalReviewSection) and
assert expectations against section.Type, section.Iteration and section.Label
(including the NotContains check for "codex"); keep the test name
TestReviewSections_TypeAndLabelContract and reuse the symbols
NewClaudeReviewSection, NewInternalReviewSection and SectionClaudeReview so
assertions remain the same.

---

Nitpick comments:
In `@CLAUDE.md`:
- Line 74: Split the long single paragraph describing the --codex behavior into
2–3 focused paragraphs: first explain the executor switch semantics (the --codex
flag sets cfg.Executor = config.ExecutorCodex, forces cfg.ExternalReviewTool =
"none", preserves the pipeline Mode enum, and maps Executors struct role fields
Task/Review/External/Custom to the underlying CodexExecutor for each phase);
second describe multi-agent configuration and agent registration (explain that
codexConfigOpts.cliArgs() in pkg/executor/codex.go adds additive -c overrides
like features.multi_agent and agents.reviewer.description on top of
~/.codex/config.toml so user model/sandbox/MCP settings are preserved); third
cover PassClaudeMd behavior and CLAUDE.md hints (document that --pass-claude-md
sets CodexExecutor.PassClaudeMd = true to pass project CLAUDE.md via -c
project_doc_fallback_filenames and that the runner prints a one-time hint for
user-level CLAUDE.md symlink without writing to ~/.codex).

In `@cmd/ralphex/main_test.go`:
- Around line 961-1152: Replace the many duplicated t.Run blocks in
TestApplyCodexOverrides_PostMergeMutexChecks and TestCodexFlag_ApplyCLIOverrides
with table-driven loops: define a slice of test cases (fields: name, cliArgs
[]string, initial cfg *config.Config, wantErr bool, wantErrContains []string,
wantWarnContains []string, wantCfgChecks funcs or expected field values) and
iterate with t.Run(name, func(t *testing.T){ setup common vars (o :=
parseTestOpts(t, tc.cliArgs...), var warnBuf bytes.Buffer), call
applyCodexOverrides or applyCLIOverrides as appropriate, then assert expecteds
(require.Error/NoError, strings.Contains on err.Error, warnBuf.String(), and cfg
fields like Executor, ExternalReviewTool, PassClaudeMd) to eliminate copy/paste;
keep helper calls parseTestOpts, applyCodexOverrides, applyCLIOverrides, and
warnBuf usage from the existing tests to preserve behavior and warning checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c0f7f1d1-b825-4663-9572-cf551d721c72

📥 Commits

Reviewing files that changed from the base of the PR and between 5d07e2b and d9f6539.

📒 Files selected for processing (30)
  • CLAUDE.md
  • README.md
  • cmd/ralphex/main.go
  • cmd/ralphex/main_test.go
  • docs/custom-providers.md
  • docs/plans/20260514-codex-first-class-mode.md
  • llms.txt
  • pkg/config/config.go
  • pkg/config/config_test.go
  • pkg/config/defaults/config
  • pkg/config/defaults/prompts/review_first_codex.txt
  • pkg/config/defaults/prompts/review_second_codex.txt
  • pkg/config/defaults_test.go
  • pkg/config/prompts.go
  • pkg/config/values.go
  • pkg/config/values_test.go
  • pkg/executor/codex.go
  • pkg/executor/codex_test.go
  • pkg/executor/executor_test.go
  • pkg/git/service_test.go
  • pkg/processor/export_test.go
  • pkg/processor/prompts.go
  • pkg/processor/prompts_test.go
  • pkg/processor/runner.go
  • pkg/processor/runner_test.go
  • pkg/processor/signals_test.go
  • pkg/status/section.go
  • pkg/status/section_test.go
  • pkg/web/server_test.go
  • pkg/web/session_progress_test.go
💤 Files with no reviewable changes (2)
  • pkg/web/server_test.go
  • pkg/git/service_test.go

Comment thread cmd/ralphex/main_test.go Outdated
Comment on lines +1101 to +1103
// CLI validation would reject this combo via validateFlags, but applyCLIOverrides
// is called after validateFlags; this guards the no-warning branch when user
// explicitly set the flag (which would have been a hard error pre-applyCLIOverrides).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale behavior comment in this test block.

Line 1101 says this combo is rejected by validateFlags, but the new TestValidateFlags cases intentionally accept these combinations at CLI stage. This comment now misstates actual behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ralphex/main_test.go` around lines 1101 - 1103, Update the stale test
comment to reflect that validateFlags now accepts these flag combinations in
TestValidateFlags rather than rejecting them; clarify that applyCLIOverrides is
still called after validateFlags and the guard exists to cover the no-warning
branch when a user explicitly set the flag (which used to be a hard error
pre-applyCLIOverrides). Replace the existing text referencing rejection by
validateFlags with a line stating that TestValidateFlags intentionally accepts
the combo at CLI stage and that applyCLIOverrides ensures the no-warning branch
when the flag was explicitly provided.

Comment thread cmd/ralphex/main.go Outdated
Comment on lines +1380 to +1382
if cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet {
fmt.Fprintf(warnW, "warning: config-file external_review_tool=%q overridden to \"none\" because executor=codex\n", cfg.ExternalReviewTool)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Suppress misleading codex-mode warning on defaults.

At Line 1380, this warning fires for the embedded default external_review_tool = "codex", so codex mode can print a false “config-file overridden” warning even when the user didn’t set a conflicting value.

💡 Suggested fix
-	if cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" && !o.externalReviewToolSet {
+	if cfg.ExternalReviewTool == "custom" && !o.externalReviewToolSet {
 		fmt.Fprintf(warnW, "warning: config-file external_review_tool=%q overridden to \"none\" because executor=codex\n", cfg.ExternalReviewTool)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ralphex/main.go` around lines 1380 - 1382, The warning logic is inverted:
it warns when external_review_tool is the embedded default
(cfg.ExternalReviewTool set but o.externalReviewToolSet false). Change the
condition so the message is emitted only when the user explicitly set
external_review_tool and it is being overridden; i.e., replace the current check
(cfg.ExternalReviewTool != "" && cfg.ExternalReviewTool != "none" &&
!o.externalReviewToolSet) with a check that requires o.externalReviewToolSet to
be true (for example: o.externalReviewToolSet && cfg.ExternalReviewTool != "" &&
cfg.ExternalReviewTool != "none") and keep the same fmt.Fprintf(warnW, ...) call
using cfg.ExternalReviewTool.

Comment on lines +4374 to +4379
func setIsolatedHome(t *testing.T) string {
t.Helper()
home := t.TempDir()
t.Setenv("HOME", home)
processor.ResetClaudeMdHintOnceForTest()
return home
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Go's os.UserHomeDirdocumentation, which environment variables are used on Windows, and does setting onlyHOME isolate the returned home directory there?

💡 Result:

According to the official Go documentation, the os.UserHomeDir function returns the %USERPROFILE% environment variable when running on Windows [1][2]. Setting only the HOME environment variable on Windows does not isolate or influence the directory returned by os.UserHomeDir, as the function specifically looks for %USERPROFILE% on that platform [1][2]. While some older or third-party implementations of home directory detection historically combined %HOMEDRIVE% and %HOMEPATH% or checked other variables, the standard library's os.UserHomeDir implementation is explicitly documented to rely on %USERPROFILE% [1][2].

Citations:


🏁 Script executed:

#!/bin/bash

# Search for os.UserHomeDir calls in processor code
echo "=== Searching for UserHomeDir calls ==="
rg "UserHomeDir|user\.HomeDir|homedir" pkg/processor --type go

# Search for environment variable usage in home/path resolution
echo ""
echo "=== Searching for USERPROFILE/HOMEDRIVE/HOMEPATH usage ==="
rg "USERPROFILE|HOMEDRIVE|HOMEPATH" pkg/processor --type go

# Look for hint path resolution logic
echo ""
echo "=== Searching for CLAUDE.md or hint path logic ==="
rg "CLAUDE\.md|hint.*path|hint.*home" pkg/processor --type go -i

Repository: umputun/ralphex

Length of output: 2181


🏁 Script executed:

#!/bin/bash

# Check what the setIsolatedHome helper is actually used for
echo "=== Usages of setIsolatedHome ==="
rg "setIsolatedHome" pkg/processor/runner_test.go -A 2 -B 2

# Look at the hint tests that use this helper
echo ""
echo "=== Context around hint tests (lines 4382-4410) ==="
sed -n '4382,4410p' pkg/processor/runner_test.go

Repository: umputun/ralphex

Length of output: 3079


🏁 Script executed:

#!/bin/bash

# Get context around the UserHomeDir call
echo "=== Context of os.UserHomeDir() call ==="
rg "os\.UserHomeDir" pkg/processor/runner.go -B 5 -A 5

# Check the maybeEmitClaudeMdSetupHint function signature
echo ""
echo "=== maybeEmitClaudeMdSetupHint function ==="
rg "func.*maybeEmitClaudeMdSetupHint" pkg/processor/runner.go -A 15

Repository: umputun/ralphex

Length of output: 1133


🏁 Script executed:

#!/bin/bash

# Verify this function is called during test execution
echo "=== Searching for maybeEmitClaudeMdSetupHint calls ==="
rg "maybeEmitClaudeMdSetupHint" pkg/processor --type go

# Check if processor.New() calls it
echo ""
echo "=== Context of processor.New() ==="
rg "func \(r \*Runner\) New\(" pkg/processor/runner.go -A 20 | head -40

Repository: umputun/ralphex

Length of output: 449


🏁 Script executed:

#!/bin/bash

# Find where maybeEmitClaudeMdSetupHint is called in runner.go
echo "=== Calls to maybeEmitClaudeMdSetupHint in runner.go ==="
rg "maybeEmitClaudeMdSetupHint" pkg/processor/runner.go -B 10 -A 2

# Check if it's called during New() or similar initialization
echo ""
echo "=== Check New() function for hint call ==="
rg "func \(.*\) New\(" pkg/processor/runner.go -A 40 | head -60

Repository: umputun/ralphex

Length of output: 1375


Isolate home-directory lookups on Windows too.

Line 4377 only overrides HOME. On Windows, os.UserHomeDir() returns %USERPROFILE%, not HOME, so these tests will access the real user home directory instead of t.TempDir() on Windows, violating the guideline that tests MUST use t.TempDir() and NEVER touch ~/.config/ralphex/. Set the Windows environment variables to ensure consistent test isolation across platforms.

🔧 Minimal hardening
 func setIsolatedHome(t *testing.T) string {
 	t.Helper()
 	home := t.TempDir()
 	t.Setenv("HOME", home)
+	t.Setenv("USERPROFILE", home)
+	t.Setenv("HOMEDRIVE", "")
+	t.Setenv("HOMEPATH", "")
 	processor.ResetClaudeMdHintOnceForTest()
 	return home
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/processor/runner_test.go` around lines 4374 - 4379, setIsolatedHome
currently only sets HOME; on Windows os.UserHomeDir() will read USERPROFILE (or
HOMEDRIVE+HOMEPATH), so update setIsolatedHome to also set USERPROFILE to the
same t.TempDir() and populate HOMEDRIVE and HOMEPATH so UserHomeDir resolves to
the temp dir on Windows; keep the existing call to
processor.ResetClaudeMdHintOnceForTest(). Ensure you use t.Setenv for
USERPROFILE, HOMEDRIVE and HOMEPATH so tests remain isolated across platforms.

Comment on lines +9 to +24
func TestReviewSections_TypeAndLabelContract(t *testing.T) {
t.Run("claude first pass review", func(t *testing.T) {
section := NewClaudeReviewSection(0, ": all findings")
assert.Equal(t, SectionClaudeReview, section.Type)
assert.Equal(t, 0, section.Iteration)
assert.Equal(t, "claude review 0: all findings", section.Label)
})

t.Run("internal codex review", func(t *testing.T) {
section := NewInternalReviewSection(3, ": critical/major")
assert.Equal(t, SectionClaudeReview, section.Type)
assert.Equal(t, 3, section.Iteration)
assert.Equal(t, "review 3: critical/major", section.Label)
assert.NotContains(t, section.Label, "codex")
})
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Convert this test to a table-driven form.

Line 9 currently uses hardcoded subtests; project test conventions require table-driven structure for consistency and easier extension.

♻️ Proposed refactor
 func TestReviewSections_TypeAndLabelContract(t *testing.T) {
-	t.Run("claude first pass review", func(t *testing.T) {
-		section := NewClaudeReviewSection(0, ": all findings")
-		assert.Equal(t, SectionClaudeReview, section.Type)
-		assert.Equal(t, 0, section.Iteration)
-		assert.Equal(t, "claude review 0: all findings", section.Label)
-	})
-
-	t.Run("internal codex review", func(t *testing.T) {
-		section := NewInternalReviewSection(3, ": critical/major")
-		assert.Equal(t, SectionClaudeReview, section.Type)
-		assert.Equal(t, 3, section.Iteration)
-		assert.Equal(t, "review 3: critical/major", section.Label)
-		assert.NotContains(t, section.Label, "codex")
-	})
+	tests := []struct {
+		name          string
+		section       Section
+		wantType      SectionType
+		wantIteration int
+		wantLabel     string
+		noCodex       bool
+	}{
+		{
+			name:          "claude first pass review",
+			section:       NewClaudeReviewSection(0, ": all findings"),
+			wantType:      SectionClaudeReview,
+			wantIteration: 0,
+			wantLabel:     "claude review 0: all findings",
+		},
+		{
+			name:          "internal codex review",
+			section:       NewInternalReviewSection(3, ": critical/major"),
+			wantType:      SectionClaudeReview,
+			wantIteration: 3,
+			wantLabel:     "review 3: critical/major",
+			noCodex:       true,
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			assert.Equal(t, tc.wantType, tc.section.Type)
+			assert.Equal(t, tc.wantIteration, tc.section.Iteration)
+			assert.Equal(t, tc.wantLabel, tc.section.Label)
+			if tc.noCodex {
+				assert.NotContains(t, tc.section.Label, "codex")
+			}
+		})
+	}
 }

As per coding guidelines "**/*_test.go: Table-driven tests with testify".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestReviewSections_TypeAndLabelContract(t *testing.T) {
t.Run("claude first pass review", func(t *testing.T) {
section := NewClaudeReviewSection(0, ": all findings")
assert.Equal(t, SectionClaudeReview, section.Type)
assert.Equal(t, 0, section.Iteration)
assert.Equal(t, "claude review 0: all findings", section.Label)
})
t.Run("internal codex review", func(t *testing.T) {
section := NewInternalReviewSection(3, ": critical/major")
assert.Equal(t, SectionClaudeReview, section.Type)
assert.Equal(t, 3, section.Iteration)
assert.Equal(t, "review 3: critical/major", section.Label)
assert.NotContains(t, section.Label, "codex")
})
}
func TestReviewSections_TypeAndLabelContract(t *testing.T) {
tests := []struct {
name string
section Section
wantType SectionType
wantIteration int
wantLabel string
noCodex bool
}{
{
name: "claude first pass review",
section: NewClaudeReviewSection(0, ": all findings"),
wantType: SectionClaudeReview,
wantIteration: 0,
wantLabel: "claude review 0: all findings",
},
{
name: "internal codex review",
section: NewInternalReviewSection(3, ": critical/major"),
wantType: SectionClaudeReview,
wantIteration: 3,
wantLabel: "review 3: critical/major",
noCodex: true,
},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
assert.Equal(t, tc.wantType, tc.section.Type)
assert.Equal(t, tc.wantIteration, tc.section.Iteration)
assert.Equal(t, tc.wantLabel, tc.section.Label)
if tc.noCodex {
assert.NotContains(t, tc.section.Label, "codex")
}
})
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/status/section_test.go` around lines 9 - 24, Convert the current
hardcoded subtests in TestReviewSections_TypeAndLabelContract into a
table-driven test: create a slice of test cases with fields (name, constructor
args, expected Type, expected Iteration, expected Label and any NotContains
checks), iterate over them with t.Run(tc.name, func(t *testing.T) { ... }), and
for each case call the appropriate constructor (NewClaudeReviewSection or
NewInternalReviewSection) and assert expectations against section.Type,
section.Iteration and section.Label (including the NotContains check for
"codex"); keep the test name TestReviewSections_TypeAndLabelContract and reuse
the symbols NewClaudeReviewSection, NewInternalReviewSection and
SectionClaudeReview so assertions remain the same.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/plans/20260514-codex-first-class-mode.md`:
- Line 220: Add a language identifier to the fenced code block opened with
triple backticks so MD040 warnings stop; locate the anonymous code fence (the
``` block shown in the diff) and change the opening fence from ``` to ```text
(or another appropriate language) so the block becomes ```text ... ``` as in the
proposed fix.

In `@llms.txt`:
- Around line 164-165: Update the prompt filename list section in llms.txt to
include the codex-specific filenames so it matches the "codex split" docs: add
review_first_codex.txt and review_second_codex.txt alongside the existing
review_first.txt/review_second.txt entries and keep the stated loading priority
(local → global → embedded) intact; ensure the explanatory text clarifies that
customizing review_first.txt does not affect --codex, and users must create the
review_first_codex.txt/review_second_codex.txt variants for codex.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 79edc3e5-d145-49b1-a905-91148cc99cf7

📥 Commits

Reviewing files that changed from the base of the PR and between 3812540 and 853382b.

📒 Files selected for processing (16)
  • CLAUDE.md
  • README.md
  • cmd/ralphex/main.go
  • docs/custom-providers.md
  • docs/plans/20260514-codex-first-class-mode.md
  • llms.txt
  • pkg/config/config.go
  • pkg/config/config_test.go
  • pkg/config/defaults/agents/simplification.txt
  • pkg/config/defaults/config
  • pkg/config/values.go
  • pkg/config/values_test.go
  • pkg/executor/codex.go
  • pkg/executor/codex_test.go
  • pkg/processor/runner.go
  • pkg/processor/runner_test.go
✅ Files skipped from review due to trivial changes (2)
  • pkg/config/defaults/agents/simplification.txt
  • README.md
🚧 Files skipped from review as they are similar to previous changes (10)
  • CLAUDE.md
  • docs/custom-providers.md
  • pkg/config/values_test.go
  • pkg/config/defaults/config
  • pkg/config/config_test.go
  • pkg/executor/codex_test.go
  • cmd/ralphex/main.go
  • pkg/executor/codex.go
  • pkg/processor/runner.go
  • pkg/processor/runner_test.go


**Codex `-c` flag args** (built per invocation by `codexConfigOpts.cliArgs()`):

```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

Line 220 opens a fenced block without a language, which triggers MD040 warnings.

📝 Proposed fix
-```
+```text
 review phases:  -c features.multi_agent=true \
                 -c agents.reviewer.description="general code review specialist; behavior driven by the task argument"

 + when --pass-claude-md:  -c project_doc_fallback_filenames=["CLAUDE.md"]

 task / finalize phases:  (no overrides — single-agent, default config)
 + when --pass-claude-md:  -c project_doc_fallback_filenames=["CLAUDE.md"]
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 220-220: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/plans/20260514-codex-first-class-mode.md at line 220, Add a language
identifier to the fenced code block opened with triple backticks so MD040
warnings stop; locate the anonymous code fence (the block shown in the diff) and change the opening fence from to text (or another appropriate language) so the block becomes text ... ``` as in the proposed fix.


</details>

<!-- fingerprinting:phantom:poseidon:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment thread llms.txt Outdated
Comment on lines +164 to +165
Prompt customization split: customizing `~/.config/ralphex/prompts/review_first.txt` does NOT apply under `--codex` because the codex path reads `review_first_codex.txt`. Users who want a customized codex review prompt need to create `review_first_codex.txt` (and `review_second_codex.txt`) alongside the claude variants. Same loading priority: local → global → embedded.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the prompt filename list consistent with the codex split docs.

The new split explanation is good, but the prompt list on Line 117 should also include review_first_codex.txt and review_second_codex.txt to avoid conflicting guidance.

📝 Proposed fix (outside this hunk)
-**Prompt files** (`~/.config/ralphex/prompts/`): `task.txt`, `review_first.txt`, `review_second.txt`, `codex.txt`, `codex_review.txt`, `custom_review.txt`, `custom_eval.txt`, `make_plan.txt`, `finalize.txt`
+**Prompt files** (`~/.config/ralphex/prompts/`): `task.txt`, `review_first.txt`, `review_second.txt`, `review_first_codex.txt`, `review_second_codex.txt`, `codex.txt`, `codex_review.txt`, `custom_review.txt`, `custom_eval.txt`, `make_plan.txt`, `finalize.txt`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@llms.txt` around lines 164 - 165, Update the prompt filename list section in
llms.txt to include the codex-specific filenames so it matches the "codex split"
docs: add review_first_codex.txt and review_second_codex.txt alongside the
existing review_first.txt/review_second.txt entries and keep the stated loading
priority (local → global → embedded) intact; ensure the explanatory text
clarifies that customizing review_first.txt does not affect --codex, and users
must create the review_first_codex.txt/review_second_codex.txt variants for
codex.

umputun added a commit that referenced this pull request May 18, 2026
- idle-timeout silent review skip: external review loop now retries
  on r.lastSessionTimedOut instead of treating partial/empty output as
  a clean exit; --idle-timeout no longer bypasses external codex review
- preserve user codex model and reasoning effort: emit -c model= /
  -c model_reasoning_effort= only when ralphex config explicitly sets
  them, so ~/.codex/config.toml choices are honored when ralphex does
  not override (matches the docs claim)
- spurious external_review_tool warning: add ExternalReviewToolSet
  sentinel, gate the codex-override warning on user-explicit setting
  so vanilla --codex runs no longer warn about the embedded default
- finalize-codex routing regression test: pin that finalize calls
  task.Run, not review.Run, under --codex mode
- remove dead code under --codex: New() no longer builds unused
  claude/external executors when isCodexExecutor() is true
- escapeCodexSingleQuoted: add \t escape; document supported set
- revert unrelated simplification.txt wording change
umputun added a commit that referenced this pull request May 18, 2026
- main_test.go:1103: replace stale comment about validateFlags
  rejecting --codex+--external-review-tool combo (TestValidateFlags
  now accepts it at the CLI stage)
- runner_test.go: setIsolatedHome also sets USERPROFILE, HOMEDRIVE,
  HOMEPATH so os.UserHomeDir() on Windows resolves to the temp dir
  (test-safety rule: never touch ~/.config/ralphex/)
- section_test.go: convert to table-driven form per project convention
- plan file:220: add text fence language identifier (MD040)
- llms.txt: add review_first_codex.txt / review_second_codex.txt to
  the prompt filename list with a note about the codex split
- prompts.go: warn once per agent name when codex mode discards
  frontmatter Model/AgentType overrides, so users do not silently
  lose per-agent settings under --codex
@umputun umputun requested a review from Copilot May 18, 2026 18:49
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 5 comments.

Comment thread pkg/executor/codex.go
Comment on lines +226 to +247
sessionIDCh := make(chan string, 1)
stderrDone := make(chan stderrResult, 1)
go func() {
stderrDone <- e.processStderr(ctx, streams.Stderr)
stderrDone <- e.processStderr(execCtx, streams.Stderr, idleTouch, sessionIDCh, firstRun)
}()

// read stdout entirely as final response
stdoutContent, stdoutErr := e.readStdout(streams.Stdout)
tailCancel, tailDone := e.startRolloutTail(execCtx, sessionIDCh, idleTouch)

// read stdout entirely as final response; wrap with touch-on-read so reads
// keep the idle timer alive even while stderr is quiet.
stdoutReader := streams.Stdout
if e.IdleTimeout > 0 {
stdoutReader = &touchReader{r: streams.Stdout, touch: idleTouch}
}
stdoutContent, stdoutErr := e.readStdout(stdoutReader)

// wait for stderr processing to complete
stderrRes := <-stderrDone

// codex has exited; signal tailer to drain remaining file content and stop
tailCancel()
<-tailDone
Comment thread pkg/executor/codex.go Outdated
Comment on lines +317 to +322
// idleTimeoutResult builds the Result returned when the idle-timeout timer
// canceled the derived execution context (parent ctx still alive). limit and
// logDroppedIdleErrors surfaces concurrent stream/wait errors that would otherwise
// be discarded by the idle-timeout completion path. operators need this to
// distinguish "agent went silent" from "stream broke" before retrying.
func (e *CodexExecutor) logDroppedIdleErrors(stdoutErr, waitErr error) {
Comment thread pkg/executor/codex.go Outdated
Comment on lines +786 to +791
// firstLine returns the first newline-delimited segment of s, or s when it
// contains no newline. used to keep rollout event display lines compact.
func (e *CodexExecutor) firstLine(s string) string {
head, _, _ := strings.Cut(s, "\n")
return head
}
Comment thread pkg/executor/codex.go
Comment on lines +733 to +742
type rolloutPayload struct {
Type string `json:"type"`
Role string `json:"role"`
Name string `json:"name"`
Arguments string `json:"arguments"`
Content []struct {
Type string `json:"type"`
Text string `json:"text"`
} `json:"content"`
}
Comment thread pkg/processor/runner.go Outdated
Comment on lines 662 to 666
return errors.New("first review pass timed out")
}

if !isReviewDone(result.Signal) {
r.log.Print("warning: first review pass did not complete cleanly, continuing...")
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Comment thread pkg/progress/progress.go Outdated
Comment on lines 625 to 642
// writeFile and writeStdout each take l.writeMu so callers that emit a
// timestamped log line via writeFile + writeStdout (back-to-back inside the
// same public method) do not interleave on either sink under concurrent
// producers — codex stderr processor and rollout-file tailer both call
// OutputHandler -> Print on the same Logger.
func (l *Logger) writeFile(format string, args ...any) {
l.writeMu.Lock()
defer l.writeMu.Unlock()
if l.file != nil {
fmt.Fprintf(l.file, format, args...)
}
}

func (l *Logger) writeStdout(format string, args ...any) {
l.writeMu.Lock()
defer l.writeMu.Unlock()
fmt.Fprintf(l.stdout, format, args...)
}
umputun added 4 commits May 18, 2026 17:22
Adds --codex CLI flag and matching `executor = codex` config option that
runs the full pipeline (task, internal review, finalize) through the codex
CLI instead of Claude Code. Motivated by the June 2026 Anthropic billing
split that moves unattended ralphex runs off the Claude Max plan; gives
OpenAI/codex-plan users a single flag to keep the full pipeline on their
existing subscription.

Highlights:
- New `--codex` and `--pass-claude-md` CLI flags plus matching `executor`
  and `pass_claude_md` config keys, with mutex validation against
  `--external-only`, `--codex-only`, and `--external-review-tool != none`.
- Role-named executor wiring: `Executors` struct switches from
  `Claude`/`ReviewClaude`/`Codex` to `Task`/`Review`/`External`/`Custom`.
  Under `--codex` the constructor wires the same shared `CodexExecutor`
  instance into both `Task` and `Review` so any phase can use
  `{{agent:...}}` expansions.
- `CodexExecutor` gains `MultiAgent`, `PassClaudeMd`, `IdleTimeout`,
  `Sandbox`-with-mode-aware-default, additive `-c` flag overrides, an
  idle-timeout machinery via `touchReader` + `time.AfterFunc`, and
  `finalError` extraction so non-zero exits surface a useful diagnostic.
- Shared review prompts: `{{agent:<name>}}` expander reads
  `cfg.AppConfig.Executor` and emits the executor-appropriate invocation
  (Task tool for claude, `spawn_agent(agent='reviewer', task='...')`
  for codex). No separate codex-only prompt files are needed.
- Mode-aware sandbox: `read-only` for external codex review (default
  claude mode); `danger-full-access` for first-class `executor = codex`
  (task/review/finalize need to write git metadata and commit) via the
  new `(*config.Config).CodexExecutorSandbox()` accessor.
- `--pass-claude-md` adds `-c project_doc_fallback_filenames=["CLAUDE.md"]`
  so codex's native AGENTS.md walk picks up project-level `./CLAUDE.md`.
  Valid only under the codex executor; rejected for claude mode.
- Codex error/limit patterns extended with stderr live scanning gated on
  `error:`/`fatal:`/`panic:` prefix so progress chatter cannot trip
  false positives; stdout-first / stderr-second priority within each
  severity class.
- Documentation: README codex-mode section, llms.txt mirror, CLAUDE.md
  project-doc notes, `docs/custom-providers.md` deprecation callouts,
  implementation plan at `docs/plans/completed/20260514-codex-first-class-mode.md`.

Internal-review section: `NewInternalReviewSection` returns
`SectionInternalReview` with an executor-neutral label to avoid
`phaseFromSection` misrouting under codex.
…on banner

Three coordinated changes make --codex executor mode produce rich per-line
output instead of just codex's reasoning-summary headers.

Rollout tailer (pkg/executor/codex.go):
- Codex has no stream-json equivalent for claude's structured output, so the
  model's assistant message text only lands in stdout at end-of-session and
  was never streamed.
- Run() now extracts the session id from stderr's header banner via
  extractSessionID + a buffered sessionIDCh, then spawns tailRolloutFile to
  follow ~/.codex/sessions/<y>/<m>/<d>/rollout-*-<session-id>.jsonl like
  tail -f. formatRolloutEvent forwards assistant message text only;
  function_call records (exec_command, spawn_agent) and reasoning records
  are dropped as tool-machinery noise the narrative already covers.
- tailCtx is canceled after stdout EOF so the tailer drains remaining
  buffered events once more and exits without blocking shutdown.

Banner trim with one-time leak (pkg/executor/codex.go):
- shouldDisplay no longer forwards codex's 8-line startup header
  (workdir/provider/approval/session-id/etc.) per task/review/finalize
  phase, matching claude's empty-banner UX.
- On the very first Run() across the executor's lifetime
  (headerEmitted atomic.Bool via CompareAndSwap) it whitelists model:,
  sandbox:, and reasoning effort: from the header block so users see what
  codex resolved from ~/.codex/config.toml. Subsequent invocations get full
  suppression. Session id detection runs independently of display so the
  rollout tailer keeps working either way.

No-fork-context hint (pkg/processor/prompts.go):
- formatAgentExpansionCodex appends "Pass only agent and task; do not set
  fork_context." to the spawn_agent example. Codex was auto-pairing
  fork_context=true with explicit agent_type and getting rejected, then
  retrying without — costing ~5 wasted spawn calls per review iteration.

Startup banner (cmd/ralphex/main.go):
- printExecutorInfo emits sandbox: under "executor: codex" using the
  resolved CodexExecutorSandbox() value, plus model: / reasoning effort:
  when --task-model explicitly set them. Gives a clear executor summary
  before any codex output.

Tests cover rollout tailing initial-drain + late-append, formatRolloutEvent
JSONL parsing, firstRun lifecycle across multiple Run() calls, session id
regex, and the updated suppress-by-default shouldDisplay contract.
…w prompts

Adds prependCodexReviewGuidance + codexReviewGuidance that prepends a
"=== Codex orchestration directives ===" block to ReviewFirstPrompt and
ReviewSecondPrompt at build time when cfg.isCodexExecutor() is true. Covers
two codex multi_agent quirks observed end-to-end on the toy project:

- spawn_agent: pass only agent and task arguments. codex rejects
  fork_context=true paired with explicit agent_type, costing ~5 wasted
  retry round-trips per review iteration.
- wait_agent: when timed_out=true with empty status (sub-agent died
  mid-tool-call), re-spawn the missing agents ONCE and proceed with
  partial results. Without this directive codex sits through four 10-min
  wait windows (~40 min) before deciding to retry on its own.

Section-level injection works for users on embedded defaults AND users
with customized review prompts that hard-code agent lists inline (bypassing
the {{agent:NAME}} expander). Wired at three review-prompt build sites in
pkg/processor/runner.go: full pipeline first review, review-only mode first
review, and the review-second loop.

formatAgentExpansionCodex no longer emits a per-agent fork_context hint
since the section-level block is now the single source of truth.

Also fixes a -race test failure from commit 582f75c. The tailer goroutine
appends to a test-side []string concurrently with the main goroutine
reading len(); now guarded with sync.Mutex + snapshot helper. Production
OutputHandler writes through file descriptors only (no shared Go data
structures), so this affected only the test.

Measured: same toy project, --codex --pass-claude-md, review 1 dropped
from 41m30s to 1m22s; total wall-clock from ~49 min to ~7 min.
Code-review-plus + codex GPT-5 + Copilot identified 17 places where the
first-class --codex work silently altered behavior for users running without
--codex. Each change here scopes new behavior to the opt-in --codex path and
preserves master semantics for the default-claude pipeline.

Critical
- runReview no longer aborts the run on first-review session/idle timeout
  for default-claude users (master logged a soft warning and continued).
  Gated on cfg.isCodexExecutor() — only --codex treats the timeout as fatal
  (load-bearing under --codex because external review is auto-skipped).
  Phase label parameterized so the warning/error text is accurate if the
  helper is reused.
- progress.Logger writes are now serialized through l.writeMu held across
  each public method's full file + stdout pair (writeTimestamped, PrintRaw,
  PrintSection, PrintAligned per-line, LogQuestion, LogDraftReview, Close).
  processStderr and the new tailRolloutFile both call OutputHandler from
  separate goroutines; without the mutex a concurrent producer could slip
  between a method's two sinks and split related output (file gets producer
  A while stdout shows B, or LogQuestion's QUESTION line gets separated
  from its companion OPTIONS line). writeFile/writeStdout renamed to
  *Locked to make the contract explicit.

High (backward-compat regressions)
- execCodexRunner.childEnv now scopes ANTHROPIC_API_KEY stripping to
  first-class --codex via a stripAnthropicKey flag wired to MultiAgent.
  External codex review in default-claude mode preserves the host env so
  custom codex wrappers proxying through Anthropic (scripts/codex-as-claude.sh)
  keep authenticating. CLAUDECODE is always stripped.
- codex_model = gpt-5.5 and codex_reasoning_effort = xhigh restored as
  active embedded defaults; commenting out the lines is the user opt-in
  for inheriting from ~/.codex/config.toml. Previous commented-out defaults
  silently changed review quality/cost for new installs.
- IdleTimeout removed from newBaseCodexExecutor; buildCodexExecutor (first-
  class --codex) wires it. External codex review in claude mode no longer
  inherits the claude idle_timeout, preserving master semantics where
  codex review sessions were not idle-killed.

Medium
- --dangerously-bypass-approvals-and-sandbox gated on e.MultiAgent so
  external codex review in Docker mode (sandbox forced to danger-full-access)
  does not silently skip approval prompts that used to surface.
- review_first.txt / review_second.txt restored claude-specific guardrails
  ("all agent invocations in a single message", "Do NOT use
  run_in_background", "do NOT paste diffs into agent prompts"). Phrased
  inclusively so both executors get useful guidance.

Low / cleanup
- Deleted dead firstLine method and its test (no production callers).
- Swapped misattributed godoc on logDroppedIdleErrors / idleTimeoutResult.
- Dropped unread rolloutPayload Name and Arguments fields (formatRolloutEvent
  only renders assistant messages).
- runReview phase label parameterized (no more hard-coded "first review
  pass" in error string).
- Renamed SectionClaudeReview to SectionInternalReview (covers both claude
  and codex internal review iterations).

Docs / banner
- Startup banner under --codex now reads cfg.AppConfig.CodexModel and
  CodexReasoningEffort directly (what the executor actually receives),
  not TaskModel (which only feeds claude). Fix applied at both executePlan
  and runPlanMode banner sites.
- docs/custom-providers.md no longer references the removed
  review_first_codex.txt / review_second_codex.txt files (those were
  removed during the earlier prompt-duality collapse); replaced with
  description of the shared review prompts + prependCodexReviewGuidance
  section-level injection.
- llms.txt + README + CLAUDE.md document the new mode-specific scoping
  for codex_sandbox, codex_model, idle_timeout, and ANTHROPIC_API_KEY.
- Dropped self-contradictory "silently … printing a warning" phrasing in
  README + llms.txt.

Tests pass under `go test -race ./...`; linter clean.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.

umputun added 2 commits May 18, 2026 17:54
ExecutorClaude = "" was the convention the surrounding code relied on
(unset config field → claude pipeline), but at a glance the empty literal
looked like a placeholder. Added a comment block explaining that an unset
executor field resolves to claude by design and ExecutorCodex is the
opt-in first-class --codex path.

Revdiff annotation on master diff (pkg/config/config.go:32) flagged this
as confusing — fixing in place.
gofmt-only cleanup; one blank line left over from the firstLine method
deletion was not normalized away during the rebase. No behavior change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants