Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 45 additions & 0 deletions config/agent-templates/test-codex/CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Test Codex Agent

You are a test agent running on **OpenAI's Codex CLI** (`codex exec`), Trinity's
third agent runtime alongside Claude Code and Gemini.

> Trinity mirrors this file to `AGENTS.md` at startup — Codex reads `AGENTS.md`,
> not `CLAUDE.md`.

## Your Purpose

Validate that Trinity's Codex runtime works correctly:
- Codex CLI integration (`codex exec --json`)
- The `-o` durable result record (authoritative response)
- MCP tool access (Trinity MCP wired via `config.toml`)
- Cost tracking (estimated from `turn.completed.usage` tokens)
- Chat continuity (`codex exec resume <thread_id>`)
- Sandbox safety (`danger-full-access` — the Trinity container is the boundary; `read-only` when the agent is read-only)

## Key Differences from Claude Code

1. **Instructions file:** You read `AGENTS.md` (Trinity mirrors `CLAUDE.md` → `AGENTS.md`).
2. **Cost:** No native cost field — Trinity estimates it from token counts.
3. **Sandbox:** You run under `--sandbox danger-full-access`, which disables Codex's
own inner (bubblewrap) sandbox — the hardened Trinity container (`cap_drop ALL`,
`no-new-privileges`, AppArmor) is the security boundary, the same posture Claude
and Gemini run under. A read-only agent runs `--sandbox read-only`.
4. **Provider:** OpenAI (not Anthropic).
5. **Session tab:** Not available for Codex agents — use the **Chat** tab (continuity
is wired there). The Session tab's cached-UUID `--resume` model is Claude-specific.

## Authentication

Codex authenticates with `OPENAI_API_KEY`, injected via the agent's `.env`
(Quick Inject → `OPENAI_API_KEY`). Codex agents are not assigned a Claude
subscription.

## Testing Commands

When asked to test, verify:
- `/test` — basic functionality
- Tool calling works (shell commands, web search)
- MCP servers are accessible
- Cost / token tracking reports correctly

Report any differences in behavior compared to Claude Code agents.
34 changes: 34 additions & 0 deletions config/agent-templates/test-codex/template.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
name: test-codex
display_name: Test Codex Agent
description: Test agent using OpenAI's Codex CLI runtime for validation (#1187)
version: "1.0.0"
author: Trinity Platform
priority: 10 # Lower = higher in list (after system templates)

type: business-assistant

# Use the OpenAI Codex runtime instead of Claude Code
runtime:
type: codex
model: gpt-5.1-codex

resources:
cpu: "2"
memory: "2g"

capabilities:
- chat
- code-generation
- web-search

mcp_servers: []

# Codex authenticates with an OpenAI API key read from the agent's .env
# (CRED-002). Declaring it here makes the Quick Inject UI prompt for it.
credentials:
env_file:
- OPENAI_API_KEY

slash_commands:
- name: /test
description: Test command to verify the Codex runtime is working
7 changes: 7 additions & 0 deletions docker/base-image/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,13 @@ RUN npm install -g @anthropic-ai/claude-code@latest
# Install Gemini CLI for multi-runtime support
RUN npm install -g @google/gemini-cli

# Install OpenAI Codex CLI for multi-runtime support (#1187).
# Pinned (not @latest) so base-image rebuilds are reproducible and a
# breaking Codex release can't silently change the agent runtime. The npm
# package resolves the platform-specific Rust binary via optional deps;
# the linux-x64 build lands here. Bump deliberately after testing.
RUN npm install -g @openai/codex@0.139.0

RUN useradd -m -s /bin/bash -u 1000 developer && \
echo "developer:developer" | chpasswd && \
usermod -aG sudo developer && \
Expand Down
6 changes: 6 additions & 0 deletions docker/base-image/agent_server/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,12 @@ class ExecutionMetadata(BaseModel):
compact_events: List[CompactEvent] = [] # Auto-compact events observed mid-turn
recovered_from_jsonl: bool = False # Stdout race + JSONL fallback fired (response from disk, not stream)
model_name: Optional[str] = None # Actual model id from assistant.message.model (e.g., "claude-sonnet-4-5") — #678
# #1187: typed terminal-result seed (the #945 taxonomy). Populated by
# newer runtimes (Codex) and currently UNUSED by the backend in the MVP —
# the backend still infers AUTH from the HTTP status. A fast-follow makes
# the backend read error_code directly and retire status-inference.
status: Optional[str] = None # "success" | "error"
error_code: Optional[str] = None # "AUTH" | "RATE_LIMIT" | "TIMEOUT" | "AGENT_ERROR" | "RUNTIME_UNAVAILABLE"


# ============================================================================
Expand Down
14 changes: 13 additions & 1 deletion docker/base-image/agent_server/services/claude_code.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@
)
from .headless_executor import _attempt_empty_result_recovery, execute_headless_task
from .process_registry import get_process_registry
from .runtime_adapter import AgentRuntime
from .runtime_adapter import AgentRuntime, RuntimeCapabilities
from .stream_parser import process_stream_line
from .subprocess_lifecycle import (
_capture_pgid,
Expand Down Expand Up @@ -73,6 +73,18 @@
class ClaudeCodeRuntime(AgentRuntime):
"""Claude Code implementation of AgentRuntime interface."""

@classmethod
def capabilities(cls) -> RuntimeCapabilities:
# Claude is the reference runtime: full continuity (--continue), the
# Session tab's cached-UUID --resume machinery, MCP, and native cost
# reporting (Claude Code emits total_cost_usd directly). (#1187)
return RuntimeCapabilities(
chat_continuity=True,
session_tab_resume=True,
mcp_support=True,
cost_reporting="native",
)

def is_available(self) -> bool:
"""Check if Claude Code CLI is installed."""
try:
Expand Down
Loading
Loading