We validated the official Langfuse Codex plugin on real Codex sessions and found it in a partially working state:
- the plugin definitely works and can export real traces to Langfuse
- but automatic coverage is incomplete for some completed sessions
The cleanest reproducer is a real session that completed multiple turns, but produced neither a .langfuse sidecar nor a remote Langfuse session.
Environment
- Codex desktop/app runtime
- plugin: tracing@codex-observability-plugin v0.1.0
- Langfuse Cloud
- plugin enabled with TRACE_TO_LANGFUSE=true
- issue observed on real interactive Codex sessions, not on synthetic fixtures
What We Observed
Working sessions
We observed multiple real sessions where:
- rollout completed successfully
- a .langfuse sidecar was written
- a remote Langfuse session existed
- traces were visible in Langfuse
Failing clean session
We also observed at least one real completed session where:
- rollout completed with multiple turns
- no .langfuse sidecar was written
- no remote Langfuse session was created
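To make the "no sidecar" check repeatable, a local scan along these lines can list completed rollouts that have no sidecar beside them. This is only a sketch under assumptions: the rollout file pattern (rollout-*.jsonl), the sidecar naming (rollout file name plus ".langfuse"), and the function name are ours and are not confirmed by the plugin; both patterns are parameters so they can be adjusted to the actual layout.

```python
from pathlib import Path


def rollouts_missing_sidecar(
    sessions_dir: Path,
    rollout_glob: str = "rollout-*.jsonl",  # ASSUMPTION: rollout file naming
    sidecar_suffix: str = ".langfuse",      # ASSUMPTION: sidecar = rollout name + suffix
) -> list[Path]:
    """Return rollout files under sessions_dir that have no sidecar next to them."""
    missing = []
    for rollout in sorted(sessions_dir.rglob(rollout_glob)):
        # Build the expected sidecar path in the same directory as the rollout.
        sidecar = rollout.with_name(rollout.name + sidecar_suffix)
        if not sidecar.exists():
            missing.append(rollout)
    return missing
```

Running this over the local sessions directory gives a list of candidate sessions to compare against the remote Langfuse session list. It only checks local files, so it says nothing about whether a remote session exists.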
Host-Side Evidence
- host logs show explicit Stop-hook completion lines for some session classes
- for failing sessions, we could not find corresponding completion evidence in the same hook/log surface
- local runtime evidence suggests not all Codex session paths behave the same way with respect to hook execution or hook visibility
Why This Looks Like A Runtime / Hook-Coverage Problem
- the plugin is not dead: real traces do arrive remotely for some completed sessions
- the clean failing session completed multiple turns but had no sidecar and no remote session
- host-visible Stop-hook completion evidence appears inconsistent across different session paths in this environment
Questions
- Are all completed Codex session paths expected to trigger the same Stop-hook lifecycle?
- Is incomplete sidecar coverage on completed turns a known issue?
- Is there a recommended way to distinguish:
  - hook never fired
  - hook fired without transcript path
  - hook fired but skipped sidecar mark
  - hook fired and exported partially
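For reference, this is roughly how we would like to triage a session if the four cases above were observable. It is a hypothetical sketch: the evidence fields, the enum, and the "fewer remote traces than completed turns means partial export" rule are our own assumptions, not signals the plugin or Codex is known to expose today.

```python
from dataclasses import dataclass
from enum import Enum


class HookOutcome(Enum):
    NEVER_FIRED = "hook never fired (or left no visible log line)"
    NO_TRANSCRIPT_PATH = "hook fired without transcript path"
    SKIPPED_SIDECAR = "hook fired but skipped sidecar mark"
    PARTIAL_EXPORT = "hook fired and exported partially"
    COMPLETE = "hook fired and exported fully"


@dataclass
class SessionEvidence:
    # All fields are HYPOTHETICAL evidence collected by hand from logs and files.
    stop_hook_logged: bool      # a Stop-hook completion line was found in host logs
    transcript_path_seen: bool  # the hook invocation carried a transcript path
    sidecar_written: bool       # a .langfuse sidecar exists for the rollout
    turns_completed: int        # turns counted in the local rollout
    remote_traces: int          # traces found in the remote Langfuse session


def classify(e: SessionEvidence) -> HookOutcome:
    """Map collected evidence to one of the cases, checking the earliest failure first."""
    if not e.stop_hook_logged:
        return HookOutcome.NEVER_FIRED
    if not e.transcript_path_seen:
        return HookOutcome.NO_TRANSCRIPT_PATH
    if not e.sidecar_written:
        return HookOutcome.SKIPPED_SIDECAR
    # ASSUMPTION: a full export yields at least one remote trace per completed turn.
    if e.remote_traces < e.turns_completed:
        return HookOutcome.PARTIAL_EXPORT
    return HookOutcome.COMPLETE
```

With the evidence we currently have, the failing session lands in the first case, but a missing log line cannot separate "hook never fired" from "hook fired but was not visible on this log surface", which is exactly the gap we are asking about.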
Current Workaround
We can reliably export the same real rollout by replaying it through the installed official plugin bundle ourselves. That works as an operational fallback, but it does not prove the host automatic Stop hook is healthy.
Impact
We currently have to treat automatic Codex session coverage as partial/unreliable on this runtime surface and use fallback replay for continuity when needed.