Codex plugin exports some real traces, but some completed Codex sessions produce no sidecar and no remote session #12

Description

@rilkerfc

We validated the official Langfuse Codex plugin on real Codex sessions and found it in a partially working state:

  • the plugin definitely works and can export real traces to Langfuse
  • but automatic coverage is incomplete for some completed sessions

The cleanest reproducer is a real session that completed multiple turns, but produced neither a .langfuse sidecar nor a remote Langfuse session.

Environment

  • Codex desktop/app runtime
  • plugin: tracing@codex-observability-plugin v0.1.0
  • Langfuse Cloud
  • plugin enabled with TRACE_TO_LANGFUSE=true
  • issue observed on real interactive Codex sessions, not on synthetic fixtures

What We Observed

Working sessions

We observed multiple real sessions where:

  • rollout completed successfully
  • a .langfuse sidecar was written
  • a remote Langfuse session existed
  • traces were visible in Langfuse

Failing clean session

We also observed at least one real completed session where:

  • rollout completed with multiple turns
  • no .langfuse sidecar was written
  • no remote Langfuse session was created
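
To make the local half of this check repeatable, a scan along the following lines could list completed rollouts that have no sidecar. This is only a sketch: the rollout file pattern (`*.jsonl`) and the sidecar naming (rollout file name plus a `.langfuse` suffix, in the same directory) are assumptions for illustration, not confirmed plugin behavior, and `find_sessions_without_sidecar` is a hypothetical helper.

```python
from pathlib import Path


def find_sessions_without_sidecar(rollout_dir, rollout_glob="*.jsonl",
                                  sidecar_suffix=".langfuse"):
    """Return rollout files that have no sidecar file next to them.

    ASSUMPTION: the sidecar for `foo.jsonl` is `foo.jsonl.langfuse` in the
    same directory. Adjust `rollout_glob` and `sidecar_suffix` to match the
    layout the plugin actually uses.
    """
    missing = []
    for rollout in sorted(Path(rollout_dir).rglob(rollout_glob)):
        # Build the expected sidecar path by appending the suffix to the name.
        sidecar = rollout.with_name(rollout.name + sidecar_suffix)
        if not sidecar.exists():
            missing.append(rollout)
    return missing
```

A missing sidecar found this way only covers the local side; whether a remote Langfuse session exists still has to be checked separately.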

Host-Side Evidence

  • host logs show explicit Stop-hook completion lines for some session classes
  • for failing sessions, we could not find corresponding completion evidence in the same hook/log surface
  • local runtime evidence suggests not all Codex session paths behave the same way with respect to hook execution or hook visibility

Why This Looks Like A Runtime / Hook-Coverage Problem

  • the plugin is not dead: real traces do arrive remotely for some completed sessions
  • the clean failing session completed multiple turns but had no sidecar and no remote session
  • host-visible Stop-hook completion evidence appears inconsistent across different session paths in this environment

Questions

  1. Are all completed Codex session paths expected to trigger the same Stop-hook lifecycle?
  2. Is incomplete sidecar coverage on completed turns a known issue?
  3. Is there a recommended way to distinguish:
    • hook never fired
    • hook fired without transcript path
    • hook fired but skipped sidecar mark
    • hook fired and exported partially
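
To illustrate what we mean by question 3, the four cases could be told apart by a decision order like the one below, if the corresponding evidence were observable. Every field in `SessionEvidence` is hypothetical: we do not know which of these signals the host or the plugin actually exposes, and the partial-export check assumes one remote trace per local turn, which may not match the plugin's data model.

```python
from dataclasses import dataclass
from enum import Enum


class HookOutcome(Enum):
    NEVER_FIRED = "hook never fired"
    NO_TRANSCRIPT_PATH = "hook fired without transcript path"
    SKIPPED_SIDECAR = "hook fired but skipped sidecar mark"
    PARTIAL_EXPORT = "hook fired and exported partially"
    COMPLETE = "hook fired and exported fully"


@dataclass
class SessionEvidence:
    # All fields are hypothetical signals, not a real plugin or host API.
    stop_hook_logged: bool      # a Stop-hook completion line exists for the session
    transcript_path_seen: bool  # the hook received a transcript path
    sidecar_exists: bool        # a .langfuse sidecar was written
    local_turns: int            # turns counted in the local rollout
    remote_traces: int          # traces found in the remote Langfuse session


def classify(evidence: SessionEvidence) -> HookOutcome:
    """Walk the evidence from earliest to latest stage of the hook lifecycle."""
    if not evidence.stop_hook_logged:
        return HookOutcome.NEVER_FIRED
    if not evidence.transcript_path_seen:
        return HookOutcome.NO_TRANSCRIPT_PATH
    if not evidence.sidecar_exists:
        return HookOutcome.SKIPPED_SIDECAR
    # ASSUMPTION: one remote trace is expected per local turn.
    if evidence.remote_traces < evidence.local_turns:
        return HookOutcome.PARTIAL_EXPORT
    return HookOutcome.COMPLETE
```

With the evidence we have today, the first branch is ambiguous: a missing log line may mean the hook never fired, or only that its completion was not visible on this log surface.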

Current Workaround

We can reliably export the same real rollout by replaying it through the installed official plugin bundle ourselves. That works as an operational fallback, but it does not prove the host automatic Stop hook is healthy.

Impact

We currently have to treat automatic Codex session coverage as partial/unreliable on this runtime surface and use fallback replay for continuity when needed.
