
Investigate Linux support for Codex Record & Replay demo-to-skill workflows #536


@joshyorko

Summary

OpenAI recently added Record & Replay to the Codex app. The feature lets a user demonstrate a workflow on macOS; Codex then turns that demonstration into a reusable Skill.

This issue tracks whether codex-desktop-linux can support the same primitive on Linux, either by directly implementing Linux-native recording or by supporting import/replay of generated skills where possible.

Upstream docs/source-of-truth:

Current upstream behavior

Based on the OpenAI Codex docs:

  • Record & Replay is currently available on macOS.
  • Initial availability excludes the EEA, UK, and Switzerland.
  • Computer Use must be available and enabled.
  • The feature records a demonstrated workflow and drafts a reusable Codex Skill.
  • The generated artifact is conceptually a skill, not just a raw screen recording or coordinate-based macro.

The key point: the capture surface is macOS-specific today, but the output primitive is a Codex Skill. That makes this worth investigating for Linux even if full GUI recording is not immediately possible.

Why this matters for Linux

codex-desktop-linux already exists because the upstream Codex desktop experience is not Linux-native. Record & Replay creates another platform gap:

| Capability | macOS | Windows | Linux today |
| --- | --- | --- | --- |
| Codex app | supported | supported | not officially native |
| Computer Use | supported | supported | not officially documented as supported |
| Record & Replay capture | supported | not currently documented | unsupported |
| Using reusable skills | should be possible depending on skill content | should be possible depending on skill content | should be possible for CLI/script/browser-oriented skills |

This issue should not be scoped as “copy macOS screen recording APIs.” The better framing is:

Can Linux participate in the demo-to-skill workflow, either as a skill consumer, a partial skill replayer, or eventually a native skill authoring environment?

Desired outcome

Design a Linux path for Record & Replay compatibility.

That likely means separating the problem into three layers:

  1. Skill compatibility

    • Can codex-desktop-linux discover, load, edit, and invoke Codex Skills generated elsewhere?
    • Are skills just directories with SKILL.md plus optional scripts/assets/references?
    • Where should Linux store/import them?
    • Can we support CLI/script/browser-oriented skills before GUI replay?
  2. Replay / execution support

    • Can a generated skill execute through existing Codex CLI, MCP tools, browser automation, or a Linux Computer Use shim?
    • What should happen when a skill requires macOS-only apps or APIs?
    • Can we expose a capability check so unsupported skills fail clearly instead of pretending they work?
  3. Linux-native recording / capture

    • What Linux primitives could approximate the macOS recording path?
    • Possible candidates:
      • AT-SPI accessibility APIs
      • X11 event capture
      • Wayland portals / desktop portal APIs
      • compositor-specific protocols
      • browser automation traces
      • app-specific plugin/event streams
    • What can be captured safely and reliably?
    • What permissions model is required?

Non-goals

This should not become a fragile coordinate macro recorder.

Avoid:

  • blind mouse-coordinate replay as the main architecture
  • pretending Linux GUI automation is equally available across Wayland, X11, GNOME, KDE, Sway, etc.
  • hiding permission/security boundaries
  • claiming support for arbitrary GUI workflows before the underlying backend exists

The win condition is compatibility with the Codex Skill primitive, not a brittle RPA clone.

Proposed implementation phases

Phase 1: Research and compatibility notes

Document the current upstream Record & Replay assumptions:

  • where skills are stored
  • how skills are described
  • whether generated skills are portable across machines/OSes
  • how Codex decides when to invoke a skill
  • whether the Linux app can see/use skills created by Codex CLI or another desktop app

Deliverables:

  • docs/record-and-replay-linux.md
  • clear support matrix
  • list of upstream blockers / unknowns

Phase 2: Skill import and invocation

Before attempting Linux recording, support the obvious useful path:

  • import or point to an existing Skill directory
  • display available skills in the Linux desktop UI
  • allow explicit invocation of a skill from a thread
  • pass skill context to Codex in the same shape expected by upstream
  • fail clearly if a skill requires unsupported GUI/computer-use capabilities

This makes Linux a consumer of Record & Replay output even before Linux can author those skills.
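
As a rough illustration of the import/discovery step, the sketch below scans a directory for skill folders. It assumes a skill is a directory containing a SKILL.md file, which is still an open question in this issue, and the names SkillEntry and discover_skills are hypothetical, not part of any upstream API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SkillEntry:
    """One discovered skill directory (hypothetical shape)."""
    name: str        # directory name, used here as the display name
    root: Path       # the skill directory itself
    manifest: Path   # path to the SKILL.md file inside it


def discover_skills(search_root: Path) -> list[SkillEntry]:
    """Return every immediate subdirectory of search_root that contains SKILL.md.

    Assumption: a skill is "a directory with SKILL.md"; upstream may define more.
    """
    if not search_root.is_dir():
        return []
    found = []
    for child in sorted(search_root.iterdir()):
        manifest = child / "SKILL.md"
        if child.is_dir() and manifest.is_file():
            found.append(SkillEntry(child.name, child, manifest))
    return found
```

The desktop UI could list the returned entries and let the user pick one to invoke. Where the search root should live on Linux is left open here, as it is in the questions above.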

Phase 3: Capability detection

Add a simple capability model for skills:

  • cli: can run shell/script instructions
  • browser: requires browser automation or page interaction
  • desktop-gui: requires Computer Use / screen interaction
  • macos-only: requires macOS app paths, AppleScript, Accessibility assumptions, etc.
  • windows-only: requires Windows-specific UI/app paths
  • linux-gui: future Linux GUI backend

The app should surface this honestly so agents/users know what can run.
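
A minimal sketch of that capability model, assuming each skill can somehow be mapped to a set of required capabilities (how that mapping is derived from a skill is not known yet). The Capability enum mirrors the labels above; LINUX_SUPPORTED and check_skill are hypothetical names.

```python
from enum import Enum


class Capability(Enum):
    CLI = "cli"
    BROWSER = "browser"
    DESKTOP_GUI = "desktop-gui"
    MACOS_ONLY = "macos-only"
    WINDOWS_ONLY = "windows-only"
    LINUX_GUI = "linux-gui"


# Assumption: what a Linux host can run before any GUI backend exists.
LINUX_SUPPORTED = frozenset({Capability.CLI, Capability.BROWSER})


def check_skill(
    required: set[Capability],
    supported: frozenset[Capability] = LINUX_SUPPORTED,
) -> tuple[bool, list[str]]:
    """Return (runnable, missing) where missing lists unsupported capability labels."""
    missing = sorted(c.value for c in required - supported)
    return (not missing, missing)
```

For example, check_skill({Capability.CLI, Capability.MACOS_ONLY}) reports the skill as not runnable and names macos-only as the reason, which is the message the UI would show instead of attempting the run.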

Phase 4: Linux capture spike

Only after skill consumption works, spike Linux-native capture.

Research questions:

  • Can AT-SPI provide enough semantic UI events for GNOME apps?
  • Can browser workflows be recorded more cleanly through browser automation traces instead of global desktop capture?
  • Does Wayland require compositor-specific support, portals, or user-approved capture sessions?
  • Should this project define a small recorder interface with pluggable backends?
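
Any capture spike will first need to know which display stack it is running on. The sketch below is one way to probe that from standard environment variables (XDG_SESSION_TYPE, WAYLAND_DISPLAY, DISPLAY). The function name detect_session_type is hypothetical, and the result says nothing about which capture APIs a given compositor actually permits.

```python
import os
from typing import Mapping, Optional


def detect_session_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Best-effort guess of the display stack: "wayland", "x11", or "unknown"."""
    env = os.environ if env is None else env
    declared = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if declared in ("wayland", "x11"):
        return declared
    # Check Wayland first: XWayland sessions usually set DISPLAY as well.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"
```

Passing the environment as a parameter keeps the probe testable without a real desktop session.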

Possible shape:

recording backend
  -> normalized event timeline
  -> observations/screenshots/accessibility tree snapshots
  -> Codex skill-drafting prompt/context
  -> generated Skill directory
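
The middle of that pipeline could look roughly like the sketch below: backend events are ordered into one timeline and rendered as text for the skill-drafting context. Every name here (RecordedEvent, normalize, to_draft_context) and the event fields are assumptions made for illustration; upstream's actual recording format is not documented in this issue.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecordedEvent:
    """One backend-agnostic event (hypothetical shape)."""
    timestamp: float                  # seconds since the recording started
    kind: str                         # e.g. "click", "key", "navigate"
    target: str                       # semantic target, not screen coordinates
    detail: dict = field(default_factory=dict)
    screenshot: Optional[str] = None  # optional path to an observation image


def normalize(events: list[RecordedEvent]) -> list[RecordedEvent]:
    """Order events from any backend into a single timeline."""
    return sorted(events, key=lambda e: e.timestamp)


def to_draft_context(events: list[RecordedEvent]) -> str:
    """Render the timeline as plain text to include in a drafting prompt."""
    return "\n".join(f"{e.timestamp:.2f}s {e.kind} {e.target}" for e in events)
```

Keeping targets semantic (a role and label rather than x/y positions) is what separates this from the coordinate macro recorder ruled out under Non-goals.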

Suggested architecture

The clean abstraction is a Linux RecorderBackend rather than one monolithic implementation:

Codex Desktop Linux
  Skills registry
  Skill editor/viewer
  Skill invocation path
  Capability checker
  RecorderBackend trait/interface
    - no-op/import-only backend
    - browser trace backend
    - X11 backend
    - AT-SPI backend
    - future Wayland portal backend

This lets the project ship value early without blocking on the hardest Linux desktop automation problem.
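
One possible shape for that interface, sketched as a Python abstract base class. The method names (is_available, start, stop) and the pick_backend helper are hypothetical; the point is only that an import-only backend can satisfy the same contract as a real recorder.

```python
from abc import ABC, abstractmethod


class RecorderBackend(ABC):
    """Contract every recording backend would implement (hypothetical)."""
    name: str = "abstract"

    @abstractmethod
    def is_available(self) -> bool:
        """True if this backend can run in the current session."""

    @abstractmethod
    def start(self) -> None:
        """Begin capturing events."""

    @abstractmethod
    def stop(self) -> list[dict]:
        """Stop capturing and return the raw events."""


class ImportOnlyBackend(RecorderBackend):
    """No-op backend: skills can be imported and invoked, never recorded."""
    name = "import-only"

    def is_available(self) -> bool:
        return True

    def start(self) -> None:
        raise RuntimeError("recording is not supported; import a skill instead")

    def stop(self) -> list[dict]:
        return []


def pick_backend(candidates: list[RecorderBackend]) -> RecorderBackend:
    """Return the first available backend, falling back to import-only."""
    for backend in candidates:
        if backend.is_available():
            return backend
    return ImportOnlyBackend()
```

With this shape, the app can ship with only ImportOnlyBackend and add browser trace, X11, or AT-SPI backends later without changing callers.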

Acceptance criteria

A first useful version of this issue is done when:

  • The repo documents current upstream Record & Replay behavior and Linux gaps.
  • The app can discover or import a Codex Skill directory.
  • The app can show available skills to the user.
  • The app can explicitly invoke a skill when it only requires CLI/script/browser-safe capabilities.
  • The app detects and clearly explains unsupported GUI/macOS-only skill requirements.
  • A follow-up design exists for Linux-native recording backends.

Open questions

  • Does upstream expose enough of the generated Skill format to support Linux import cleanly?
  • Are Record & Replay skills stored in the same registry as manually-authored Codex Skills?
  • Can Codex CLI invoke the same skills without the desktop app?
  • How much of Record & Replay skill drafting happens locally versus in upstream Codex services?
  • What is the safest Linux permission model for recording desktop workflows?

Bottom line

Record & Replay should be treated as a demo-to-skill compiler, not a screen macro feature.

For Linux, the right first move is probably:

  1. support Skill discovery/import/invocation,
  2. classify unsupported GUI/platform requirements honestly,
  3. then research Linux-native recording backends.

That gives codex-desktop-linux compatibility with the important primitive now, while leaving room for native Linux workflow recording later.
