Summary
OpenAI recently added Record & Replay to the Codex app. The feature lets a user demonstrate a workflow on macOS, then Codex turns that demonstration into a reusable Skill.
This issue tracks whether codex-desktop-linux can support the same primitive on Linux, either by directly implementing Linux-native recording or by supporting import/replay of generated skills where possible.
Upstream docs/source-of-truth:
Current upstream behavior
Based on the OpenAI Codex docs:
- Record & Replay is currently available on macOS.
- Initial availability excludes the EEA, UK, and Switzerland.
- Computer Use must be available and enabled.
- The feature records a demonstrated workflow and drafts a reusable Codex Skill.
- The generated artifact is conceptually a skill, not just a raw screen recording or coordinate-based macro.
The key point: the capture surface is macOS-specific today, but the output primitive is a Codex Skill. That makes this worth investigating for Linux even if full GUI recording is not immediately possible.
Why this matters for Linux
codex-desktop-linux already exists because the upstream Codex desktop experience is not Linux-native. Record & Replay creates another platform gap:
| Capability | macOS | Windows | Linux today |
| --- | --- | --- | --- |
| Codex app | supported | supported | not officially native |
| Computer Use | supported | supported | not officially documented as supported |
| Record & Replay capture | supported | not currently documented | unsupported |
| Using reusable skills | should be possible depending on skill content | should be possible depending on skill content | should be possible for CLI/script/browser-oriented skills |
This issue should not be scoped as “copy macOS screen recording APIs.” The better framing is:
Can Linux participate in the demo-to-skill workflow, either as a skill consumer, a partial skill replayer, or eventually a native skill authoring environment?
Desired outcome
Design a Linux path for Record & Replay compatibility.
That likely means separating the problem into three layers:
- Skill compatibility
  - Can codex-desktop-linux discover, load, edit, and invoke Codex Skills generated elsewhere?
  - Are skills just directories with SKILL.md plus optional scripts/assets/references?
  - Where should Linux store/import them?
  - Can we support CLI/script/browser-oriented skills before GUI replay?
- Replay / execution support
  - Can a generated skill execute through existing Codex CLI, MCP tools, browser automation, or a Linux Computer Use shim?
  - What should happen when a skill requires macOS-only apps or APIs?
  - Can we expose a capability check so unsupported skills fail clearly instead of pretending they work?
- Linux-native recording / capture
  - What Linux primitives could approximate the macOS recording path?
  - Possible candidates:
    - AT-SPI accessibility APIs
    - X11 event capture
    - Wayland portals / desktop portal APIs
    - compositor-specific protocols
    - browser automation traces
    - app-specific plugin/event streams
  - What can be captured safely and reliably?
  - What permissions model is required?
Non-goals
This should not become a fragile coordinate macro recorder.
Avoid:
- blind mouse-coordinate replay as the main architecture
- pretending Linux GUI automation is equally available across Wayland, X11, GNOME, KDE, Sway, etc.
- hiding permission/security boundaries
- claiming support for arbitrary GUI workflows before the underlying backend exists
The win condition is compatibility with the Codex Skill primitive, not a brittle RPA clone.
Proposed implementation phases
Phase 1: Research and compatibility notes
Document the current upstream Record & Replay assumptions:
- where skills are stored
- how skills are described
- whether generated skills are portable across machines/OSes
- how Codex decides when to invoke a skill
- whether the Linux app can see/use skills created by Codex CLI or another desktop app
Deliverable:
- docs/record-and-replay-linux.md
- clear support matrix
- list of upstream blockers / unknowns
Phase 2: Skill import and invocation
Before attempting Linux recording, support the obvious useful path:
- import or point to an existing Skill directory
- display available skills in the Linux desktop UI
- allow explicit invocation of a skill from a thread
- pass skill context to Codex in the same shape expected by upstream
- fail clearly if a skill requires unsupported GUI/computer-use capabilities
This makes Linux a consumer of Record & Replay output even before Linux can author those skills.
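A minimal sketch of the discovery step, assuming a skill is a directory that contains a SKILL.md file. That layout is still one of the questions this issue asks, so treat it as an assumption rather than the confirmed upstream format. The names `DiscoveredSkill` and `discover_skills` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DiscoveredSkill:
    """A candidate skill found on disk (hypothetical shape, not an upstream type)."""
    name: str       # directory name, used here as the identifier (assumption)
    root: Path      # the skill directory
    manifest: Path  # path to the SKILL.md inside it


def discover_skills(search_root: Path) -> list[DiscoveredSkill]:
    """Return each immediate subdirectory of search_root that contains SKILL.md."""
    if not search_root.is_dir():
        return []
    found = []
    for child in sorted(search_root.iterdir()):
        manifest = child / "SKILL.md"
        if child.is_dir() and manifest.is_file():
            found.append(DiscoveredSkill(child.name, child, manifest))
    return found
```

Pointing `discover_skills` at a user-chosen directory would cover the "import or point to an existing Skill directory" step without committing to a storage location, which remains an open question.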
Phase 3: Capability detection
Add a simple capability model for skills:
- cli: can run shell/script instructions
- browser: requires browser automation or page interaction
- desktop-gui: requires Computer Use / screen interaction
- macos-only: requires macOS app paths, AppleScript, Accessibility assumptions, etc.
- windows-only: requires Windows-specific UI/app paths
- linux-gui: future Linux GUI backend
The app should surface this honestly so agents/users know what can run.
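One way the capability model might look in code, as a sketch only. The capability names come from the list above. How a skill declares its requirements is unknown, so the `required` set is simply passed in, and `LINUX_SUPPORTED`, `unsupported`, and `check_skill` are hypothetical names.

```python
from enum import Enum


class Capability(Enum):
    CLI = "cli"
    BROWSER = "browser"
    DESKTOP_GUI = "desktop-gui"
    MACOS_ONLY = "macos-only"
    WINDOWS_ONLY = "windows-only"
    LINUX_GUI = "linux-gui"


# Assumption: what a Linux host could run before any GUI backend exists.
LINUX_SUPPORTED = frozenset({Capability.CLI, Capability.BROWSER})


def unsupported(required, supported=LINUX_SUPPORTED):
    """Return the sorted names of required capabilities the host lacks."""
    return sorted(cap.value for cap in set(required) - set(supported))


def check_skill(name, required, supported=LINUX_SUPPORTED):
    """Produce an explicit, user-facing verdict instead of failing silently."""
    missing = unsupported(required, supported)
    if missing:
        return f"{name}: cannot run here, missing {', '.join(missing)}"
    return f"{name}: runnable"
```

For example, `check_skill("export-report", {Capability.CLI, Capability.MACOS_ONLY})` reports the missing `macos-only` capability rather than attempting the skill.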
Phase 4: Linux capture spike
Only after skill consumption works, spike Linux-native capture.
Research questions:
- Can AT-SPI provide enough semantic UI events for GNOME apps?
- Can browser workflows be recorded more cleanly through browser automation traces instead of global desktop capture?
- Does Wayland require compositor-specific support, portals, or user-approved capture sessions?
- Should this project define a small recorder interface with pluggable backends?
Possible shape:
recording backend
-> normalized event timeline
-> observations/screenshots/accessibility tree snapshots
-> Codex skill-drafting prompt/context
-> generated Skill directory
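The first stages of that shape can be sketched as follows, under stated assumptions: the `RecordedEvent` fields, `normalize`, and `drafting_context` are all hypothetical, and nothing here reflects how upstream Record & Replay represents a demonstration. The point is only that events from different backends can be merged into one ordered, semantic timeline before any skill drafting happens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordedEvent:
    timestamp: float                   # seconds since the recording started
    source: str                        # backend that produced it, e.g. "at-spi"
    action: str                        # semantic action, e.g. "click" or "type"
    target: str                        # semantic target, e.g. a label or selector
    observation: Optional[str] = None  # optional screenshot or tree snapshot reference


def normalize(events):
    """Merge events from any number of backends into one time-ordered timeline."""
    return sorted(events, key=lambda event: event.timestamp)


def drafting_context(timeline):
    """Render the timeline as numbered steps suitable for a skill-drafting prompt."""
    lines = []
    for step, event in enumerate(timeline, start=1):
        line = f"{step}. [{event.source}] {event.action} -> {event.target}"
        if event.observation:
            line += f" (see {event.observation})"
        lines.append(line)
    return "\n".join(lines)
```

Targets are described by name or selector rather than screen coordinates, in keeping with the non-goal of avoiding a coordinate macro recorder.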
Suggested architecture
The clean abstraction is a Linux RecorderBackend rather than one monolithic implementation:
Codex Desktop Linux
  Skills registry
  Skill editor/viewer
  Skill invocation path
  Capability checker
  RecorderBackend trait/interface
    - no-op/import-only backend
    - browser trace backend
    - X11 backend
    - AT-SPI backend
    - future Wayland portal backend
This lets the project ship value early without blocking on the hardest Linux desktop automation problem.
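A rough sketch of the pluggable-backend idea, using a Python `Protocol` to stand in for the trait/interface. The method names `is_available` and `record`, the `UnavailableBackend` placeholder, and `pick_backend` are illustrative assumptions, not an existing API in this project.

```python
from typing import Iterable, List, Protocol


class RecorderBackend(Protocol):
    """Interface every capture backend would implement (hypothetical)."""
    name: str

    def is_available(self) -> bool: ...

    def record(self) -> List[dict]: ...


class ImportOnlyBackend:
    """No-op backend: always available, never captures anything."""
    name = "import-only"

    def is_available(self) -> bool:
        return True

    def record(self) -> List[dict]:
        return []


class UnavailableBackend:
    """Placeholder for a backend (X11, AT-SPI, ...) that is not implemented yet."""

    def __init__(self, name: str) -> None:
        self.name = name

    def is_available(self) -> bool:
        return False

    def record(self) -> List[dict]:
        raise NotImplementedError(f"{self.name} backend is not implemented")


def pick_backend(preferred: Iterable[RecorderBackend]) -> RecorderBackend:
    """Return the first available backend, falling back to import-only."""
    for backend in preferred:
        if backend.is_available():
            return backend
    return ImportOnlyBackend()
```

With this shape, a host where no capture backend works still gets a functioning import-only path, which matches the goal of shipping skill consumption before recording.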
Acceptance criteria
A first useful version of this issue is done when:
Open questions
- Does upstream expose enough of the generated Skill format to support Linux import cleanly?
- Are Record & Replay skills stored in the same registry as manually-authored Codex Skills?
- Can Codex CLI invoke the same skills without the desktop app?
- How much of Record & Replay skill drafting happens locally versus in upstream Codex services?
- What is the safest Linux permission model for recording desktop workflows?
Bottom line
Record & Replay should be treated as a demo-to-skill compiler, not a screen macro feature.
For Linux, the right first move is probably:
- support Skill discovery/import/invocation,
- classify unsupported GUI/platform requirements honestly,
- then research Linux-native recording backends.
That gives codex-desktop-linux compatibility with the important primitive now, while leaving room for native Linux workflow recording later.