Investigate Linux support for Codex Record & Replay demo-to-skill workflows

## Summary

OpenAI recently added **Record & Replay** to the Codex app. The feature lets a user demonstrate a workflow on macOS; Codex then turns that demonstration into a reusable **Skill**.

This issue tracks whether `codex-desktop-linux` can support the same primitive on Linux, either by directly implementing Linux-native recording or by supporting import/replay of generated skills where possible.

Upstream docs/source-of-truth:

- Record & Replay docs: https://developers.openai.com/codex/record-and-replay
- Skills docs: https://developers.openai.com/codex/skills
- Codex app docs: https://developers.openai.com/codex/app
- Codex changelog: https://developers.openai.com/codex/changelog

## Current upstream behavior

Based on the OpenAI Codex docs:

- Record & Replay is currently available on **macOS**.
- Initial availability excludes the EEA, UK, and Switzerland.
- Computer Use must be available and enabled.
- The feature records a demonstrated workflow and drafts a reusable Codex Skill.
- The generated artifact is conceptually a skill, not just a raw screen recording or coordinate-based macro.

The key point: **the capture surface is macOS-specific today, but the output primitive is a Codex Skill.** That makes this worth investigating for Linux even if full GUI recording is not immediately possible.

## Why this matters for Linux

`codex-desktop-linux` already exists because the upstream Codex desktop experience is not Linux-native. Record & Replay creates another platform gap:

| Capability | macOS | Windows | Linux today |
| --- | --- | --- | --- |
| Codex app | supported | supported | not officially native |
| Computer Use | supported | supported | not officially documented as supported |
| Record & Replay capture | supported | not currently documented | unsupported |
| Using reusable skills | should be possible depending on skill content | should be possible depending on skill content | should be possible for CLI/script/browser-oriented skills |

This issue should not be scoped as “copy macOS screen recording APIs.” The better framing is:

> Can Linux participate in the demo-to-skill workflow, either as a skill consumer, a partial skill replayer, or eventually a native skill authoring environment?

## Desired outcome

Design a Linux path for Record & Replay compatibility.

That likely means separating the problem into three layers:

1. **Skill compatibility**
   - Can `codex-desktop-linux` discover, load, edit, and invoke Codex Skills generated elsewhere?
   - Are skills just directories with `SKILL.md` plus optional scripts/assets/references?
   - Where should Linux store/import them?
   - Can we support CLI/script/browser-oriented skills before GUI replay?

2. **Replay / execution support**
   - Can a generated skill execute through existing Codex CLI, MCP tools, browser automation, or a Linux Computer Use shim?
   - What should happen when a skill requires macOS-only apps or APIs?
   - Can we expose a capability check so unsupported skills fail clearly instead of pretending they work?

3. **Linux-native recording / capture**
   - What Linux primitives could approximate the macOS recording path?
   - Possible candidates:
     - AT-SPI accessibility APIs
     - X11 event capture
     - Wayland portals / desktop portal APIs
     - compositor-specific protocols
     - browser automation traces
     - app-specific plugin/event streams
   - What can be captured safely and reliably?
   - What permissions model is required?
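
Because the viable capture candidates differ between X11 and Wayland sessions, a first step for any recorder would likely be detecting the session type. The sketch below is one possible way to do that, using the `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables. The function names and the backend labels in `candidate_backends` are hypothetical and only mirror the candidate list above; they are not part of any existing API, and the mapping is an assumption to be validated during the capture spike.

```python
import os


def detect_session_type(env=None):
    """Return 'wayland', 'x11', or 'unknown' based on session environment variables."""
    env = os.environ if env is None else env
    declared = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if declared in ("wayland", "x11"):
        return declared
    # XDG_SESSION_TYPE may be missing or set to something else (e.g. "tty"),
    # so fall back to the display socket variables.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


def candidate_backends(session_type):
    """Map a session type to capture candidates worth probing (illustrative only)."""
    # Assumption: browser traces do not depend on the display server.
    candidates = ["browser-trace"]
    if session_type == "x11":
        candidates += ["at-spi", "x11"]
    elif session_type == "wayland":
        candidates += ["at-spi", "wayland-portal"]
    return candidates


if __name__ == "__main__":
    session = detect_session_type()
    print(session, candidate_backends(session))
```

An unknown session falls back to browser traces only, which matches the non-goal of not pretending GUI automation is equally available everywhere.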

## Non-goals

This should **not** become a fragile coordinate macro recorder.

Avoid:

- blind mouse-coordinate replay as the main architecture
- pretending Linux GUI automation is equally available across Wayland, X11, GNOME, KDE, Sway, etc.
- hiding permission/security boundaries
- claiming support for arbitrary GUI workflows before the underlying backend exists

The win condition is compatibility with the Codex **Skill** primitive, not a brittle RPA clone.

## Proposed implementation phases

### Phase 1: Research and compatibility notes

Document the current upstream Record & Replay assumptions:

- where skills are stored
- how skills are described
- whether generated skills are portable across machines/OSes
- how Codex decides when to invoke a skill
- whether the Linux app can see/use skills created by Codex CLI or another desktop app

Deliverables:

- `docs/record-and-replay-linux.md`
- clear support matrix
- list of upstream blockers / unknowns

### Phase 2: Skill import and invocation

Before attempting Linux recording, support the obvious useful path:

- import or point to an existing Skill directory
- display available skills in the Linux desktop UI
- allow explicit invocation of a skill from a thread
- pass skill context to Codex in the same shape expected by upstream
- fail clearly if a skill requires unsupported GUI/computer-use capabilities

This makes Linux a **consumer** of Record & Replay output even before Linux can author those skills.
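
A minimal sketch of the discovery step, assuming a skill is a directory that contains a `SKILL.md` file (still an open question in this issue, so treat the layout as an assumption). The `DiscoveredSkill` type and `discover_skills` function are hypothetical names for illustration, and the skills root location is left to the caller because the storage path is not yet decided.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DiscoveredSkill:
    name: str       # directory name, used here as a display name
    root: Path      # the skill directory itself
    manifest: Path  # path to the SKILL.md inside it


def discover_skills(skills_root):
    """Return one DiscoveredSkill per direct subdirectory that contains SKILL.md."""
    skills_root = Path(skills_root)
    if not skills_root.is_dir():
        return []
    found = []
    for child in sorted(skills_root.iterdir()):
        manifest = child / "SKILL.md"
        # Assumption: a directory without SKILL.md is not a skill and is skipped.
        if child.is_dir() and manifest.is_file():
            found.append(DiscoveredSkill(child.name, child, manifest))
    return found
```

The UI could list the returned names and pass the selected skill directory to the invocation path. Parsing the contents of `SKILL.md` is deliberately left out until the upstream format is confirmed in Phase 1.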

### Phase 3: Capability detection

Add a simple capability model for skills:

- `cli`: can run shell/script instructions
- `browser`: requires browser automation or page interaction
- `desktop-gui`: requires Computer Use / screen interaction
- `macos-only`: requires macOS app paths, AppleScript, Accessibility assumptions, etc.
- `windows-only`: requires Windows-specific UI/app paths
- `linux-gui`: future Linux GUI backend

The app should surface this honestly so agents/users know what can run.
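
The capability labels above could be modeled roughly as follows. This is a sketch under stated assumptions: the `Capability` enum, the `SUPPORTED_TODAY` set, and `check_skill` are hypothetical names, and how a skill's required capabilities are determined (declared by the skill or inferred from its content) is not specified here.

```python
from enum import Enum


class Capability(str, Enum):
    CLI = "cli"
    BROWSER = "browser"
    DESKTOP_GUI = "desktop-gui"
    MACOS_ONLY = "macos-only"
    WINDOWS_ONLY = "windows-only"
    LINUX_GUI = "linux-gui"


# Assumption: an early Linux build can only honor CLI and browser skills.
SUPPORTED_TODAY = frozenset({Capability.CLI, Capability.BROWSER})


def check_skill(required, supported=SUPPORTED_TODAY):
    """Return (ok, message) so the UI can explain why a skill cannot run."""
    missing = sorted(cap.value for cap in set(required) - set(supported))
    if missing:
        return False, "Cannot run on this host: requires " + ", ".join(missing)
    return True, "All required capabilities are available"


if __name__ == "__main__":
    print(check_skill([Capability.CLI, Capability.MACOS_ONLY]))
```

Returning a message alongside the boolean keeps the failure explicit, so an unsupported skill is reported rather than silently attempted.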

### Phase 4: Linux capture spike

Only after skill consumption works, spike Linux-native capture.

Research questions:

- Can AT-SPI provide enough semantic UI events for GNOME apps?
- Can browser workflows be recorded more cleanly through browser automation traces instead of global desktop capture?
- Does Wayland require compositor-specific support, portals, or user-approved capture sessions?
- Should this project define a small recorder interface with pluggable backends?

Possible shape:

```text
recording backend
  -> normalized event timeline
  -> observations/screenshots/accessibility tree snapshots
  -> Codex skill-drafting prompt/context
  -> generated Skill directory
```
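
To make the "normalized event timeline" stage concrete, here is a small sketch of what normalization might look like. The `RecordedEvent` fields, the event kinds (`click`, `key`, `type_text`), and both function names are assumptions for illustration; no upstream event format is documented in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedEvent:
    timestamp_ms: int
    kind: str    # hypothetical kinds: "click", "key", "type_text"
    target: str  # semantic target (e.g. an accessible name), not screen coordinates
    detail: dict = field(default_factory=dict)


def normalize_timeline(events):
    """Sort events by time and merge consecutive key presses on one target into text."""
    merged = []
    for ev in sorted(events, key=lambda e: e.timestamp_ms):
        prev = merged[-1] if merged else None
        if ev.kind != "key":
            merged.append(ev)
        elif prev is not None and prev.kind == "type_text" and prev.target == ev.target:
            prev.detail["text"] += ev.detail.get("char", "")
        else:
            # Start a new text run; a fresh event avoids mutating the caller's input.
            merged.append(RecordedEvent(ev.timestamp_ms, "type_text", ev.target,
                                        {"text": ev.detail.get("char", "")}))
    return merged


def to_drafting_context(timeline):
    """Render the timeline as numbered steps for a skill-drafting prompt."""
    lines = []
    for index, ev in enumerate(timeline, start=1):
        suffix = f" {ev.detail}" if ev.detail else ""
        lines.append(f"{index}. {ev.kind} on {ev.target}{suffix}")
    return "\n".join(lines)
```

Keying events on a semantic target rather than coordinates is in line with the non-goal of avoiding coordinate macros. Captured keystrokes are sensitive, so a real recorder would need the permission model discussed above before storing them.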

## Suggested architecture

The clean abstraction is a Linux `RecorderBackend` rather than one monolithic implementation:

```text
Codex Desktop Linux
  Skills registry
  Skill editor/viewer
  Skill invocation path
  Capability checker
  RecorderBackend trait/interface
    - no-op/import-only backend
    - browser trace backend
    - X11 backend
    - AT-SPI backend
    - future Wayland portal backend
```

This lets the project ship value early without blocking on the hardest Linux desktop automation problem.
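
One way the `RecorderBackend` interface and its import-only fallback might look, sketched in Python for illustration (the project's actual implementation language may differ). The method names, the `can_record` flag, and `select_backend` are all hypothetical.

```python
from abc import ABC, abstractmethod


class RecorderBackend(ABC):
    name = "abstract"
    can_record = True  # False for backends that only consume existing skills

    @abstractmethod
    def is_available(self):
        """Return True if this backend can work in the current session."""

    @abstractmethod
    def start(self):
        """Begin a user-approved recording session."""

    @abstractmethod
    def stop(self):
        """End the session and return the captured events as a list."""


class ImportOnlyBackend(RecorderBackend):
    """No-op backend: skills can be imported and invoked, but nothing is recorded."""
    name = "import-only"
    can_record = False

    def is_available(self):
        return True

    def start(self):
        raise RuntimeError("Recording is not supported by the import-only backend")

    def stop(self):
        return []


def select_backend(backends):
    """Pick the first available recording backend, else fall back to import-only."""
    for backend in backends:
        if backend.is_available() and backend.can_record:
            return backend
    return ImportOnlyBackend()
```

With this shape, the app can ship with only `ImportOnlyBackend` registered and add browser trace, X11, AT-SPI, or portal backends later without changing callers.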

## Acceptance criteria

A first useful version of this issue is done when:

- [ ] The repo documents current upstream Record & Replay behavior and Linux gaps.
- [ ] The app can discover or import a Codex Skill directory.
- [ ] The app can show available skills to the user.
- [ ] The app can explicitly invoke a skill when it only requires CLI/script/browser-safe capabilities.
- [ ] The app detects and clearly explains unsupported GUI/macOS-only skill requirements.
- [ ] A follow-up design exists for Linux-native recording backends.

## Open questions

- Does upstream expose enough of the generated Skill format to support Linux import cleanly?
- Are Record & Replay skills stored in the same registry as manually-authored Codex Skills?
- Can Codex CLI invoke the same skills without the desktop app?
- How much of Record & Replay skill drafting happens locally versus in upstream Codex services?
- What is the safest Linux permission model for recording desktop workflows?

## Bottom line

Record & Replay should be treated as a **demo-to-skill compiler**, not a screen macro feature.

For Linux, the right first move is probably:

1. support Skill discovery/import/invocation,
2. classify unsupported GUI/platform requirements honestly,
3. then research Linux-native recording backends.

That gives `codex-desktop-linux` compatibility with the important primitive now, while leaving room for native Linux workflow recording later.
