Before any other work in this repo, enable prek: cargo binstall prek && prek install. Hooks are defined in prek.toml.
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Desktest is a CLI tool for automated end-to-end testing of desktop applications (Linux, macOS, and Windows) using LLM-powered agents. It spins up a Docker container (Linux), Tart VM (macOS), or QEMU/KVM VM (Windows) with a desktop environment, deploys an app, then runs an OSWorld-style agent loop where the LLM interacts with the app via PyAutoGUI code execution and observes via screenshots + accessibility trees.
Tech stack: Rust (edition 2024), Tokio async runtime, Docker (Bollard), Tart (Apple Virtualization.framework), QEMU/KVM (Windows VMs), multi-model LLM support (OpenAI, Anthropic, custom OpenAI-compatible endpoints).
cargo build # Build
cargo run -- validate examples/gedit-save.json # Validate task file
cargo run -- run task.json --config config.json # Run single test
cargo run -- run task.json --replay # Deterministic replay (no LLM)
cargo run -- run task.json --qa # Run with QA bug reporting
cargo run -- suite examples/ # Run test suite
cargo run -- interactive task.json # Interactive debugging
cargo run -- attach task.json --container ID # Attach to existing container
cargo run -- logs desktest_artifacts/ # View trajectory in terminal
cargo run -- logs desktest_artifacts/ --steps 3-7 # View specific step range
cargo test # All non-ignored tests
cargo test -- --ignored --test-threads=1 # Integration tests (require Docker)
Main flow (src/main.rs): Parse CLI → load task JSON → create session (Docker container, Tart VM, QEMU/KVM VM, or native host) → wait for desktop → run setup steps → run agent loop (or skip for programmatic-only) → run evaluation → write results → collect artifacts → cleanup.
Attach flow (desktest attach): Parse CLI → load task JSON → attach to existing container (no create/cleanup) → run setup steps → run agent loop → run evaluation → write results → collect artifacts. Uses DockerSession::attach() instead of DockerSession::create().
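The difference between the two flows can be sketched as an ordered list of stages. This is only an illustration of the ordering described above: the Flow enum and the stages function are hypothetical names and are not taken from src/main.rs.

```rust
/// How the session is obtained. Illustrative names, not the repo's types.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Flow {
    Run,    // `desktest run`: creates and later cleans up its own session
    Attach, // `desktest attach`: reuses an existing container
}

/// Returns the ordered stage names for a flow, following the prose description.
fn stages(flow: Flow) -> Vec<&'static str> {
    let mut s = vec!["parse CLI", "load task JSON"];
    match flow {
        Flow::Run => {
            s.push("create session");
            s.push("wait for desktop");
        }
        Flow::Attach => s.push("attach to existing container"),
    }
    s.extend([
        "run setup steps",
        "run agent loop",
        "run evaluation",
        "write results",
        "collect artifacts",
    ]);
    // Only the main flow owns the session, so only it cleans up.
    if flow == Flow::Run {
        s.push("cleanup");
    }
    s
}

fn main() {
    for flow in [Flow::Run, Flow::Attach] {
        println!("{:?}: {}", flow, stages(flow).join(" -> "));
    }
}
```

The point of the sketch is that attach mode skips both ends of the lifecycle (create and cleanup) while sharing the middle stages with the main flow.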
Exit codes: 0=pass, 1=fail, 2=config error, 3=infra error, 4=agent error.
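As a minimal sketch of the exit code mapping, the following uses a hypothetical Outcome enum. The real AppError variants in src/error.rs are not listed here and may be named or grouped differently; only the numeric codes come from this document.

```rust
use std::process::ExitCode;

/// Hypothetical outcome type; the real AppError variants may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Pass,
    Fail,
    ConfigError,
    InfraError,
    AgentError,
}

/// Maps an outcome to the documented exit code (0 to 4).
fn exit_code(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Pass => 0,
        Outcome::Fail => 1,
        Outcome::ConfigError => 2,
        Outcome::InfraError => 3,
        Outcome::AgentError => 4,
    }
}

fn main() -> ExitCode {
    // Placeholder for the result of an actual run.
    let outcome = Outcome::Pass;
    ExitCode::from(exit_code(outcome))
}
```

Keeping the mapping in a single exhaustive match means the compiler flags any new variant that lacks an exit code.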
- src/session/mod.rs defines the Session trait (8 async methods) and SessionKind enum with five variants: Docker(DockerSession), Tart(TartSession), Native(NativeSession), WindowsVm(WindowsVmSession), WindowsNative(WindowsNativeSession)
- forward_session! macro generates impl Session for SessionKind by matching on variants (enum dispatch, not dynamic dispatch)
- Platform-specific operations accessed via session.as_docker(), session.as_tart(), session.as_native(), session.as_windows_vm(), session.as_windows_native()
- src/session/native.rs - NativeSession runs commands directly on the host macOS desktop (no VM, no isolation)
- src/session/windows_native.rs - WindowsNativeSession scaffolding (stub impl, full implementation deferred to when a Windows host is available for testing)
- The agent loop lives in agent/loop_v2.rs (AgentLoopV2), the OSWorld-style PyAutoGUI code execution loop
- src/task.rs uses serde tagged enums (#[serde(tag = "type")]) for AppConfig (including VncAttach for attach mode, MacosTart for Tart VMs, MacosNative for host testing, WindowsVm for QEMU/KVM VMs, WindowsNative for host testing), MetricConfig, and SetupStep
- AppError variants in src/error.rs map to specific exit codes (0-4); don't change the mapping without updating docs
- pub(crate) use orchestration::run_task in main.rs re-exports this for suite.rs to use as crate::run_task
- src/observation.rs uses ObservationConfig::for_session() to select platform-specific screenshot and a11y commands (Linux: scrot + pyatspi, macOS: screencapture + AXUIElement, Windows: PIL ImageGrab + uiautomation)
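The enum-dispatch pattern described in the list above can be sketched as follows. This is a simplified stand-in, not the code in src/session/mod.rs: the trait here has two synchronous methods instead of eight async ones, only two variants are shown, and the macro body and the Option return type of as_docker() are assumptions.

```rust
/// Simplified stand-in for the Session trait (the real one has 8 async methods).
trait Session {
    fn name(&self) -> &'static str;
    fn is_isolated(&self) -> bool;
}

struct DockerSession;
struct NativeSession;

impl Session for DockerSession {
    fn name(&self) -> &'static str { "docker" }
    fn is_isolated(&self) -> bool { true }
}

impl Session for NativeSession {
    fn name(&self) -> &'static str { "native" }
    fn is_isolated(&self) -> bool { false }
}

enum SessionKind {
    Docker(DockerSession),
    Native(NativeSession),
}

/// Generates `impl Session for SessionKind` by matching on every listed variant.
macro_rules! forward_session {
    ($($variant:ident),+ $(,)?) => {
        impl Session for SessionKind {
            fn name(&self) -> &'static str {
                match self { $(SessionKind::$variant(s) => s.name()),+ }
            }
            fn is_isolated(&self) -> bool {
                match self { $(SessionKind::$variant(s) => s.is_isolated()),+ }
            }
        }
    };
}

forward_session!(Docker, Native);

impl SessionKind {
    /// Platform-specific access; the Option return type is an assumption.
    fn as_docker(&self) -> Option<&DockerSession> {
        match self {
            SessionKind::Docker(s) => Some(s),
            _ => None,
        }
    }
}

fn main() {
    let sessions = [
        SessionKind::Docker(DockerSession),
        SessionKind::Native(NativeSession),
    ];
    for s in &sessions {
        println!(
            "{} isolated={} docker={}",
            s.name(),
            s.is_isolated(),
            s.as_docker().is_some()
        );
    }
}
```

Because every call resolves through a match rather than a vtable, no Box<dyn Session> is needed, and adding a variant without listing it in the macro invocation fails to compile.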
The base image is built from debian:bookworm-slim with Xvfb, XFCE4, x11vnc, xdotool, scrot, ffmpeg, Python3, PyAutoGUI, pyatspi, AT-SPI2, FUSE, and GTK3 libs. It runs as the non-root user "tester". The entrypoint starts the display server, dbus, AT-SPI registry, desktop, and VNC, then writes a sentinel file.
IMPORTANT: ~/.Xauthority must exist for the tester user. PyAutoGUI (via python-xlib) crashes with Xlib.error.XauthError without it. The base Dockerfile creates it, but custom images or images that switch users must ensure it exists. Custom images are validated at startup; built-in images have a fallback in execute-action.py.
Docker images:
- desktest-desktop:latest - Base image (Dockerfile)
- desktest-desktop:electron - Extends base with Node.js 20 + Electron deps (Dockerfile.electron)
Helper scripts:
- docker/get-a11y-tree.py - Extracts linearized accessibility tree via pyatspi (TSV format)
- docker/execute-action.py - Executes PyAutoGUI code from stdin, returns JSON result
- docker/screenshot_compare.py - PIL-based screenshot comparison for visual assertions
Default display resolution: 1920x1080.
Windows testing uses QEMU/KVM with a user-provided Windows 11 QCOW2 golden image. Copy-on-write overlays give each test a clean environment. A VirtIO-FS shared directory (via WinFsp) provides host↔VM communication using the same file-based IPC protocol as Tart (src/vm_protocol.rs). Requires swtpm (software TPM 2.0) and OVMF (UEFI firmware).
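One common way to create a copy-on-write overlay on top of a QCOW2 golden image is qemu-img create with a backing file. The sketch below only builds the argument list. It is an assumption about how an overlay could be created, not a description of what src/windows/mod.rs does, and the file names are hypothetical.

```rust
use std::path::Path;

/// Builds `qemu-img` arguments for a copy-on-write overlay backed by the
/// golden image. The overlay stores only blocks that differ from the backing file.
fn overlay_args(golden: &Path, overlay: &Path) -> Vec<String> {
    vec![
        "create".to_string(),
        "-f".to_string(),
        "qcow2".to_string(), // format of the new overlay
        "-b".to_string(),
        golden.display().to_string(), // backing (golden) image, left unmodified
        "-F".to_string(),
        "qcow2".to_string(), // format of the backing image
        overlay.display().to_string(),
    ]
}

fn main() {
    // Hypothetical file names for illustration only.
    let args = overlay_args(Path::new("golden.qcow2"), Path::new("test-overlay.qcow2"));
    // To actually create the overlay, the arguments could be passed to
    // std::process::Command::new("qemu-img").args(&args).
    println!("qemu-img {}", args.join(" "));
}
```

Deleting the overlay after a test discards all guest changes, which is what keeps the golden image reusable across runs.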
Golden image preparation: desktest init-windows runs a two-stage process. Stage 1 installs Windows from ISO via Autounattend.xml; Stage 2 provisions via SSH (Python, PyAutoGUI, uiautomation, WinFsp, agent scripts).
Guest-side scripts:
- windows/vm-agent.py - Guest agent polling shared directory for requests (runs via Task Scheduler at logon)
- windows/execute-action.py - PyAutoGUI executor for Windows (Win32 SendInput backend)
- windows/get-a11y-tree.py - UIA accessibility tree extraction via uiautomation package
- windows/win-screenshot.py - Screenshot capture via PIL ImageGrab.grab()
Host-side modules:
- src/windows/mod.rs - WindowsVmSession: QEMU lifecycle, Session trait impl via ProtocolClient
- src/windows/deploy.rs - App deployment into Windows VM
- src/windows/readiness.rs - Desktop/app readiness detection for Windows
- src/init_windows.rs - desktest init-windows golden image provisioning
- Folder-size CI failure → spawn subagent .claude/agents/folder-refactor-advisor.md.