UI Regression Testing (Tmux Harness)

This document describes the on-demand terminal UI integration suite used to validate real REPL rendering behavior in tmux.

Purpose

  1. Validate end-to-end UI behavior (startup banner, prompt, spinner, approvals, tool output).
  2. Catch regressions that unit tests cannot see (line redraw, status updates, terminal formatting).
  3. Preserve enough artifacts to make failures debuggable without rerunning immediately.

Current Suite

Integration test entrypoints:

  • tests/ui_tmux_regression.rs
  • tests/traceui_tmux_regression.rs

Harness utilities:

  • tests/ui_tmux/mod.rs

Current scenarios:

  1. Baseline shell approval/rendering flow:
    • Start buddy inside an isolated tmux pane through asciinema.
    • Run one prompt that produces a deterministic run_shell tool call via a fake model server.
    • Approve the command.
    • Verify spinner/liveness lines, approval formatting, command output, and final assistant reply.
    • Exit cleanly and assert expected mock request count.
  2. Managed tmux pane + targeted shell flow:
    • Run scripted tool calls that create a managed pane (tmux_create_pane) and then run run_shell targeted to that pane.
    • Approve both operations.
    • Verify targeted approval and output rendering.
    • Assert expected mock request count and clean shutdown.
  3. Shared-shell guardrail flow:
    • Fake model calls run_shell with set -e.
    • Verify command is blocked before execution with clear error text.
  4. Default-pane recovery flow:
    • Fake model sends tmux_send_keys with exit to kill the shared shell.
    • Follow-up run_shell request should trigger shared-pane recovery.
    • Verify recovery notice and successful command execution.
  5. Missing-target suppression flow:
    • Fake model repeats the same missing tmux_send_keys target.
    • Verify repeated identical failures are suppressed with deterministic guidance.
  6. Traceui split-pane visibility flow:
    • Launch buddy traceui against a synthetic trace file with pathological long left-pane rows.
    • Verify the divider and right-pane detail remain visible.
  7. Traceui detail-scroll flow:
    • Launch buddy traceui against a trace file whose selected event detail exceeds the pane height.
    • Use raw keypresses to scroll and verify lower content appears.
  8. Traceui stream pause/resume flow:
    • Launch buddy traceui --stream, append new events, navigate away from follow mode, and verify paused buffering.
    • Resume with Esc and verify the newest event is selected again.

Runtime Dependencies

The suite requires these commands in PATH:

  1. tmux
  2. asciinema

If either is missing, the tests (which are #[ignore] by default) fail with an actionable prerequisite message.
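
The prerequisite check described above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the harness's actual implementation: the function names (find_in_path, missing_commands, prerequisite_error) and the message wording are hypothetical, and the lookup only tests that a file with the right name exists in a PATH entry, not that it is executable.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Return the first PATH entry that contains a regular file named `name`.
/// (Hypothetical helper; it does not check the executable bit.)
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Collect the required commands that cannot be found in `path_var`.
fn missing_commands(required: &[&str], path_var: &OsStr) -> Vec<String> {
    required
        .iter()
        .filter(|name| find_in_path(**name, path_var).is_none())
        .map(|name| (*name).to_string())
        .collect()
}

/// Build an actionable message, or `None` when every prerequisite is present.
/// The wording is an assumption, not the harness's real message.
fn prerequisite_error(required: &[&str], path_var: &OsStr) -> Option<String> {
    let missing = missing_commands(required, path_var);
    if missing.is_empty() {
        None
    } else {
        Some(format!(
            "missing required command(s) in PATH: {}; install them and rerun",
            missing.join(", ")
        ))
    }
}
```

A test could call prerequisite_error(&["tmux", "asciinema"], &path) with the value of std::env::var_os("PATH") and panic with the returned message when it is Some.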

Artifact Model

Each run writes under:

  • artifacts/ui-regression/<scenario>-<pid>-<timestamp>/

Artifacts include:

  1. session.cast:
    • full asciinema recording.
  2. pipe.log:
    • continuous tmux pipe-pane output stream.
  3. snapshots/*.txt:
    • checkpoint captures from tmux capture-pane (plain + ANSI).
  4. report.json:
    • structured assertion report with matched=true/false and artifact paths.

Artifacts are intentionally preserved for both pass and fail runs.
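
As a rough sketch of the directory layout above, the per-run path can be assembled from the scenario name, the process id, and a timestamp. The helper names are hypothetical, and the timestamp format (Unix seconds) is an assumption; the harness may use a different resolution or format.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Format the `<scenario>-<pid>-<timestamp>` directory name.
fn artifact_dir_name(scenario: &str, pid: u32, timestamp: u64) -> String {
    format!("{}-{}-{}", scenario, pid, timestamp)
}

/// Resolve the run directory under `root` (e.g. artifacts/ui-regression).
/// Assumption: the timestamp is seconds since the Unix epoch.
fn artifact_dir(root: &Path, scenario: &str) -> PathBuf {
    let timestamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_secs())
        .unwrap_or(0);
    root.join(artifact_dir_name(scenario, process::id(), timestamp))
}

/// Create the run directory together with its snapshots/ subdirectory.
fn create_run_layout(run_dir: &Path) -> io::Result<()> {
    std::fs::create_dir_all(run_dir.join("snapshots"))
}
```

Including the pid and timestamp in the name keeps artifacts from separate runs of the same scenario apart, which suits the policy of preserving artifacts for both passing and failing runs.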

Tmux cleanup behavior:

  1. Harness always kills its own detached session on teardown.
  2. Harness also kills the buddy-managed tmux session derived from the scenario session name to prevent session leaks across runs.
  3. Regression scenarios explicitly assert that the derived buddy-managed session does not exist after teardown.
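
The cleanup steps above can be approximated with the standard tmux subcommands kill-session and has-session. This is only a sketch: the function names are hypothetical, and because the rule for deriving the buddy-managed session name is not given here, the sketch takes that name as a parameter instead of guessing it.

```rust
use std::process::Command;

/// Build `tmux kill-session -t <session>`.
fn kill_session_command(session: &str) -> Command {
    let mut cmd = Command::new("tmux");
    cmd.args(["kill-session", "-t", session]);
    cmd
}

/// Build `tmux has-session -t <session>`; tmux exits 0 when the session exists.
fn has_session_command(session: &str) -> Command {
    let mut cmd = Command::new("tmux");
    cmd.args(["has-session", "-t", session]);
    cmd
}

/// True when tmux reports the session as present. A missing tmux binary or a
/// non-zero exit status both count as "not present".
fn session_exists(session: &str) -> bool {
    has_session_command(session)
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Kill both sessions, ignoring "no such session" failures, and report
/// whether the managed session is gone afterwards.
fn teardown(harness_session: &str, managed_session: &str) -> bool {
    for session in [harness_session, managed_session] {
        let _ = kill_session_command(session).output();
    }
    !session_exists(managed_session)
}
```

A scenario would then assert on the return value of teardown, which mirrors the "derived session does not exist after teardown" check.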

Commands

Opt-in direct cargo commands:

cargo test --test ui_tmux_regression -- --ignored --nocapture
cargo test --test traceui_tmux_regression -- --ignored --nocapture

Makefile wrapper:

make test-ui-regression

Determinism Strategy

  1. REPL scenarios start a local scripted fake model HTTP server.
  2. The fake server returns:
    • tool-call response on request #1,
    • final assistant text on request #2.
  3. Responses include short delays to exercise spinner/liveness UI paths.
  4. Traceui scenarios instead write synthetic JSONL trace files directly and drive the viewer with raw tmux keypresses.
  5. Each test uses isolated HOME/work directories under its artifact root.
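
To illustrate the scripted-server idea, here is a minimal standard-library sketch of a local HTTP server that returns one canned body per request, in order, with a short delay before each reply. It is not the harness's actual fake model server: the function names, the JSON bodies a caller passes in, and the one-connection-per-request handling are all assumptions made for the example.

```rust
use std::io::{BufRead, BufReader, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Serve each scripted body exactly once, in order, then stop.
/// Returns the bound port and a handle yielding the number of requests served.
fn spawn_scripted_server(
    bodies: Vec<String>,
    delay: Duration,
) -> (u16, thread::JoinHandle<usize>) {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind loopback");
    let port = listener.local_addr().expect("local addr").port();
    let handle = thread::spawn(move || {
        let mut served = 0;
        for body in bodies {
            let (mut stream, _) = listener.accept().expect("accept");
            drain_request(&mut stream);
            // Short delay so the client has time to show spinner/liveness UI.
            thread::sleep(delay);
            let response = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\
                 Content-Length: {}\r\nConnection: close\r\n\r\n{}",
                body.len(),
                body
            );
            stream.write_all(response.as_bytes()).expect("write response");
            served += 1;
        }
        served
    });
    (port, handle)
}

/// Read the request headers and any Content-Length body, discarding both.
fn drain_request(stream: &mut TcpStream) {
    let mut reader = BufReader::new(stream);
    let mut content_length = 0usize;
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line).unwrap_or(0) == 0 {
            break;
        }
        let trimmed = line.trim_end();
        if trimmed.is_empty() {
            break; // blank line ends the header section
        }
        if let Some((name, value)) = trimmed.split_once(':') {
            if name.eq_ignore_ascii_case("content-length") {
                content_length = value.trim().parse().unwrap_or(0);
            }
        }
    }
    let mut body = vec![0u8; content_length];
    let _ = reader.read_exact(&mut body);
}

/// Tiny client used to exercise the server: POST `body`, return the raw reply.
fn send_request(port: u16, body: &str) -> String {
    let mut stream = TcpStream::connect(("127.0.0.1", port)).expect("connect");
    write!(
        stream,
        "POST / HTTP/1.1\r\nHost: localhost\r\nContent-Length: {}\r\n\
         Connection: close\r\n\r\n{}",
        body.len(),
        body
    )
    .expect("write request");
    let mut reply = String::new();
    stream.read_to_string(&mut reply).expect("read reply");
    reply
}
```

Joining the returned handle yields the number of requests served, which is one way a scenario could back an "expected mock request count" assertion.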

Extension Guidance

When adding scenarios:

  1. Keep each scenario deterministic and minimal.
  2. Add explicit expected substrings for each UI element being validated.
  3. Persist all relevant captures and update report.json schema only additively.
  4. Keep tests #[ignore] unless intentionally moving them into default CI coverage.
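
The "explicit expected substrings" guideline could look something like the sketch below, which checks a captured snapshot against a list of substrings and records a matched flag for each one. The Check type and helper names are hypothetical, and the real report.json schema is not reproduced here.

```rust
/// One expected-substring assertion and its outcome (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Check {
    expected: String,
    matched: bool,
}

/// Evaluate every expected substring against a plain-text snapshot.
fn check_snapshot(snapshot: &str, expected: &[&str]) -> Vec<Check> {
    expected
        .iter()
        .map(|needle| Check {
            expected: (*needle).to_string(),
            matched: snapshot.contains(*needle),
        })
        .collect()
}

/// List the substrings that were not found, for use in a failure message.
fn unmatched(checks: &[Check]) -> Vec<&str> {
    checks
        .iter()
        .filter(|check| !check.matched)
        .map(|check| check.expected.as_str())
        .collect()
}
```

Recording every check, including the ones that matched, means a passing run still documents what was validated, and a failing run names the exact substring that was missing.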