Skip to content

test(app): add W1 renderer contract smoke specs (#598)#1125

Merged
Astro-Han merged 4 commits into
devfrom
claude/fix-598-w1-e2e
Jun 3, 2026
Merged

test(app): add W1 renderer contract smoke specs (#598)#1125
Astro-Han merged 4 commits into
devfrom
claude/fix-598-w1-e2e

Conversation

@Astro-Han
Copy link
Copy Markdown
Owner

@Astro-Han Astro-Han commented Jun 3, 2026

Summary

Strengthen W1 regression coverage (issue #598) so CI gates more of the session-view rewrite before the manual handtest. Adds one new spec file with two @smoke E2E tests in packages/app/e2e/session/session-w1-contracts.spec.ts.

Test 1 — rendered turn locks contracts. Seeds one turn (assistant.tool("bash") + assistant.reason(..., { text })) so groupParts yields a multi-part tool + reasoning trow with a <details> chevron, then asserts:

  • chevron geometry (12px + 12px icon) and collapsed-right (rotate(-90deg)) / open-down (rotate(0deg)) orientation (DESIGN.md L412/L468);
  • user-select: text on user-message-text, agent prose (text-part), and agent reasoning (reasoning-body);
  • trow-result-body resolves the --font-family-mono / --font-size-mono-small / --fg-weak tokens with no sans leakage (asserted against resolved tokens via probe nodes, so it tracks the token contract, not literal values);
  • the raw tool output is a bounded [data-scrollable] container (max-height: 240px, overflow-y: auto) — the DOM precondition the nested-wheel isolation keys off;
  • the thinking indicator is gone once the turn has visible parts.

Test 2 — thinking indicator positive case. A hung reply (assistant.hang(), submitted by hand to avoid project.prompt()'s idle wait) keeps the turn working with nothing visible; asserts [data-slot="session-turn-thinking"] + its shimmer are shown, then drains the LLM queue so teardown stays clean.

Scope decisions vs #598

  • Scroll behaviours A.1/A.2 (reading_history across content/dock resize; weak-trackpad demote) and the full nested-wheel isolation A.3 stay covered by the scroll-controller / scroll-intents unit tests plus the existing weak-wheel E2E in session-renderer-diagnostics.spec.ts. This spec only locks the A.3 DOM precondition, not a full wheel-isolation E2E (high flakiness, marginal added protection over the unit coverage).
  • Scope B.1 (user-bubble hairline) is dropped. The bubble moved to a fill-only treatment (--surface-interactive-base light / --surface-raised dark, 14px radius, no border or inset box-shadow), so the W1 --border-weak/--border-weaker hairline contract is stale — there is nothing to assert against.
  • Scope C (checklist segmentation) and D (tool-failure command guidance) target docs/, which is untracked in this checkout and cannot ship in a PR; left out of code scope.

Verification

  • bun script/e2e-local.ts -- e2e/session/session-w1-contracts.spec.ts --repeat-each=3 → 6 passed (~7s), no flakes.
  • App typecheck (tsgo -b) clean; new spec type-clean under e2e/tsconfig.json build mode.
  • Visual surface: the assertions read computed styles from the real rendered session timeline (chevron, trow body, user/agent bubbles) driven through the LLM fixture — the changed surface is exercised by the E2E itself.

Plan was Codex-consulted (E2E-first, seeding order, gap4 drop, gap6 scope) before implementation.

Refs #598

Summary by CodeRabbit

Tests

  • Added E2E session contract tests for turn rendering and display behavior.
  • Tests validate UI interactions (expand/collapse), design token compliance, text selectability, scrollable containers, and thinking indicator visibility during processing.

Strengthen W1 regression coverage so CI gates more of the session-view
rewrite before the manual handtest. Adds two @smoke specs:

- Rendered turn locks the trow chevron geometry (12px) and collapsed-right
  / open-down orientation, user-message / agent-prose / agent-reasoning text
  selectability, the trow-result-body mono-small + fg-weak typography tokens
  (asserted against resolved tokens, no sans leakage), and the bounded
  [data-scrollable] raw-output container that the nested-wheel isolation keys
  off. Seeds one tool+reasoning turn so groupParts yields a multi-part trow.
- Thinking indicator shows while the turn is working with nothing visible,
  and is gone once any part renders.

Scope notes vs issue #598:
- Scroll behaviours A.1/A.2 and the full nested-wheel isolation A.3 stay
  unit-covered (scroll-controller + scroll-intents tests) plus the existing
  weak-wheel E2E; only the DOM precondition is locked here.
- Scope B.1 user-bubble hairline is dropped: the bubble moved to a fill-only
  treatment (surface-interactive-base / surface-raised, no border/box-shadow),
  so the W1 hairline contract is stale.

Verified locally: bun script/e2e-local.ts -- e2e/session/session-w1-contracts.spec.ts
--repeat-each=3 (6 passed); app typecheck clean.

Refs #598
@Astro-Han Astro-Han added P2 Medium priority app Application behavior and product flows ui Design system and user interface task Narrow execution, audit, spike, migration, tracking, or upstream follow-up work tech-debt Supplemental cleanup, maintainability, architecture, test, or quality debt context labels Jun 3, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

Warning

Review limit reached

@Astro-Han, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 55 minutes and 43 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 057a577e-a265-42cc-a03b-52993476b5d2

📥 Commits

Reviewing files that changed from the base of the PR and between e6f1695 and e83735c.

📒 Files selected for processing (2)
  • packages/app/e2e/session/session-w1-contracts.spec.ts
  • packages/opencode/test/config/e2e-smoke-tagging.test.ts
📝 Walkthrough

Walkthrough

This PR adds a new Playwright E2E test spec (session-w1-contracts.spec.ts) that defines smoke tests for W1 rendered-turn UI contracts. It covers trow chevron rotation, design tokens, text selectability, nested scrollable containers, and thinking indicator visibility during async work states.

Changes

W1 Session Contract Tests

Layer / File(s) Summary
Test selectors and helpers
packages/app/e2e/session/session-w1-contracts.spec.ts (lines 1–34)
Defines CSS/data attribute selectors targeting trow blocks, chevron icon, reasoning body, raw tool output, and user/agent text. Provides utilities to extract chevron rotation from CSS transform and to detect <details> open state.
W1 rendered-turn and thinking indicator contracts
packages/app/e2e/session/session-w1-contracts.spec.ts (lines 40–199)
Two smoke tests: (1) locks trow chevron sizing, rotation behavior across collapsed/expanded states, verifies text selectability, validates mono-small + fg-weak design tokens via computed-style probing, asserts raw tool output scroll bounds and overflow behavior, and confirms thinking indicator is removed after visible parts mount; (2) ensures thinking indicator with shimmer appears while turn is hung with no visible content, using manual submission and polling for async queue drain.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

A rabbit hops through test assertions bright,
Chevrons spin and details open just right,
Design tokens dance in mono-small delight,
Thinking shimmer glows when work's in flight,
🐰 ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'test(app): add W1 renderer contract smoke specs (#598)' clearly summarizes the main change: adding E2E smoke tests for W1 renderer contracts, following conventional commits format.
Description check ✅ Passed The PR description covers most required sections: Summary (detailed change description), Why (issue reference and regression coverage motivation), Related Issue (references #598), verification steps with results, and risk/scope rationale. However, the description lacks explicit Human Review Status and Review Focus sections from the template.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/fix-598-w1-e2e

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Playwright E2E test suite (session-w1-contracts.spec.ts) to verify W1 renderer contracts, such as chevron behavior, text selectability, typography, and scrollable raw tool output. Feedback on the changes suggests using expect.poll when retrieving the count of inner triggers to prevent potential test flakiness, as Playwright's count() does not auto-wait for elements to be rendered.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread packages/app/e2e/session/session-w1-contracts.spec.ts Outdated
@github-actions github-actions Bot removed the ui Design system and user interface label Jun 3, 2026
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested priority: P3 (only low-risk paths changed (packages/app/e2e/session/session-w1-contracts.spec.ts)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

The thinking-indicator spec submits with a raw Enter, bypassing
project.prompt(), so the session the UI creates was never registered.
With an assistant.hang() reply, teardown dropped only the project dir
and could leave a non-idle session/stream alive for the rest of the
Playwright worker, leaking backend state into later smoke tests.

Capture the session id from the URL once the turn is working and
project.trackSession() it so cleanup tears the stream down.

Addresses Codex review [P2] on #1125.
@Astro-Han Astro-Han added the ui Design system and user interface label Jun 3, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/app/e2e/session/session-w1-contracts.spec.ts (1)

186-187: 💤 Low value

Redundant clear before fill. prompt.fill(text) already replaces existing content, so the preceding prompt.fill("") is unnecessary unless it guards a specific composer quirk.

♻️ Optional simplification
   await prompt.click()
-  await prompt.fill("")
   await prompt.fill(text)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/app/e2e/session/session-w1-contracts.spec.ts` around lines 186 -
187, The test contains a redundant clear call: remove the unnecessary
prompt.fill("") before prompt.fill(text) because prompt.fill(text) replaces
existing content; update the code by deleting the prompt.fill("") invocation and
keep the single prompt.fill(text) call (references: prompt.fill("") and
prompt.fill(text) in the spec).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/app/e2e/session/session-w1-contracts.spec.ts`:
- Around line 182-199: The manual Enter-submit flow in the W1 thinking indicator
test creates a session but never registers it for fixture cleanup, so the
session is left behind after teardown. After the submit in the session spec,
once the URL contains a session id, call project.trackSession with
sessionIDFromUrl(page.url()) and project.directory so the new session is added
to the cleanup set; use the existing manual-submit tracking pattern from other
tests and keep the change near the prompt.click()/fill()/Enter sequence.

---

Nitpick comments:
In `@packages/app/e2e/session/session-w1-contracts.spec.ts`:
- Around line 186-187: The test contains a redundant clear call: remove the
unnecessary prompt.fill("") before prompt.fill(text) because prompt.fill(text)
replaces existing content; update the code by deleting the prompt.fill("")
invocation and keep the single prompt.fill(text) call (references:
prompt.fill("") and prompt.fill(text) in the spec).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e909f023-39f5-4b73-a2d6-f46f541555b3

📥 Commits

Reviewing files that changed from the base of the PR and between 2a54f92 and e6f1695.

📒 Files selected for processing (1)
  • packages/app/e2e/session/session-w1-contracts.spec.ts

Comment thread packages/app/e2e/session/session-w1-contracts.spec.ts
@github-actions github-actions Bot removed the ui Design system and user interface label Jun 3, 2026
)

The e2e-smoke-tagging guard pins the exact set of @smoke-tagged app E2E
tests; the two new session-w1-contracts specs must be listed or the
guard fails (seen on unit-windows-opencode-config-project). Add both in
sorted order.

Verified: bun test packages/opencode/test/config/e2e-smoke-tagging.test.ts
(2 pass).
@github-actions github-actions Bot added the harness Model harness, prompts, tool descriptions, and session mechanics label Jun 3, 2026
count() does not auto-wait, so reading it immediately after opening the
outer <details> could see 0/1 before the inner rows mount. Poll until
the count settles above 1 before clicking the rows.

Addresses gemini-code-assist review on #1125.
@Astro-Han Astro-Han merged commit d7928f8 into dev Jun 3, 2026
33 checks passed
@Astro-Han Astro-Han deleted the claude/fix-598-w1-e2e branch June 3, 2026 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app Application behavior and product flows harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority task Narrow execution, audit, spike, migration, tracking, or upstream follow-up work tech-debt Supplemental cleanup, maintainability, architecture, test, or quality debt context

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant