Add a QA Evidence Bundle and detailed, session-specific reporting for Chrome-supervised QA sessions. Each run should preserve enough value-safe context for the maintainer to review results and decide whether to create GitHub issues or start TDD work.
This slice should make failures useful: screenshots, logs, redacted diffs, reproduction notes, grouped findings, and a detailed run report that is never overwritten by later sessions.
Acceptance criteria
Each QA run creates an isolated evidence folder with a unique run id and timestamp.
Each session writes a detailed report named like qa-report--.md instead of overwriting a shared report.
Report includes run id, time, app commit/branch/dirty status, scenario list, per-scenario pass/fail/blocked, evidence paths, grouped findings, likely product bugs, setup/scenario problems, manual checks still needed, and recommended next issues.
The agent can record final QA verdicts and TDD-ready failure notes: repro, expected vs actual, likely cause, relevant logs/evidence, and likely product area.
Automatic start/end/error screenshots are captured for each scenario, with support for agent-added screenshots on surprising UI states.
Evidence includes frontend console errors, backend/action logs, redacted env/file diffs, changed file paths, and run workspace paths when safe.
Evidence is value-safe: no raw secrets, no full source dumps by default, and no sensitive credential values in screenshots or logs.
GitHub issues are not auto-created for every failure; the report groups related failures so the maintainer can decide which ones to file.
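As a rough illustration of the isolation and value-safety criteria above, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not part of the existing tooling: the function names `create_run_dir` and `redact_env_line`, the `screenshots` and `logs` subfolder names, the folder naming scheme, and the list of key fragments treated as sensitive.

```python
import re
import secrets
from datetime import datetime, timezone
from pathlib import Path

# Assumption: a KEY=value line is sensitive when its key contains one of these
# fragments. A real implementation would need a reviewed, broader rule set.
SENSITIVE_KEY = re.compile(
    r"^([A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*)=(.*)$",
    re.IGNORECASE,
)


def create_run_dir(root: Path) -> Path:
    """Create a fresh evidence folder named by UTC timestamp plus a random run id."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_id = secrets.token_hex(4)  # 8 hex characters
    run_dir = root / f"{stamp}-{run_id}"
    # exist_ok=False raises FileExistsError instead of reusing an earlier run's folder.
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "screenshots").mkdir()
    (run_dir / "logs").mkdir()
    return run_dir


def redact_env_line(line: str) -> str:
    """Replace the value of a sensitive KEY=value line; return other lines unchanged.

    Expects a single line without a trailing newline.
    """
    match = SENSITIVE_KEY.match(line)
    if match is None:
        return line
    return f"{match.group(1)}=<redacted>"
```

Creating the folder with `exist_ok=False` is one way to guarantee that a later session cannot silently write into an earlier run's evidence. The redaction helper keeps the key name so a diff still shows which setting changed without exposing its value.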
Blocked by