Merged
42 changes: 34 additions & 8 deletions .github/agents/dev-loop.agent.md
@@ -233,7 +233,10 @@ any spike code deleted or retro-fitted.

**Invoke the `evidence-capture` skill.** This phase is a **hard gate**: it does not
exit until an AI review of a captured runtime artifact confirms the change visibly
matches its issue intent (or until 3 iterations escalate to the human).
matches its issue intent (or until 3 iterations escalate to the human). **The agent
must never silently skip this phase or collapse it away during the continuous
Phase 3-7 flow.** Displaying the result is a required output of every dev-loop run
(see step 6); the only way it is omitted is an explicit user opt-out.

1. **Identify the change type** from the table in
`.github/skills/evidence-capture/SKILL.md` (CLI, library, bug fix, refactor,
@@ -274,19 +277,37 @@ matches its issue intent (or until 3 iterations escalate to the human).
(typically `.evidence/<phase-id>/evidence.md`). **Defer the PR upload to
Phase 7** -- the same script runs again without `-LocalOnly` once the PR
exists. Record the local URL for the Task Complete Summary's
`Evidence (local)` field. Proceed to Phase 6.
`Evidence (local)` field. Proceed to step 6.
- Either fails -> append the diagnosis to `.evidence/<phase-id>/diagnosis.md`,
increment `.evidence/<phase-id>/iteration.txt`. If iteration < 3, apply
the diagnosed fix (this is a fix-in-place within Phase 5b, *not* a route
back to Phase 3) and re-capture. If iteration == 3, write
`.evidence/<phase-id>/ESCALATED`, post the diagnosis as a PR comment, and
pause for the user.
6. **Display the result (mandatory).** The result must be visible to the user
in the agent's response without re-running the command:
- **Inline (CLI / PowerShell / command / markdown / text, and the refactor
"no behavior change" attestation):** render the actual result **inline** --
the command plus its real captured output, with ANSI escape sequences
stripped. `Publish-Evidence.ps1` echoes this for Inline artifacts; reflect
it in the final response.
- **ArtifactReference (UI HTML+video, large/binary captures):** a `file:///`
link (plus the PR link) is sufficient; no inline rendering required.

This step runs on every change, and the agent must **never skip it by default**. The
user may opt out with a natural-language request ("skip evidence display" /
"skip the output"), which maps to `Publish-Evidence.ps1 -SkipDisplay`: the
inline echo is suppressed but the `Evidence (local):` `file:///` link line
still prints, and the loop summary records that the display was skipped by
user request.

**Exit criteria:** `PASSED` marker exists in `.evidence/<phase-id>/`, the
clickable `file:///` URL for the entry-point file was printed in agent output,
Task Complete Summary will include both **Evidence (local)** and (after
Phase 7) **Evidence (PR)** fields. The PR upload itself is deferred to
Phase 7.
**the result was displayed** (inline ANSI-stripped output for CLI/markdown
artifacts, a `file:///` link for UI/binary artifacts) and was not skipped unless
the user explicitly opted out, and the Task Complete Summary will include both
**Evidence (local)** and (after Phase 7) **Evidence (PR)** fields. The PR upload
itself is deferred to Phase 7.
**-> On PASS, proceed to Phase 6. On ESCALATE, pause for user input. If a
structural fix is required that affects other tests, return to Phase 3.**
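
The pass / fix-in-place / escalate flow of this phase can be sketched roughly as follows. This is only an illustration under stated assumptions: `Invoke-EvidenceGateSketch`, `$Review`, and `$Fix` are hypothetical names that do not exist in the repository, and the sketch assumes escalation happens once the persisted counter reaches 3. The marker and file names (`PASSED`, `ESCALATED`, `diagnosis.md`, `iteration.txt`) are the ones named in the text above.

```powershell
# Hypothetical sketch of the Phase 5b gate. The function name and the $Review / $Fix
# script blocks are illustrative assumptions, not part of the repository.
function Invoke-EvidenceGateSketch {
    param(
        [Parameter(Mandatory)][string]$EvidenceDir,
        [Parameter(Mandatory)][scriptblock]$Review,   # returns nothing on pass, or a diagnosis string
        [scriptblock]$Fix = { param($Diagnosis) },    # applies the diagnosed fix in place
        [int]$MaxIterations = 3                       # assumption: escalate when the counter reaches 3
    )

    New-Item -ItemType Directory -Path $EvidenceDir -Force | Out-Null
    $iteration = 0

    while ($true) {
        $diagnosis = & $Review
        if ([string]::IsNullOrEmpty($diagnosis)) {
            New-Item -ItemType File -Path (Join-Path $EvidenceDir 'PASSED') -Force | Out-Null
            return 'PASSED'
        }

        # Record the failure and bump the persisted iteration counter.
        Add-Content -LiteralPath (Join-Path $EvidenceDir 'diagnosis.md') -Value $diagnosis
        $iteration++
        Set-Content -LiteralPath (Join-Path $EvidenceDir 'iteration.txt') -Value $iteration

        if ($iteration -ge $MaxIterations) {
            New-Item -ItemType File -Path (Join-Path $EvidenceDir 'ESCALATED') -Force | Out-Null
            return 'ESCALATED'
        }

        & $Fix $diagnosis   # fix in place, then loop to re-capture and re-review
    }
}

# Usage: the review fails once, then passes on the second capture.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("evidence-sketch-" + [guid]::NewGuid())
$script:attempts = 0
$result = Invoke-EvidenceGateSketch -EvidenceDir $dir -Review {
    $script:attempts++
    if ($script:attempts -lt 2) { 'output did not match issue intent' }
}
Write-Output $result
```

Keeping the counter and markers on disk, as the phase text describes, means a resumed run could in principle pick up where the previous one stopped; the sketch does not implement that resume step.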

@@ -489,10 +510,15 @@ Once Phase 7 passes with zero unresolved threads and a successful dry run:

1. Run the full test suite one final time. Present the evidence.
2. Present the dry run results.
3. Summarize: branch name, what was implemented, what was refactored,
3. **Display the result** in the final summary on every run: for CLI/markdown
changes render the actual output inline (ANSI-stripped); for UI/binary
changes include the `file:///` link (plus PR link). Omit the inline display
only when the user explicitly opted out (`-SkipDisplay`), in which case note
that the display was skipped by user request.
4. Summarize: branch name, what was implemented, what was refactored,
functional tests added, loop iterations, dry run result, PR number,
linked issue number, Copilot review status.
4. **Execute Phase 8 (Merge)** -- rebase-merge the PR with
5. **Execute Phase 8 (Merge)** -- rebase-merge the PR with
``gh pr merge <pr-number> --rebase --delete-branch``. Never merge while CI
is red or with unresolved review threads.
5. Execute Phase 9 (Cleanup) commands with actual values (no placeholders).
6. Execute Phase 9 (Cleanup) commands with actual values (no placeholders).
10 changes: 9 additions & 1 deletion .github/copilot-instructions.md
@@ -341,14 +341,17 @@ and branch names -- never plain-text references like `#131`.

Every `task_complete` summary must include the following fields whenever the
underlying data exists. Omit a field only when it does not apply to the work
just performed (e.g., a Q&A turn with no PR).
just performed (e.g., a Q&A turn with no PR). **Exception:** on every dev-loop
run the **Result display** is mandatory and must not be omitted -- only the PR
link may be dropped when no PR exists.

| Field | Required format |
|---|---|
| **PR** | Full link: `[#NNN](https://github.com/<owner>/<repo>/pull/NNN)` |
| **Issue** | Full link: `[#NNN](https://github.com/<owner>/<repo>/issues/NNN)` |
| **Branch** | Linked code span: `` [`<branch-name>`](https://github.com/<owner>/<repo>/tree/<branch-name>) `` |
| **Command to test** | Exact shell command(s) the user can run locally to verify, fenced as a code block |
| **Result display** | The actual result, so the user sees the change worked without re-running it. **Required on every dev-loop run.** For CLI/markdown changes render the real captured output **inline** (ANSI-stripped, fenced); for UI/binary changes a `file:///` link is sufficient. Omit the inline output only when the user opted out (`-SkipDisplay`), and then note it was skipped by user request. |
| **Evidence (local)** | Clickable `file:///` URL to the entry-point file at `.evidence/<phase-id>/evidence.md` (printed by `Publish-Evidence.ps1`). Required when Phase 5b ran. |
| **Evidence (PR)** | Link to the PR comment containing the captured runtime artifact, or to the CI-artifact URL for files larger than 25 MB. Required when Phase 5b ran and the PR exists. |
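
As a rough illustration of the link formats required by the table, the sketch below builds the **Issue**, **PR**, and **Branch** lines from their parts. `Format-SummaryLinksSketch` and its parameters are hypothetical names introduced only for this example; nothing in the repository is known to generate the summary this way.

```powershell
# Hypothetical formatter: the function name and parameters are assumptions for illustration.
function Format-SummaryLinksSketch {
    param(
        [Parameter(Mandatory)][string]$Owner,
        [Parameter(Mandatory)][string]$Repo,
        [Parameter(Mandatory)][string]$Branch,
        [int]$Issue,         # 0 (the default) means "no issue", so the field is omitted
        [int]$PullRequest    # 0 (the default) means "no PR", so the field is omitted
    )

    $base  = "https://github.com/$Owner/$Repo"
    $lines = @()
    if ($Issue)       { $lines += "- **Issue**: [#$Issue]($base/issues/$Issue)" }
    if ($PullRequest) { $lines += "- **PR**: [#$PullRequest]($base/pull/$PullRequest)" }
    # A doubled backtick inside a double-quoted string yields one literal backtick,
    # which produces the linked code span required for the branch name.
    $lines += "- **Branch**: [``$Branch``]($base/tree/$Branch)"
    return $lines
}

Format-SummaryLinksSketch -Owner owner -Repo repo -Branch 'feat/42-user-auth' -Issue 42 -PullRequest 57
```

Fields whose data does not exist are simply left out, which matches the "omit a field only when it does not apply" rule stated above.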

@@ -371,6 +374,11 @@ Example:
- **PR**: [#57](https://github.com/owner/repo/pull/57) (merged)
- **Branch**: [`feat/42-user-auth`](https://github.com/owner/repo/tree/feat/42-user-auth)
- **Test**: `dotnet test --no-build`
- **Result**:
```text
> app auth --user alice
Authenticated alice (token expires in 3600s)
```
- **Evidence (local)**: file:///D:/Git/repo/.evidence/phase-5b-20260101T000000Z/evidence.md
- **Evidence (PR)**: https://github.com/owner/repo/pull/57#issuecomment-1234567
```
5 changes: 5 additions & 0 deletions .github/skills/dev-loop-phase-gate/SKILL.md
@@ -53,6 +53,11 @@ npx tsc && npx vitest run
- [ ] The entry-point file is `evidence.md` per the Local-link contract
- [ ] A clickable `file:///` URL to the entry-point file was printed in the
agent output (emitted by `Publish-Evidence.ps1`)
- [ ] The result was displayed in agent output -- **inline** (ANSI-stripped)
for CLI/markdown artifacts, a `file:///` link for UI/binary artifacts --
and the step was not skipped (unless the user explicitly requested
`-SkipDisplay`, in which case the link line still printed and the skip was
recorded)
- [ ] The plan's Evidence Plan section matches the produced artifact
(change type, artifact format, capture command, entry-point file)
- [ ] `.evidence/<phase-id>/iteration.txt` records the loop iteration count
29 changes: 29 additions & 0 deletions .github/skills/evidence-capture/SKILL.md
@@ -112,6 +112,30 @@ upload is for durability and team visibility, not the primary review path.
entirely and only prints the local link; the final non-`-LocalOnly` run during
Phase 7 posts the artifact to the PR.

## Result display contract

Displaying the result is **mandatory on every change** so the user can confirm
the change worked **without re-running the command themselves**. The form of the
display depends on the artifact classification:

| Classification | Artifact kinds | Display |
|---|---|---|
| **Inline** | `.md` / CLI / PowerShell / command / text (and the refactor "no behavior change" attestation) | The actual result is rendered **inline** in the agent's final response: the command plus its real captured output (ANSI escape sequences stripped so the fenced block is clean plain text). |
| **ArtifactReference** | UI HTML+video, large/binary captures | A `file:///` link (plus the PR link) is sufficient; **no inline rendering required**. |

`Publish-Evidence.ps1` performs the inline echo for Inline artifacts in both
`-LocalOnly` and normal modes, stripping ANSI in its echo path. ArtifactReference
artifacts stay link-only. The `file:///` link and PR link remain durable
secondary pointers in all cases.
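
The ANSI stripping mentioned above can be illustrated with a short, self-contained sketch. `Remove-AnsiSketch` is a hypothetical name; its pattern mirrors the one in this change's `Remove-AnsiEscape` helper and covers only CSI sequences (ESC `[`, parameters, final byte), so other escape families such as OSC would pass through unchanged.

```powershell
# Hypothetical helper for illustration. It mirrors the CSI-only pattern used by
# Remove-AnsiEscape in this change; non-CSI escape sequences are not handled.
function Remove-AnsiSketch {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Text)
    # ESC [ , optional parameter bytes, optional intermediate bytes, one final byte.
    return [regex]::Replace($Text, '\x1B\[[0-9;?]*[ -/]*[@-~]', '')
}

$esc     = [char]27
$colored = "${esc}[31mERROR${esc}[0m red text"   # red "ERROR", then a color reset
$plain   = Remove-AnsiSketch -Text $colored
Write-Output $plain   # prints: ERROR red text
```

The pattern is single-quoted here so that the regex engine, rather than PowerShell string parsing, interprets `\x1B`; the behavior matches the double-quoted form because the pattern contains no `$` or backtick.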

**Opt-out.** The display defaults ON and must never be silently skipped by the
agent. The user may opt out with a natural-language request to the dev loop
("skip evidence display" / "skip the output"), which maps to the
`-SkipDisplay` switch on `Publish-Evidence.ps1`. When skipped, the inline echo
is suppressed but the `Evidence (local):` `file:///` link line still prints, and
the dev-loop summary records that the display was skipped by user request.
Skipping is a deliberate user choice, never an agent default.
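
The contract and its opt-out reduce to a small decision table, sketched below. `Get-DisplayPlanSketch` and the shape of the object it returns are assumptions made for illustration; they are not part of `Publish-Evidence.ps1`, which makes the equivalent decision internally.

```powershell
# Hypothetical decision helper: the name and the returned properties are assumptions
# for illustration and do not exist in Publish-Evidence.ps1.
function Get-DisplayPlanSketch {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Inline', 'ArtifactReference')]
        [string]$Classification,

        [switch]$SkipDisplay
    )

    # Only Inline artifacts are ever echoed, and only when the user has not opted out.
    $echoInline = ($Classification -eq 'Inline') -and (-not $SkipDisplay.IsPresent)

    [pscustomobject]@{
        PrintLocalLink = $true                    # the file:/// link line always prints
        EchoInline     = $echoInline
        SkippedByUser  = $SkipDisplay.IsPresent   # recorded in the dev-loop summary
    }
}

Get-DisplayPlanSketch -Classification Inline
Get-DisplayPlanSketch -Classification Inline -SkipDisplay
Get-DisplayPlanSketch -Classification ArtifactReference
```

The link line is unconditional in the sketch, which reflects the rule that `-SkipDisplay` suppresses only the inline echo.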

## Storage and lifecycle

Evidence is **ephemeral by default**. Committed evidence rots as the app evolves
@@ -242,6 +266,11 @@ After invoking this skill, the following must all be true:
so the reviewer has a single anchor.
- [ ] A clickable `file:///` URL to the entry-point file was printed in the
agent's output (emitted by `Publish-Evidence.ps1`).
- [ ] The result was displayed: for an Inline (markdown/CLI/text) artifact the
actual result is rendered inline (ANSI-stripped) in the agent's response;
for an ArtifactReference (UI/binary) artifact a `file:///` link is
sufficient. The display was not skipped unless the user explicitly
requested `-SkipDisplay`.
- [ ] If `PASSED`, the artifact was uploaded to the PR (or earmarked for upload
when the PR is opened).
- [ ] The artifact was produced from a fresh run against the current HEAD SHA.
39 changes: 38 additions & 1 deletion .github/skills/evidence-capture/helpers/Publish-Evidence.ps1
@@ -46,6 +46,16 @@
PR isn't spammed with intermediate captures; the final artifact is posted
during Phase 7 with this switch omitted.

.PARAMETER SkipDisplay
Suppress the inline result display. By default, for an Inline (markdown /
text) artifact the script echoes the artifact content (ANSI-stripped) to
output so the result is visible without opening the file. With this switch
the inline echo is suppressed, but the `Evidence (local):` `file:///` link
line is still printed. This is a deliberate user opt-out (mapped from the
dev-loop natural-language opt-out "skip evidence display"), never an agent
default. ArtifactReference (binary / UI) artifacts are never echoed inline
regardless of this switch.

.OUTPUTS
[pscustomobject] with:
Mode -- 'Inline' | 'ArtifactReference' | 'LocalOnly'
@@ -72,7 +82,9 @@ param(

[scriptblock]$GhInvoker = { param([string[]]$GhArgs) & gh @GhArgs },

[switch]$LocalOnly
[switch]$LocalOnly,

[switch]$SkipDisplay
)

Set-StrictMode -Version Latest
@@ -93,6 +105,17 @@ function ConvertTo-FileUri {
return "file:///$forward"
}

function Remove-AnsiEscape {
<#
.SYNOPSIS
Strip ANSI/VT CSI escape sequences (color, cursor movement, etc.) from a
string so the content renders as clean plain text inside a markdown fenced
code block (which does not interpret ANSI color). Non-CSI sequences such as
OSC are left untouched.
#>
param([Parameter(Mandatory)][AllowEmptyString()][string]$Text)
return [regex]::Replace($Text, "\x1B\[[0-9;?]*[ -/]*[@-~]", '')
}

function Get-ArtifactMode {
param(
[string]$Path,
@@ -164,6 +187,20 @@ $comment = Get-CommentBody -Mode $classifiedMode -ResolvedPath $resolvedPath `
$fileUri = ConvertTo-FileUri -Path $resolvedPath
Write-Host "Evidence (local): $fileUri"

# Inline result display. For an Inline (markdown / text) artifact, echo the
# artifact content (ANSI-stripped) so the result is visible in the agent's
# output without opening the file. This runs in both -LocalOnly and normal
# modes. ArtifactReference (binary / UI) artifacts stay link-only. The
# -SkipDisplay switch suppresses this echo while keeping the link line above.
if ($classifiedMode -eq 'Inline' -and -not $SkipDisplay) {
$rawContent = Get-Content -LiteralPath $resolvedPath -Raw
if ($null -eq $rawContent) { $rawContent = '' }
$displayContent = Remove-AnsiEscape -Text $rawContent
Write-Host '----- Evidence (inline result) -----'
Write-Host $displayContent
Write-Host '----- end evidence -----'
}

if ($LocalOnly) {
return [pscustomobject]@{
Mode = 'LocalOnly'
73 changes: 73 additions & 0 deletions .github/skills/evidence-capture/tests/Publish-Evidence.Tests.ps1
@@ -219,4 +219,77 @@ Describe 'Publish-Evidence' {
($output | Out-String) | Should -Match 'file:///'
}
}

Context 'inline result display' {
It 'echoes the inline markdown content to output, not just the URL' {
$artifact = Join-Path $script:TempDir 'echo-content.md'
$content = "# Evidence`r`n`r`nThe widget now returns 42."
Set-Content -LiteralPath $artifact -Value $content -NoNewline

$stub = { param([string[]]$GhArgs) }

$output = & $script:ScriptPath -ArtifactPath $artifact -PullRequest 5 `
-GhInvoker $stub -LocalOnly -InformationAction Continue 6>&1

($output | Out-String) | Should -Match 'The widget now returns 42\.'
}

It 'echoes the inline content when posting to a PR (non-LocalOnly)' {
$artifact = Join-Path $script:TempDir 'echo-content-pr.md'
$content = "# Evidence`r`n`r`nResponse body equals OK-MARKER."
Set-Content -LiteralPath $artifact -Value $content -NoNewline

$stub = { param([string[]]$GhArgs) }

$output = & $script:ScriptPath -ArtifactPath $artifact -PullRequest 9 `
-GhInvoker $stub -InformationAction Continue 6>&1

($output | Out-String) | Should -Match 'OK-MARKER'
}

It 'strips ANSI escape sequences from the echoed inline content' {
$artifact = Join-Path $script:TempDir 'ansi.md'
$esc = [char]27
$content = "# Out`r`n`r`n${esc}[31mERROR${esc}[0m red text"
Set-Content -LiteralPath $artifact -Value $content -NoNewline

$stub = { param([string[]]$GhArgs) }

$output = & $script:ScriptPath -ArtifactPath $artifact -PullRequest 5 `
-GhInvoker $stub -LocalOnly -InformationAction Continue 6>&1

$joined = ($output | Out-String)
$joined | Should -Match 'ERROR red text'
$joined | Should -Not -Match ([regex]::Escape("$esc["))
}

It 'does not echo raw content for an ArtifactReference artifact' {
$artifact = Join-Path $script:TempDir 'ui-noecho.html'
Set-Content -LiteralPath $artifact -Value '<html>UNIQUEMARKERXYZ</html>' -NoNewline

$stub = { param([string[]]$GhArgs) }

$output = & $script:ScriptPath -ArtifactPath $artifact -PullRequest 5 `
-GhInvoker $stub -LocalOnly -InformationAction Continue 6>&1

$joined = ($output | Out-String)
$joined | Should -Not -Match 'UNIQUEMARKERXYZ'
$joined | Should -Match 'file:///'
}

It 'suppresses the inline echo under -SkipDisplay but still prints the file:/// link' {
$artifact = Join-Path $script:TempDir 'skip.md'
$content = "# Skip`r`n`r`nSECRETMARKER123 should not appear."
Set-Content -LiteralPath $artifact -Value $content -NoNewline

$stub = { param([string[]]$GhArgs) }

$output = & $script:ScriptPath -ArtifactPath $artifact -PullRequest 5 `
-GhInvoker $stub -LocalOnly -SkipDisplay -InformationAction Continue 6>&1

$joined = ($output | Out-String)
$joined | Should -Not -Match 'SECRETMARKER123'
$joined | Should -Match 'file:///'
}
}
}
9 changes: 8 additions & 1 deletion CLAUDE.md
@@ -167,13 +167,20 @@ See `.github/copilot-instructions.md` -> **Skills & Agents** for the complete re
## Task Complete Summaries

When calling `task_complete`, include the following fields whenever the data
exists (omit any that don't apply, e.g., a Q&A turn with no PR):
exists (omit any that don't apply, e.g., a Q&A turn with no PR). The **Result
display** is the exception: it is mandatory on every dev-loop run and must not be
omitted -- only the PR link may be dropped when no PR exists.

- **Issue** -- `[#NNN](https://github.com/<owner>/<repo>/issues/NNN)`
- **PR** -- `[#NNN](https://github.com/<owner>/<repo>/pull/NNN)`
- **Branch** -- `` [`<branch>`](https://github.com/<owner>/<repo>/tree/<branch>) ``
- **Test** -- exact local verification command (e.g., `dotnet test --no-build`,
`Invoke-Pester -Path .\...`)
- **Result display** -- the actual result so the user sees the change worked
without re-running it. **Required on every dev-loop run.** Inline (ANSI-stripped,
fenced) captured output for CLI/markdown changes; a `file:///` link for
UI/binary changes. Omit the inline output only when the user opted out
(`-SkipDisplay`), and note it was skipped by user request.
- **Evidence (local)** -- clickable `file:///` URL to
`.evidence/<phase-id>/evidence.md` (the entry-point file). Required when
Phase 5b ran.