feat: built-in PR context precomputation for PR-triggered agents
Problem
When writing a PR-triggered agent (on.pr), the agent itself almost always wants to know what changed: the changed file list, a unified diff, the base/head SHAs, and per-file pre/post snapshots. Today every author of a PR-reviewer agent has to reinvent the same wheel inside the agent body — or, more robustly, in a custom steps: block — because:
- PR builds are shallow by default. checkout: self does a depth-1 fetch, so git merge-base origin/<target> HEAD fails out of the box.
- origin/<target> isn't fetched. The agent has to run git fetch +refs/heads/<target>:refs/remotes/origin/<target> itself, with progressive deepening until the merge-base resolves (or --unshallow as a fallback).
- checkout: self may not persist OAuth credentials in some org configurations, so the fetch needs an explicit http.extraheader=Authorization: bearer ${SYSTEM_ACCESSTOKEN} to be portable.
- The synthetic merge commit (refs/pull/<id>/merge) requires careful handling: naively diffing against HEAD^2 only works if the checkout actually is a merge commit, which is fragile.
- Asking the LLM to run all of this ends up burning a lot of agent turns/tokens on git plumbing, and the failure mode (the agent silently produces an empty or wrong diff and posts a useless review) is hard to detect.
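For illustration, here is a minimal, self-contained sketch of the progressive-deepening and merge-base resolution described in the bullets above. It is not the step we shipped: the throwaway fixture repository, the branch names main and feature, the demo depths 2/4/8, and every variable name are assumptions chosen so the snippet can run locally against a file:// remote.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# --- Demo fixture (assumption): an "origin" whose main and feature diverge ---
work="$(mktemp -d)"
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5 6; do git commit -q --allow-empty -m "main $i"; done
expected_base="$(git rev-parse HEAD)"   # the fork point we hope to recover
git checkout -q -b feature
for i in 1 2 3; do git commit -q --allow-empty -m "feature $i"; done
git checkout -q main
for i in 7 8 9; do git commit -q --allow-empty -m "main $i"; done

# A depth-1, single-branch clone stands in for the shallow PR checkout:
# origin/main does not exist yet and HEAD has no visible history.
git clone -q --depth=1 --branch feature "file://$work/origin" "$work/clone"
cd "$work/clone"

target=main            # hypothetical; a real step would use the PR target branch
source_branch=feature  # hypothetical; a real PR build checks out the merge ref
target_ref="refs/remotes/origin/$target"

base_sha=""
head_ref="HEAD"
for depth in 2 4 8; do   # demo-sized; the issue mentions 200/500/2000
  git fetch -q --depth="$depth" origin \
    "+refs/heads/$target:$target_ref" \
    "+refs/heads/$source_branch:refs/remotes/origin/$source_branch"
  # Only trust HEAD^2 when the checkout really is a merge commit.
  if git rev-parse -q --verify 'HEAD^2^{commit}' >/dev/null; then
    head_ref='HEAD^2'
  else
    head_ref='HEAD'
  fi
  if base_sha="$(git merge-base "$target_ref" "$head_ref" 2>/dev/null)"; then
    break
  fi
  base_sha=""
done

# Last resort: fetch full history, but only if the repo is still shallow.
if [ -z "$base_sha" ] && [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
  git fetch -q --unshallow origin "+refs/heads/$target:$target_ref"
  base_sha="$(git merge-base "$target_ref" "$head_ref" || true)"
fi

echo "base_sha=${base_sha:-<unresolved>} head_ref=$head_ref"
```

In this fixture the depth-2 fetch finds no common ancestor and the depth-4 fetch does, so the loop stops before the --unshallow fallback. An empty base_sha after the fallback is the case a real step would report as DIFF_RESOLUTION_FAILED.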
Concrete evidence this is a real footgun
We just shipped an agents/pr-reviewer.md in our own pipelines repo and went through this exact iteration cycle. A rubber-duck pass flagged all of the above as blockers; the final precompute step is ~120 lines of bash, lives in steps: (so the agent receives a ready-made pr-context/ directory), and is, frankly, generic enough that it could be ado-aw's job rather than every adopter's.
Sketch of the user-side step we wrote (full source: https://github.com/msazuresphere/4x4/azure-devops-agentic-pipelines/blob/main/agents/pr-reviewer.md):
    steps:
      - bash: |
          set -euo pipefail
          rm -rf pr-context && mkdir -p pr-context
          if [ -n "${SYSTEM_ACCESSTOKEN:-}" ]; then
            git_fetch() { git -c "http.extraheader=Authorization: bearer ${SYSTEM_ACCESSTOKEN}" fetch "$@"; }
          else
            git_fetch() { git fetch "$@"; }
          fi
          # ...progressive --depth=200/500/2000 + --deepen, then --unshallow fallback...
          # ...merge-base resolution, diff/snapshot generation, --find-renames, scope filtering,
          #    truncation cap, OK / NO_PR_CONTEXT / DIFF_RESOLUTION_FAILED status file...
        displayName: "Precompute PR diff context"
        env:
          SYSTEM_ACCESSTOKEN: $(System.AccessToken)
Proposed solution
Add a first-class pr-context: feature (or fold it into the existing on.pr handling) that ado-aw injects into the Agent job whenever the pipeline is PR-triggered. The compiler would emit the bearer-token-authenticated fetch, the merge-base resolution, the diff generation, and write a fixed-layout pr-context/ directory into the workspace before the agent starts.
Opt-in (or on-by-default when on.pr is set):
    on:
      pr:
        branches: { include: [main] }
    pr-context:
      enabled: true            # default true when on.pr is set
      scope:                   # pathspec scope for diff + snapshots
        - agents/**
        - scripts/**
        - ":(top,glob)*.yml"
      unified: 5               # -U value (default 3)
      max-diff-bytes: 524288   # truncate diff.patch beyond this size
      snapshots: true          # write pr-context/head-files + pr-context/base-files
Fixed-layout output the agent can rely on (this is exactly the shape we landed on after iteration):
    pr-context/
      status.txt                  # OK | NO_PR_CONTEXT | DIFF_RESOLUTION_FAILED
      metadata.txt                # pr_id, source_branch, target_branch, base_sha, head_sha, build_id, build_reason, repository
      changed-files.txt           # full git diff --name-status
      changed-files-in-scope.txt  # name-status restricted to scope
      diff.patch                  # unified diff, scoped, capped, with truncation marker
      head-files/<path>           # post-PR snapshot of A/M/T/R*/C* files in scope
      base-files/<path>           # pre-PR snapshot of D files in scope
      error.txt                   # only when status != OK
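As a rough sketch of how a step might populate that layout once base and head SHAs are known (this is an illustration, not the ~120-line script mentioned earlier), the snippet below builds a tiny fixture repository and writes the file lists, the capped diff, and the snapshots. The fixture files, the variable names, and the wording of the truncation marker are assumptions. Only the OK path is shown, and metadata.txt and error.txt are omitted.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# --- Demo fixture (assumption): one base commit, one head commit ---
repo="$(mktemp -d)"
cd "$repo"
git init -q .
mkdir -p agents docs ci
printf 'one\n'  > agents/a.md
printf 'gone\n' > agents/old.md
printf 'doc\n'  > docs/readme.md
printf 'v: 1\n' > pipeline.yml
printf 'v: 1\n' > ci/nested.yml
git add . && git commit -q -m base
base_sha="$(git rev-parse HEAD)"
printf 'one\ntwo\n' > agents/a.md
git rm -q agents/old.md
printf 'doc v2\n' > docs/readme.md
printf 'v: 2\n' > pipeline.yml
printf 'v: 2\n' > ci/nested.yml
git add . && git commit -q -m head
head_sha="$(git rev-parse HEAD)"

ctx=pr-context
scope=('agents/**' 'scripts/**' ':(top,glob)*.yml')   # same pathspecs as the example config
max_bytes=524288
rm -rf "$ctx" && mkdir -p "$ctx/head-files" "$ctx/base-files"

# Full and scoped name-status lists.
git diff --name-status --find-renames "$base_sha" "$head_sha" > "$ctx/changed-files.txt"
git diff --name-status --find-renames "$base_sha" "$head_sha" -- "${scope[@]}" \
  > "$ctx/changed-files-in-scope.txt"

# Scoped unified diff, capped at max_bytes with a visible marker.
git diff -U5 --find-renames "$base_sha" "$head_sha" -- "${scope[@]}" > "$ctx/diff.full"
if [ "$(wc -c < "$ctx/diff.full")" -gt "$max_bytes" ]; then
  head -c "$max_bytes" "$ctx/diff.full" > "$ctx/diff.patch"
  printf '\n[diff truncated at %s bytes]\n' "$max_bytes" >> "$ctx/diff.patch"
else
  cp "$ctx/diff.full" "$ctx/diff.patch"
fi
rm -f "$ctx/diff.full"

# Snapshots: deleted files come from base, everything else from head.
# (Paths containing tabs or quotes would need `-z` handling; skipped here.)
while IFS=$'\t' read -r status path newpath; do
  case "$status" in
    D)     src="$base_sha"; dest="$ctx/base-files/$path" ;;
    R*|C*) path="$newpath"; src="$head_sha"; dest="$ctx/head-files/$path" ;;
    *)     src="$head_sha"; dest="$ctx/head-files/$path" ;;
  esac
  mkdir -p "$(dirname "$dest")"
  git show "$src:$path" > "$dest"
done < "$ctx/changed-files-in-scope.txt"

echo OK > "$ctx/status.txt"
```

Note how the two pathspec styles behave differently here: agents/** matches at any depth under agents/, while :(top,glob)*.yml only matches YAML files at the repository root, so ci/nested.yml shows up in changed-files.txt but not in the scoped list.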
Optionally surface the same data as agent-visible environment variables:
- ADO_AW_PR_BASE_SHA, ADO_AW_PR_HEAD_SHA, ADO_AW_PR_ID, ADO_AW_PR_TARGET_BRANCH, ADO_AW_PR_SOURCE_BRANCH
- ADO_AW_PR_CONTEXT_DIR (defaults to $(Build.SourcesDirectory)/pr-context)
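One possible way to publish those variables, sketched under assumptions: the mapping below from Azure Pipelines' predefined PR variables to the proposed ADO_AW_* names, the stripping of the refs/heads/ prefix, and the demo default values are all hypothetical, and the base/head SHAs are left out because they would come from the merge-base resolution rather than from predefined variables. The ##vso[task.setvariable] logging command is the standard Azure Pipelines mechanism for exposing a value to later steps.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo defaults stand in for the predefined variables a PR build provides.
: "${SYSTEM_PULLREQUEST_PULLREQUESTID:=1234}"
: "${SYSTEM_PULLREQUEST_SOURCEBRANCH:=refs/heads/feature/x}"
: "${SYSTEM_PULLREQUEST_TARGETBRANCH:=refs/heads/main}"
: "${BUILD_SOURCESDIRECTORY:=/tmp/s}"

# Emit one logging command per variable so later steps can read it.
emit_var() { printf '##vso[task.setvariable variable=%s]%s\n' "$1" "$2"; }

pr_env="$(
  emit_var ADO_AW_PR_ID            "$SYSTEM_PULLREQUEST_PULLREQUESTID"
  emit_var ADO_AW_PR_SOURCE_BRANCH "${SYSTEM_PULLREQUEST_SOURCEBRANCH#refs/heads/}"
  emit_var ADO_AW_PR_TARGET_BRANCH "${SYSTEM_PULLREQUEST_TARGETBRANCH#refs/heads/}"
  emit_var ADO_AW_PR_CONTEXT_DIR   "$BUILD_SOURCESDIRECTORY/pr-context"
)"
echo "$pr_env"
```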
And document it in the create/update prompts so PR-reviewer agents become roughly:
    ### Step 1: Read the precomputed PR context

    A pipeline step has already resolved the PR diff and staged it for you under
    `pr-context/`. Do not run git fetch/diff yourself; read these files instead.

    | File                         | Contents                                    |
    |------------------------------|---------------------------------------------|
    | pr-context/status.txt        | OK / NO_PR_CONTEXT / DIFF_RESOLUTION_FAILED |
    | pr-context/metadata.txt      | base/head SHAs, branches, ...               |
    | pr-context/changed-files.txt | name-status                                 |
    | pr-context/diff.patch        | unified diff (capped)                       |
    | pr-context/head-files/<path> | post-PR snapshots                           |
    | pr-context/base-files/<path> | pre-PR snapshots (deletes)                  |
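A guard on status.txt, run by the agent or by a step just before it, could look roughly like the sketch below. The helper name check_pr_context, its exit codes, and the messages are hypothetical. The only things taken from this proposal are the three status values and the error.txt file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints a one-line verdict and returns 0 only when
# the precomputed context is usable.
check_pr_context() {
  local dir="$1" status
  if [ ! -f "$dir/status.txt" ]; then
    echo "pr-context missing: no status.txt in $dir"
    return 2
  fi
  status="$(tr -d '[:space:]' < "$dir/status.txt")"
  case "$status" in
    OK)
      echo "PR context ready"
      return 0 ;;
    NO_PR_CONTEXT)
      echo "Not a PR build; nothing to review"
      return 1 ;;
    DIFF_RESOLUTION_FAILED)
      echo "Diff resolution failed: $(cat "$dir/error.txt" 2>/dev/null || echo 'no error.txt')"
      return 1 ;;
    *)
      echo "Unknown status '$status'"
      return 2 ;;
  esac
}

# Demo: simulate a failed precompute step.
demo="$(mktemp -d)/pr-context"
mkdir -p "$demo"
echo "DIFF_RESOLUTION_FAILED" > "$demo/status.txt"
echo "merge-base not found after unshallow" > "$demo/error.txt"

verdict="$(check_pr_context "$demo")" && rc=0 || rc=$?
echo "$verdict (exit $rc)"
```

The point of the guard is that a failed diff resolution becomes an explicit, reportable condition instead of an empty review.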
Why this belongs in ado-aw (not userland)
- Correctness is hard to get right: the four blockers above (shallow checkout, missing target ref, persisted-creds variability, synthetic merge commit) trip up nearly every first attempt.
- Every PR-reviewer agent needs this. It's not a niche use case; it's the common one.
- It eliminates a class of silent-failure bugs. Today, the agent computing its own diff can succeed-but-with-empty-output if any of the git plumbing misfires, and you only notice because reviews stop being useful. A precomputed status file with a clear DIFF_RESOLUTION_FAILED value lets the agent surface the problem instead of guessing.
- It composes with safe-outputs nicely. Combined with add-pr-comment / submit-pr-review, this would mean "PR reviewer agent" is essentially a 20-line agent file with a focused review prompt.
- It tightens the trust boundary. The agent body is the untrusted part (it consumes PR content); the diff plumbing is trusted infrastructure. Keeping the latter out of the agent's hands is a small but real defense-in-depth win: the agent can be run with tools.edit: false and a tight bash allow-list and still get full PR context.
Alternatives considered
- Documenting a copy-paste snippet in the create prompt: works, but every adopter still maintains it. Doesn't help when the diff-resolution logic itself needs to evolve (e.g. when ADO changes default checkout behavior).
- Reusable template / runtime-import of a shared script: better than copy-paste, but still requires users to know it exists and wire it up correctly.
- Letting the agent call into the ADO repos MCP for file lists: works for filenames but doesn't give the agent the actual diff content, and the MCP round-trips are noisier in logs than a single pre-step.
Acceptance criteria (suggested)
- New pr-context: frontmatter key (or auto-injection driven off on.pr) compiles to a bash step that runs after checkout: self and before the agent.
- The step handles shallow checkout, missing target ref, missing OAuth creds, and synthetic merge commits without further user configuration.
- Output layout matches the shape above (or an equivalent agreed shape) and is documented in the create prompt.
- status.txt is the single source of truth for whether the agent has usable PR context.
- A worked example PR-reviewer agent is added to the prompts/examples.
Happy to contribute the bash we already shipped as a starting point if it's useful.