# feat: built-in PR context precomputation for PR-triggered agents #860

## Problem

When writing a PR-triggered agent (`on.pr`), the agent itself almost always wants to know **what changed**: the changed file list, a unified diff, the base/head SHAs, and per-file pre/post snapshots. Today every author of a PR-reviewer agent has to reinvent the same wheel inside the agent body — or, more robustly, in a custom `steps:` block — because:

1. **PR builds are shallow by default.** `checkout: self` does a depth-1 fetch, so `git merge-base origin/<target> HEAD` fails out of the box.
2. **`origin/<target>` isn't fetched.** The agent has to `git fetch +refs/heads/<target>:refs/remotes/origin/<target>` itself, with progressive deepening until the merge-base resolves (or `--unshallow` as a fallback).
3. **`checkout: self` may not persist OAuth credentials** in some org configurations, so the fetch needs an explicit `http.extraheader=Authorization: bearer ${SYSTEM_ACCESSTOKEN}` to be portable.
4. **The synthetic merge commit** (`refs/pull/<id>/merge`) requires careful handling — naively diffing against `HEAD^2` only works if the checkout actually is a merge commit, which is fragile.
5. **Asking the LLM to run all of this** ends up burning a lot of agent turns/tokens on git plumbing, and the failure mode (agent silently produces an empty or wrong diff and posts a useless review) is hard to detect.
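
To make point 4 concrete, here is a minimal sketch of a guard that only reads the merge parents when `HEAD` really has a second parent. `resolve_pr_range` is a hypothetical helper name, and the parent ordering (parent 1 is the target tip, parent 2 is the PR source tip) is an assumption about how the synthetic merge commit is built. It also assumes history has already been fetched deep enough for `git merge-base` to succeed, so it does not address points 1 and 2.

```bash
# Hypothetical helper: print "<base_sha> <head_sha>" for the PR under review.
# Assumption: on a synthetic merge commit, parent 1 is the target tip and
# parent 2 is the PR source tip.
resolve_pr_range() {
  local target_ref="$1" target_tip head_sha base_sha
  if git rev-parse --verify --quiet "HEAD^2^{commit}" >/dev/null; then
    # HEAD has a second parent, so it is a merge commit: use its parents.
    target_tip="$(git rev-parse "HEAD^1")"
    head_sha="$(git rev-parse "HEAD^2")"
  else
    # HEAD is treated as the PR source tip: compare with the fetched target ref.
    target_tip="$(git rev-parse --verify --quiet "${target_ref}^{commit}")" || return 1
    head_sha="$(git rev-parse HEAD)"
  fi
  # Fails (non-zero) when the two sides share no reachable ancestor,
  # which is what happens on a checkout that is still too shallow.
  base_sha="$(git merge-base "$target_tip" "$head_sha")" || return 1
  printf '%s %s\n' "$base_sha" "$head_sha"
}
```

A caller would run something like `read -r base_sha head_sha < <(resolve_pr_range origin/main)` and treat a non-zero exit as `DIFF_RESOLUTION_FAILED`.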

## Concrete evidence this is a real footgun

We just shipped an `agents/pr-reviewer.md` in our own pipelines repo and went through this exact iteration cycle. A rubber-duck pass flagged **all** of the above as blockers; the final precompute step is ~120 lines of bash, lives in `steps:` (so the agent receives a ready-made `pr-context/` directory), and is, frankly, generic enough that it could be ado-aw's job rather than every adopter's.

Sketch of the user-side step we wrote (full source: <https://github.com/msazuresphere/4x4/azure-devops-agentic-pipelines/blob/main/agents/pr-reviewer.md>):

```yaml
steps:
  - bash: |
      set -euo pipefail
      rm -rf pr-context && mkdir -p pr-context

      if [ -n "${SYSTEM_ACCESSTOKEN:-}" ]; then
        git_fetch() { git -c "http.extraheader=Authorization: bearer ${SYSTEM_ACCESSTOKEN}" fetch "$@"; }
      else
        git_fetch() { git fetch "$@"; }
      fi

      # ...progressive --depth=200/500/2000 + --deepen, then --unshallow fallback...
      # ...merge-base resolution, diff/snapshot generation, --find-renames, scope filtering,
      #    truncation cap, OK / NO_PR_CONTEXT / DIFF_RESOLUTION_FAILED status file...
    displayName: "Precompute PR diff context"
    env:
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)
```
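
For the status file mentioned in the last elided comment, a rough sketch of the idea follows. `write_status` and `has_pr_context` are hypothetical helper names, not the ones in the linked source. `BUILD_REASON` and `SYSTEM_PULLREQUEST_TARGETBRANCH` are the environment forms of the predefined `Build.Reason` and `System.PullRequest.TargetBranch` variables; using them as the signal for `NO_PR_CONTEXT` is an assumption of this sketch.

```bash
# Hypothetical helper: record the outcome so the agent never has to guess.
# error.txt exists only when the status is not OK.
write_status() {
  local dir="$1" status="$2" message="${3:-}"
  mkdir -p "$dir"
  printf '%s\n' "$status" > "$dir/status.txt"
  rm -f "$dir/error.txt"
  if [ "$status" != "OK" ]; then
    printf '%s\n' "$message" > "$dir/error.txt"
  fi
}

# Hypothetical helper: true only for builds that carry PR information.
# Assumption: a PR build sets both of these predefined variables.
has_pr_context() {
  [ "${BUILD_REASON:-}" = "PullRequest" ] &&
    [ -n "${SYSTEM_PULLREQUEST_TARGETBRANCH:-}" ]
}
```

The step would then start with `has_pr_context || { write_status pr-context NO_PR_CONTEXT "not a PR build"; exit 0; }` so that non-PR runs finish cleanly instead of failing.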

## Proposed solution

Add a first-class **`pr-context:`** feature (or fold it into the existing `on.pr` handling) that ado-aw injects into the Agent job whenever the pipeline is PR-triggered. The compiler would emit the bearer-token-authenticated fetch, the merge-base resolution, the diff generation, and write a fixed-layout `pr-context/` directory into the workspace before the agent starts.

Opt-in (or on-by-default when `on.pr` is set):

```yaml
on:
  pr:
    branches: { include: [main] }

pr-context:
  enabled: true            # default true when on.pr is set
  scope:                   # pathspec scope for diff + snapshots
    - agents/**
    - scripts/**
    - ":(top,glob)*.yml"
  unified: 5               # -U value (default 3)
  max-diff-bytes: 524288   # truncate diff.patch beyond this size
  snapshots: true          # write pr-context/head-files + pr-context/base-files
```
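
As an illustration of how `max-diff-bytes` could behave, here is a small sketch of the truncation cap. `cap_patch` is a hypothetical helper name and the marker text is made up for this example; the real marker format would be whatever ado-aw settles on.

```bash
# Hypothetical helper: copy stdin to "$2", keeping at most "$1" bytes and
# appending a marker line when anything was cut off.
cap_patch() {
  local max_bytes="$1" out="$2" tmp size
  tmp="$(mktemp)"
  cat > "$tmp"
  # wc pads its output on some platforms, so strip the spaces.
  size="$(wc -c < "$tmp" | tr -d ' ')"
  if [ "$size" -gt "$max_bytes" ]; then
    head -c "$max_bytes" "$tmp" > "$out"
    printf '\n[diff truncated: %s of %s bytes shown]\n' "$max_bytes" "$size" >> "$out"
  else
    cat "$tmp" > "$out"
  fi
  rm -f "$tmp"
}
```

With the config above, the generated step could then look roughly like `git diff --unified=5 --find-renames "$base_sha" "$head_sha" -- 'agents/**' 'scripts/**' ':(top,glob)*.yml' | cap_patch 524288 pr-context/diff.patch`, where `unified` maps to `--unified` and `scope` maps to the pathspecs after `--`.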

Fixed-layout output the agent can rely on (this is exactly the shape we landed on after iteration):

```
pr-context/
  status.txt                       # OK | NO_PR_CONTEXT | DIFF_RESOLUTION_FAILED
  metadata.txt                     # pr_id, source_branch, target_branch, base_sha, head_sha, build_id, build_reason, repository
  changed-files.txt                # full git diff --name-status
  changed-files-in-scope.txt       # name-status restricted to scope
  diff.patch                       # unified diff, scoped, capped, with truncation marker
  head-files/<path>                # post-PR snapshot of A/M/T/R*/C* files in scope
  base-files/<path>                # pre-PR snapshot of D files in scope
  error.txt                        # only when status != OK
```
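
The `head-files/` and `base-files/` rule in the layout above can be sketched as a loop over `git diff --name-status`. `write_snapshots` is a hypothetical helper name. The sketch assumes plain file names (no tabs, no names that git would quote) and regular files only, so a real implementation would likely want `-z` output and submodule handling.

```bash
# Hypothetical helper: stage per-file snapshots under "$3" for the changes
# between "$1" (base) and "$2" (head). Extra arguments are pathspecs.
write_snapshots() {
  local base_sha="$1" head_sha="$2" dir="$3" status path new_path rev sub
  shift 3
  git diff --name-status --find-renames "$base_sha" "$head_sha" -- "$@" |
  while IFS=$'\t' read -r status path new_path; do
    case "$status" in
      R*|C*) path="$new_path"; rev="$head_sha"; sub=head-files ;;  # new name
      A|M|T) rev="$head_sha"; sub=head-files ;;                    # post-PR
      D)     rev="$base_sha"; sub=base-files ;;                    # pre-PR
      *)     continue ;;
    esac
    mkdir -p "$dir/$sub/$(dirname "$path")"
    git show "$rev:$path" > "$dir/$sub/$path"
  done
}
```

A call such as `write_snapshots "$base_sha" "$head_sha" pr-context 'agents/**' 'scripts/**'` would restrict the snapshots to the configured scope.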

Optionally surface the same data as agent-visible environment variables:

- `ADO_AW_PR_BASE_SHA`, `ADO_AW_PR_HEAD_SHA`, `ADO_AW_PR_ID`, `ADO_AW_PR_TARGET_BRANCH`, `ADO_AW_PR_SOURCE_BRANCH`
- `ADO_AW_PR_CONTEXT_DIR` (defaults to `$(Build.SourcesDirectory)/pr-context`)

And document it in the create/update prompts so PR-reviewer agents become roughly:

```markdown
### Step 1 — Read the precomputed PR context

A pipeline step has already resolved the PR diff and staged it for you under
`pr-context/`. Do not run git fetch/diff yourself — read these files instead.

| File                              | Contents                                    |
|-----------------------------------|---------------------------------------------|
| pr-context/status.txt             | OK / NO_PR_CONTEXT / DIFF_RESOLUTION_FAILED |
| pr-context/metadata.txt           | base/head SHAs, branches, ...               |
| pr-context/changed-files.txt      | name-status                                 |
| pr-context/diff.patch             | unified diff (capped)                       |
| pr-context/head-files/<path>      | post-PR snapshots                           |
| pr-context/base-files/<path>      | pre-PR snapshots (deletes)                  |
```

## Why this belongs in ado-aw (not userland)

- **Correctness is hard.** The four git blockers above (shallow checkout, missing target ref, persisted-creds variability, synthetic merge commit) trip up nearly every first attempt.
- **Every PR-reviewer agent needs this.** It's not a niche use case — it's the *common* one.
- **It eliminates a class of silent-failure bugs.** Today, the agent computing its own diff can succeed-but-with-empty-output if any of the git plumbing misfires, and you only notice because reviews stop being useful. A precomputed status file with a clear `DIFF_RESOLUTION_FAILED` value lets the agent surface the problem instead of guessing.
- **It composes with safe-outputs nicely.** Combined with `add-pr-comment` / `submit-pr-review`, this would mean "PR reviewer agent" is essentially a 20-line agent file with a focused review prompt.
- **It tightens the trust boundary.** The agent body is the untrusted part (it consumes PR content); the diff plumbing is trusted infrastructure. Keeping the latter out of the agent's hands is a small but real defense-in-depth win — the agent can be run with `tools.edit: false` and a tight bash allow-list and still get full PR context.

## Alternatives considered

- **Documenting a copy-paste snippet** in the create prompt — works but every adopter still maintains it. Doesn't help when the diff-resolution logic itself needs to evolve (e.g. when ADO changes default checkout behavior).
- **Reusable template / `runtime-import` of a shared script** — better than copy-paste, but still requires users to know it exists and wire it up correctly.
- **Letting the agent call into the ADO `repos` MCP for file lists** — works for filenames but doesn't give the agent the actual diff content, and the MCP round-trips are noisier in logs than a single pre-step.

## Acceptance criteria (suggested)

- New `pr-context:` frontmatter key (or auto-injection driven off `on.pr`) compiles to a bash step that runs after `checkout: self` and before the agent.
- The step handles shallow checkout, missing target ref, missing OAuth creds, and synthetic merge commits without further user configuration.
- Output layout matches the shape above (or an equivalent agreed shape) and is documented in the create prompt.
- `status.txt` is the single source of truth for whether the agent has usable PR context.
- A worked example PR-reviewer agent is added to the prompts/examples.
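
To show what "single source of truth" could mean for a consumer, here is a sketch of a gate that a later step (or an allow-listed agent command) might run before touching the diff. `pr_context_ready` is a hypothetical helper name, and the exit codes are an arbitrary choice for this example.

```bash
# Hypothetical helper: succeed only when the precompute step reported OK.
# Exit codes (arbitrary for this sketch): 0 = usable, 1 = not OK, 2 = missing.
pr_context_ready() {
  local dir="${1:-pr-context}" status
  if [ ! -f "$dir/status.txt" ]; then
    echo "missing $dir/status.txt" >&2
    return 2
  fi
  status="$(head -n 1 "$dir/status.txt")"
  if [ "$status" = "OK" ]; then
    return 0
  fi
  echo "PR context unavailable: $status" >&2
  if [ -f "$dir/error.txt" ]; then
    cat "$dir/error.txt" >&2
  fi
  return 1
}
```

Nothing else in `pr-context/` would be read unless this check passes, which keeps an empty or stale `diff.patch` from being mistaken for "no changes".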

Happy to contribute the bash we already shipped as a starting point if it's useful.

