Skip to content

feat(compile): add execution-context plugin with PR contributor (#860)#865

Open
jamesadevine wants to merge 6 commits into
mainfrom
devinejames/pr-context
Open

feat(compile): add execution-context plugin with PR contributor (#860)#865
jamesadevine wants to merge 6 commits into
mainfrom
devinejames/pr-context

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

@jamesadevine jamesadevine commented Jun 5, 2026

Summary

Closes #860.

Adds an always-on ExecContextExtension that precomputes a
minimal, focused set of execution context signals for the agent
on PR-triggered builds, and appends a tailored fragment to the agent
prompt so reviewer / triage agents stop reinventing git fetch,
merge-base resolution, and shallow-clone handling in every workflow
body.

The design follows the issue author's broader-scope request: a
plugin surface (ContextContributor trait + Contributor enum)
rather than a one-off PR helper. Future contributors (work-item,
build-history, schedule, …) plug into the same step slot and share
the same trust-boundary.

Design (v6.2 — aligned with gh-aw)

After several iterations, the design converged on a much smaller
surface than v1, inspired by how
gh-aw handles PR context. The
compiler does ONLY what the agent cannot do itself inside the AWF
sandbox:

  1. Fetch the PR target branch with progressive deepening until
    git merge-base resolves (requires the bearer; cannot happen
    inside the agent's network-isolated sandbox).
  2. Stage aw-context/pr/base.sha and aw-context/pr/head.sha
    (40-char hex SHAs) so the agent reuses them across many
    git diff invocations.
  3. Append a prompt fragment to /tmp/awf-tools/agent-prompt.md
    listing the right git commands and example ADO MCP tool calls
    (repo_get_pull_request_by_id,
    repo_list_pull_request_threads,
    repo_create_pull_request_thread) with PR id / project / repo
    pre-filled as literal values.
  4. Auto-extend the agent's bash allow-list with 7 read-only git
    commands (git, git diff, git log, git show,
    git status, git rev-parse, git symbolic-ref) so the agent
    can inspect the diff locally.

The agent then runs git diff $BASE..$HEAD itself inside the AWF
sandbox — the objects are already fetched into the workspace and
mounted in. No changed-files.txt, no diff.patch, no
head-files/ / base-files/ snapshots, no status.txt protocol.

base.sha always has the SAME semantics in both code paths
(synthetic-merge-commit and progressive-deepening): the true common
ancestor via git merge-base. The agent therefore sees the same
"PR diff since branch-point" change set regardless of how ADO
checked out the workspace.

Trust boundary (non-negotiable)

  • No persistCredentials: true and no checkout override
    would write the bearer to .git/config inside the AWF-mounted
    workspace where the agent could exfiltrate it.
  • SYSTEM_ACCESSTOKEN is mapped from $(System.AccessToken)
    only into the precompute step's env: block; never into the
    agent step.
  • Bearer injected into git fetch via
    GIT_CONFIG_COUNT / GIT_CONFIG_KEY_0 / GIT_CONFIG_VALUE_0
    env vars (not git -c http.extraheader=… argv — no
    process-listing leak).
  • Step gated by
    condition: eq(variables['Build.Reason'], 'PullRequest').
  • All 4 identifiers interpolated into the agent prompt or git
    refspecs (PR_ID, PROJECT, REPO, PR_TARGET_BRANCH) are
    strictly regex-validated before use, even though they come from
    ADO infra rather than user-controlled input.

Front matter (additive, optional)

on:
  pr:
    branches:
      include: [main]

# Optional — defaults to "on when on.pr is configured".
execution-context:
  pr:
    enabled: true   # default true when on.pr is set; set false to opt out
                    # (also suppresses auto-adding the 7 git commands
                    # to the agent's bash allow-list).

on.pr is REQUIRED for the contributor to activate. Setting
pr.enabled: true on a non-PR-triggered agent does NOT activate
the contributor (the prepare step would be dead code, and silently
widening the agent's bash surface would be a footgun).

Iterating to v6.2 — what changed vs v1

The original implementation staged 6 files and a status.txt
protocol with 4 status values. After review feedback:

  • Dropped status.txt, metadata.txt, changed-files*.txt,
    diff.patch, head-files/, base-files/. The agent runs its
    own git diff against the precomputed SHAs.
  • Collapsed execution-context.pr from 5 knobs
    (scope/unified/max-diff-bytes/snapshots/enabled) to
    one (enabled).
  • Short identifiers (PR id, project, repo) are interpolated
    directly into the prompt fragment via printf, not staged as
    files.
  • Failure path: error.txt + failure prompt fragment that tells
    the agent to surface the error (and use ADO MCP as a fallback)
    rather than produce an empty review.
  • Auto-extends the agent bash allow-list with the 7 git commands.

Robustness fixes from review

  1. base.sha semantics are now consistent across the
    synthetic-merge-commit path (HEAD^1 + HEAD^2) and the
    progressive-deepening path. Both compute
    git merge-base so git diff $BASE..$HEAD produces the same
    change set regardless of ADO's checkout mode.
  2. PR_TARGET_BRANCH is regex-validated before
    interpolation into the git refspec
    (^[A-Za-z0-9._/-]+$), matching the posture of the other
    identifiers.
  3. pr.enabled: true without on.pr does not activate the
    contributor (no compile-time widening of the agent bash
    allow-list for a step that can never run at runtime).
  4. Progressive deepening actually works — target-branch fetch
    loop only breaks when git merge-base resolves (not on the
    first successful fetch).
  5. No argv leak — bearer in GIT_CONFIG_* env, not argv.
  6. Strengthened trust-boundary test — walks the compiled YAML
    and asserts SYSTEM_ACCESSTOKEN appears only on the
    exec-context prepare step's env: block (matched by
    displayName).

Docs

  • New docs/execution-context.md — full reference (motivation,
    contributors, field reference, agent-visible layout, prompt
    fragment shape, bash-allowlist auto-extension, trust boundary,
    precompute step walkthrough).
  • docs/front-matter.md — added execution-context: block to
    the example.
  • AGENTS.md — index entry + architecture tree updated.
  • prompts/create-ado-agentic-workflow.md — directs PR-reviewer
    authors to use the auto-staged context and prompt fragment
    rather than rolling their own precompute.

Test plan

  • cargo build — clean.
  • cargo clippy --all-targets --all-features — clean.
  • cargo test1739 unit tests + 127 compiler tests
    (9 exec-context tests) + bash_lint_tests with
    ENFORCE_BASH_LINT=1 (shellcheck v0.10.0) + every other suite:
    all pass.

Exec-context tests (9 total)

  • test_execution_context_pr_compiled_output_is_valid_yaml
  • test_execution_context_pr_emits_prepare_step_and_prompt_supplement
  • test_execution_context_pr_does_not_leak_system_accesstoken
    (parses YAML, asserts token env appears only on the
    exec-context prepare step by displayName)
  • test_execution_context_pr_not_emitted_when_no_pr_trigger
  • test_execution_context_pr_auto_extends_bash_allowlist
  • test_execution_context_pr_does_not_extend_bash_when_disabled
  • test_execution_context_pr_enabled_true_without_on_pr_is_inactive
  • test_execution_context_pr_synthetic_merge_uses_merge_base
    (regression guard against the inconsistent-base.sha bug)
  • test_execution_context_pr_target_branch_validated

The execution-context-agent.md fixture is also in the
bash_lint_tests FIXTURES list so the generated bash is
shellchecked on every CI run.

E2E validation in a real ADO build (replacing existing
pr-reviewer steps with this contributor) is intentionally out of
scope for this PR
— tracked separately because it requires a
real ADO run.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔍 Rust PR Review

Summary: Looks good overall — the trust boundary design is solid and the test coverage is thorough. One stale doc/comment inconsistency in the security-critical section warrants a fix before merge; two minor latent hazards noted below.


Findings

🔒 Security Concerns

  • docs/execution-context.md trust boundary table + src/compile/extensions/exec_context/pr.rs module comment describe the old, less-secure bearer injection mechanism that was already superseded in the same PR.

    The PR description (and the inline bash comment at line ~774 of the generated step) correctly states the token is injected via GIT_CONFIG_COUNT/KEY_0/VALUE_0 env vars so it never appears in process argv. But two places still say git -c http.extraheader=...:

    • docs/execution-context.md, trust boundary table row for "In-step SYSTEM_ACCESSTOKEN + bash bearer wrapper":

      A git_fetch wrapper injects git -c http.extraheader="Authorization: bearer ${SYSTEM_ACCESSTOKEN}" fetch ...

    • src/compile/extensions/exec_context/pr.rs, module-level doc comment footgun add agentic workflows #3:

      wraps git fetches with git -c http.extraheader=Authorization: bearer ...

    These descriptions refer to the old approach that the PR itself says was replaced ("No argv leak — bearer moved from git -c http.extraheader=... to GIT_CONFIG_* env"). The trust boundary doc is exactly what future security reviewers will read. Having it describe the inferior approach is misleading and could cause a reviewer to incorrectly approve a future regression. Both locations should be updated to match the actual GIT_CONFIG_* mechanism.

⚠️ Suggestions

  • src/compile/extensions/exec_context/mod.rs prompt_supplement(): when any_contributor_active is true but every contributor's prompt_fragment() returns an empty/whitespace string, the method returns Some("\n---\n\n## Execution context\n") — a section header with no body. Not a bug in v1 (the PR contributor always returns a non-empty fragment), but a latent hazard for future contributors that emit nothing. A guard like if body.trim_start_matches("\n---\n\n## Execution context\n").trim().is_empty() { return None; } before the final Some(body) would make the invariant self-enforcing.

  • any_contributor_active pre-computation in ExecContextExtension::new() mirrors each contributor's should_activate() logic. The comment acknowledges this duplication as intentional. The risk is that a future contributor author wires up should_activate() but forgets to OR-in the aggregate check, silently suppressing the prompt supplement. A short doc comment like // MAINTENANCE: update this aggregate when adding a new contributor in new() would reduce that risk.

✅ What Looks Good

  • Progressive deepening is correctly implemented — the loop only breaks when git merge-base actually resolves after each fetch depth, not on a successful-fetch-alone. This was the specific correctness issue called out in the PR description and the fix is right.
  • Trust boundary is robust: SYSTEM_ACCESSTOKEN is scoped to the precompute step env; persistCredentials: true is never emitted (and there's a regression test that walks the parsed YAML to assert exactly this); .git/config is never written.
  • Compile-time scope validation in ExecContextExtension::validate() rejects ADO expressions ($(, ${{, $[), pipeline commands (##vso[, ##[), and newlines — the right set of injection vectors for values that get interpolated by ADO before bash sees them.
  • scope_for_bash() quoting uses POSIX single-quote escaping ('\') with the values already validated to be newline-free. Correct.
  • CTX_OK tracking means a failing git diff flips the status to CONTEXT_GENERATION_FAILED instead of silently writing OK with an empty patch. The status file discipline is consistent throughout.
  • Test coverage is thorough: YAML validity, prepare step spot-checks, trust-boundary YAML walk, no-PR-trigger suppression, compile-fail for ADO expressions in scope.

Generated by Rust PR Reviewer for issue #865 · sonnet46 1.6M ·

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔍 Rust PR Review

Summary: Well-engineered feature with a solid security design — a few correctness and documentation gaps worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/compile/extensions/exec_context/pr.rs:179–195base.sha has different semantics across the two code paths. In the synthetic merge-commit path, BASE_SHA = HEAD^1 (the target branch tip at merge-preparation time), while in the progressive-deepening path, BASE_SHA = git merge-base origin/$target HEAD (the true common ancestor). These are not the same SHA when the target branch has advanced since the PR branched. git diff $BASE..$HEAD therefore produces different change sets depending on ADO checkout mode — the agent sees a narrower "what's new vs current target" diff on one runner and a broader "full PR since branch-point" diff on another. The docs at docs/execution-context.md:89 call it "target merge-base SHA" but in the synthetic path it is the target tip. Either the synthetic path should compute git merge-base HEAD^1 HEAD^2 for consistency, or the docs/comments should clearly distinguish the two semantics.

  • pr.rs:175|| echo 0 is unreachable dead code. wc -w always exits 0 and outputs 0 on empty input, so the fallback || echo 0 never fires. Minor, but worth removing to avoid false impressions about the error-handling path.

🔒 Security Concerns

  • pr.rs:147–151PR_TARGET_BRANCH has no format validation, unlike the other three identifiers. PR_ID, PROJECT, and REPO are all guarded by strict allowlist regexes before use. PR_TARGET_BRANCH is only checked for emptiness, then its stripped form is interpolated into a git refspec:
    git_fetch --no-tags "$_depth_arg" origin \
      "+refs/heads/${PR_TARGET_SHORT}:refs/remotes/origin/${PR_TARGET_SHORT}"
    The value comes from ADO infra (System.PullRequest.TargetBranch), so the practical risk is low. But the asymmetry means a defensive-in-depth validation line is missing. A regex like ^[A-Za-z0-9._/\-]+$ on PR_TARGET_BRANCH before use (or on PR_TARGET_SHORT after stripping) would close the gap consistently.

⚠️ Suggestions

  • mod.rs:78–82 + pr.rs:76–83 — explicit pr.enabled: true on a non-PR-triggered agent silently widens the bash allow-list. The should_activate Some(true) => true arm is evaluated independently of whether on.pr is configured. The prepare step is ADO-condition-gated at runtime (safe), but required_bash_commands() is a compile-time artifact: the 7 git commands are added to the agent's bash allow-list even though the step will never run. Consider emitting a compile-time warning, or documenting this footgun explicitly in docs/execution-context.md.

  • PR description is stale post-v6.2. The body describes scope[], unified, max-diff-bytes, and snapshots front-matter fields, and mentions test_execution_context_pr_scope_rejects_ado_expressions as a new test — all of which were removed in the v6.2 simplification commit. The PR description test plan also says "5 new exec-context tests" but the final code has 6. This creates reviewer confusion; worth updating the PR description or noting in a comment.

✅ What Looks Good

  • Bearer token isolation via GIT_CONFIG_* env vars — the token never appears in process argv, never lands in .git/config, never reaches the agent step's env. The trust-boundary test that walks the YAML and asserts the token appears only under the expected displayName is a great regression guard.
  • Progressive deepening loop logic — correctly breaks only when git merge-base actually resolves (not on the first successful fetch). The fetch_target_at_depth || continue pattern is clean.
  • set -uo pipefail without -e + explicit fail() pattern — all failure paths write error.txt, append a failure prompt fragment, and exit 0 so the build proceeds. The agent surfaces the failure rather than the build failing silently.
  • Compile-time identifier validation before prompt interpolation — strict allowlists on PR_ID/PROJECT/REPO before they're written into the agent prompt prevent ADO-expression and newline injection. Defence-in-depth even though these are infra-provided values.
  • test_execution_context_pr_does_not_leak_system_accesstoken — walking the parsed YAML rather than just substring-searching for the token is the right approach for a trust-boundary regression test.

Generated by Rust PR Reviewer for issue #865 · sonnet46 2.7M ·

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔍 Rust PR Review

Summary: Looks good overall — the trust-boundary design is solid and the test suite is thorough. Two items worth addressing before merge.


Findings

⚠️ Suggestions

any_contributor_active duplication is a divergence trap (src/compile/extensions/exec_context/mod.rs, ExecContextExtension::new)

The any_contributor_active pre-computation in new() replicates each contributor's should_activate logic verbatim. The MAINTENANCE comment is good, but this pattern means a new contributor can be wired into contributors() and should_activate while its bash allow-list contribution is silently suppressed because any_contributor_active wasn't updated. That's exactly the footgun the comment warns about.

A lightweight guard: add a test that constructs ExecContextExtension::new() and a CompileContext for every contributor's "active" scenario, then asserts required_bash_commands() is non-empty — same input, both code paths exercised. That would catch a divergence immediately rather than at E2E time. As written, the nine tests don't cover this since they use compile-fixture paths that go through prepare_steps, not required_bash_commands independently.


Silent infra failure path in set -uo pipefail (no -e) (src/compile/extensions/exec_context/pr.rs, bash body)

Without -e, an early infrastructure failure (e.g. mkdir -p "$AW_PR_DIR" — unlikely but possible if BUILD_SOURCESDIRECTORY resolves to a read-only path) lets execution continue silently, fail() then can't write error.txt, and the step exits 0 with nothing staged and nothing in the agent prompt. The agent receives no context and no failure signal.

A simple guard at the top — mkdir -p "$AW_PR_DIR" || { echo "[aw-context] could not create aw-context dir"; exit 1; } — would surface this loudly. Unlike the merge-base failure (which is a normal operational case warranting the fail() soft path), a missing output directory is always a broken pipeline.


✅ What Looks Good

  • Trust boundary is excellent. Bearer injected via GIT_CONFIG_* env (never argv), SYSTEM_ACCESSTOKEN scoped to the single prepare step's env: block, no persistCredentials: true. The YAML-walking regression test in test_execution_context_pr_does_not_leak_system_accesstoken is exactly the right kind of safety net.
  • BASE_SHA consistency is correct. Both the synthetic-merge path (git merge-base MERGE_P1 MERGE_P2) and the progressive-deepening path (git merge-base origin/$target HEAD) use true merge-base semantics. The regression guard test_execution_context_pr_synthetic_merge_uses_merge_base is a good shield against future simplification.
  • Progressive deepening loop is right. The break condition fires only when git merge-base actually resolves — not on the first successful fetch — so the agent always gets the true common ancestor or an explicit failure, never a stale shallow-clone "merge-base".
  • printf '%s' throughout. The fail-path message uses printf '%s' for all four identifier substitutions, so even an ADO var that somehow contained %n or similar couldn't cause format-string trouble.
  • Contributor/extension separation is clean. The private ContextContributor trait + Contributor enum mirrors the Extension enum pattern exactly; plugging in a future ScheduleContributor will be mechanical.

Generated by Rust PR Reviewer for issue #865 · sonnet46 3.8M ·

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔍 Rust PR Review

Summary: Looks good — well-designed, with excellent security posture and thorough test coverage. One defense-in-depth gap in a regex and a cosmetic EOF issue worth fixing before merge.

Findings

⚠️ Suggestions

  • src/compile/extensions/exec_context/pr.rsPR_TARGET_BRANCH regex allows .. sequences

    The validation regex is ^[A-Za-z0-9._/-]+$, which accepts values like ../evil or feature..name. Both git fetch and git merge-base reject ..-containing refspecs as "invalid refspec" / "Not a valid object name" respectively, so the attack fails gracefully into fail() — there is no security breach. But the PR's stated posture is "defence-in-depth with the same strictness as the other identifiers", and those identifiers don't allow ... Tightening the regex to explicitly exclude .. would be consistent with that posture:

    # Current
    if [[ ! "$PR_TARGET_BRANCH" =~ ^[A-Za-z0-9._/-]+$ ]]; then
    
    # Suggestion — add a secondary check that rejects ".." sequences
    if [[ ! "$PR_TARGET_BRANCH" =~ ^[A-Za-z0-9._/-]+$ ]] || [[ "$PR_TARGET_BRANCH" == *".."* ]]; then

    (Confirmed ../evil and feature..name pass the current regex via live bash test.)

  • tests/compiler_tests.rs — missing newline at end of file

    The final } of the new exec-context test block has no trailing newline (\ No newline at end of file in the diff). All other source files in the repo end with a newline; this is inconsistent and will show up in future diffs.

✅ What Looks Good

  • Trust boundary is solid: SYSTEM_ACCESSTOKEN is mapped only into the precompute step's env: block via GIT_CONFIG_* env vars (not git -c argv), never written to .git/config, never in the agent env. The YAML walk test (test_execution_context_pr_does_not_leak_system_accesstoken) enforces this programmatically against the compiled output.

  • Consistent base.sha semantics: Both the synthetic-merge-commit path (git merge-base HEAD^1 HEAD^2) and the progressive-deepening path compute the true common ancestor, so git diff $BASE..$HEAD produces the same change set regardless of ADO checkout mode. The regression guard (test_execution_context_pr_synthetic_merge_uses_merge_base) locks this in.

  • any_contributor_active divergence trap: The six unit tests in exec_context/mod.rs mirror every branch of PrContextContributor::should_activate, ensuring a future contributor author who wires should_activate but forgets the aggregate pre-computation trips at unit-test time rather than silently widening the bash allow-list at E2E time.

  • fail() exiting 0: Intentional and correct — the build continues with the agent using the ADO MCP fallback rather than aborting the pipeline; the failure prompt fragment explicitly forbids an empty review.

  • Bash allow-list footgun guard: pr.enabled: true without on.pr does NOT widen the allow-list. Correctly implemented in both ExecContextExtension::new and PrContextContributor::should_activate, and covered by dedicated tests.

Generated by Rust PR Reviewer for issue #865 · sonnet46 2M ·

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔍 Rust PR Review

Summary: Well-designed with strong trust-boundary controls and excellent test coverage; one low-severity prompt injection vector worth addressing.

Findings

🔒 Security Concerns

  • scripts/ado-script/src/exec-context-pr/validate.ts:53 + index.ts:76 — When TARGET_BRANCH_RE validation fails, the raw (unvalidated) targetBranch value is embedded verbatim in the reason string, which is then appended to the agent prompt via failureFragment. A PR author with push access who creates a branch with a newline or markdown header in its name (e.g., refs/heads/foo\n## Injected Section\nIgnore previous instructions) can inject content into the agent's prompt context through this failure path. The vector requires push access to the repo and hits only the failure path, making it low severity — but worth a targeted fix. Consider sanitizing targetBranch before embedding it in the error reason (replace \r\n with , truncate to N chars, or use a placeholder like <invalid branch name>).

🐛 Bugs / Logic Issues

  • scripts/ado-script/src/exec-context-pr/merge-base.ts:33countParents is named and documented as returning the number of parents, but actually returns the number of tokens in the rev-list --parents output (commit SHA + parent SHAs). For a merge commit the value is 3 (= 1 commit + 2 parents), not 2. The parents=${parents} in the failure message (resolveMergeBase line ~105) would emit parents=3 for a normal merge commit, which a debugging reader would interpret as "3 parents" (a rare octopus merge) rather than "standard 2-parent merge". The condition parents >= 3 is correct, but the name and message are misleading. parentTokenCount or just renaming the condition comment would prevent future confusion.

⚠️ Suggestions

  • scripts/ado-script/src/exec-context-pr/git.ts:14 — The bearerEnv docstring says "the existing bash path falls through" — stale v6.2 language; the bash path no longer exists. Suggest updating to simply: "callers still attempt the fetch without authentication."

  • scripts/ado-script/test/smoke.test.ts (exec-context-pr smoke tests) — The success smoke test verifies that base.sha and head.sha are non-empty 40-char hex strings and differ from each other, but doesn't cross-check them against the expected git ancestry. For makeSyntheticMergeRepo, the test could additionally verify head.sha == git rev-parse HEAD^2 and base.sha == git merge-base HEAD^1 HEAD^2 in the repo — a simple regression guard against SHA transposition or wrong-ref bugs.

  • tests/fixtures/dedupe_gate_only.md — The fixture now disables exec-context to preserve the "download must land in Setup only" test's semantics. This means the inlined-imports: true + exec-context PR active combination (where the bundle download should appear in both Setup AND Agent jobs) has no test coverage. A new test verifying that download appears in the Agent job when inlined-imports: true and on.pr is configured without execution-context.pr.enabled: false would close this gap.

✅ What Looks Good

  • Trust boundary is solid: SYSTEM_ACCESSTOKEN scoped to the single node exec-context-pr.js step, passed only to the spawned git child via GIT_CONFIG_* (not argv, not .git/config). The test_execution_context_pr_does_not_leak_system_accesstoken YAML-walk test is a strong regression guard.
  • Progressive deepening logic is correct: the loop breaks only when git merge-base actually resolves (not just on a successful fetch), and base.sha semantics are identical between the synthetic-merge and deepening paths.
  • Duplicate-activation safety: the divergence-trap unit tests in exec_context/mod.rs are a smart pattern for the dual pr_contributor_will_activate_with_cfg / should_activate maintenance hazard.
  • pr.enabled: true without on.pr footgun prevention: correctly suppressed at both required_bash_commands and prepare_steps — the compile-time test test_execution_context_pr_enabled_true_without_on_pr_is_inactive directly guards the footgun scenario.
  • set -euo pipefail + correct exit semantics: soft failures exit 0; only infra failure (mkdirSync) exits 1; wrapping bash propagates correctly.
  • TypeScript unit test coverage is thorough — injectable GitRunners pattern lets the merge-base tests exercise every loop branch without a real git repo.

Generated by Rust PR Reviewer for issue #865 · sonnet46 2.4M ·

jamesadevine and others added 6 commits June 6, 2026 09:14
Adds an always-on ExecContextExtension that materialises `aw-context/pr/*`
on disk for PR-triggered runs, so reviewer/triage agents stop reinventing
`git fetch` / `git diff` / merge-base resolution in every workflow body.

Bearer is mapped only into the precompute step's env and injected into
git via `GIT_CONFIG_COUNT/KEY_0/VALUE_0` (never argv); no
`persistCredentials: true` and no checkout override, so the
AWF-sandboxed agent never sees `SYSTEM_ACCESSTOKEN`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Aligns the execution-context plugin with gh-aw's model: the agent

does its own `git diff` against precomputed SHAs, the compiler only

handles the parts the sandbox can't (fetch+merge-base resolution,

bearer-protected).

Artefacts collapse from ~6 files to 2 (`base.sha`, `head.sha`) plus

one failure file (`error.txt`). Short identifiers (PR id / project /

repo) are interpolated directly into the prompt fragment via

`printf` calls appended to `/tmp/awf-tools/agent-prompt.md` from the

prepare step (same mechanism gh-aw uses for built-in prompt sections).

The prompt fragment lists the right `git` commands and example ADO

MCP tool calls (`repo_get_pull_request_by_id` etc.) with PR id /

project / repo pre-filled. When the precompute fails, a failure

fragment tells the agent to surface the error rather than produce an

empty review (ADO MCP calls still possible).

Auto-extends the agent's bash allow-list with 7 read-only git

commands (`git`, `git diff`, `git log`, `git show`,

`git status`, `git rev-parse`, `git symbolic-ref`) when the PR

contributor activates, so a restricted-bash agent can still inspect

the diff. Opt-out via `execution-context.pr.enabled: false`.

Dropped from v1: `status.txt` protocol, `metadata.txt`,

`changed-files*.txt`, `diff.patch` (with scope/unified/max-bytes

knobs), and `head-files/` / `base-files/` snapshots. The

`execution-context.pr` block collapses to `{ enabled }` only.

Trust boundary unchanged: `SYSTEM_ACCESSTOKEN` scoped to the

prepare step's env, `GIT_CONFIG_*` bearer injection (no argv leak),

no `persistCredentials`, no checkout override, condition gate on

`Build.Reason == 'PullRequest'`.

Issue: #860

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four review findings from the automated Rust PR reviewer
(workflow run 27037700172).

1. Bug — inconsistent `base.sha` semantics across paths. The
synthetic-merge-commit path used `HEAD^1` (target tip), while the
progressive-deepening path used `git merge-base` (common ancestor).
`git diff $BASE..$HEAD` therefore produced different change sets
depending on ADO's checkout mode. Fixed: synthetic path now also
computes `git merge-base HEAD^1 HEAD^2`, so `BASE_SHA` is always
the common ancestor ("PR diff since branch-point").

2. Cleanup — unreachable `|| echo 0`. `wc -w` always exits 0 and
prints "0" on empty input, so the fallback never fired. Replaced
with a clean `${PARENTS:-0}` parameter-expansion default.

3. Security defence-in-depth — missing `PR_TARGET_BRANCH` validation.
`PR_ID`/`PROJECT`/`REPO` were strictly regex-validated; only
`PR_TARGET_BRANCH` was just checked for emptiness despite being
interpolated into a git refspec. Added `^[A-Za-z0-9._/-]+$` regex
guard with the same posture as the other identifiers.

4. Footgun — explicit `pr.enabled: true` widened agent bash on
non-PR agents. Without `on.pr` configured, the prepare step is dead
code (runtime `Build.Reason == 'PullRequest'` gate), but the 7 git
commands were still appended to the agent's bash allow-list at
compile time. Now `on.pr` is REQUIRED — `pr.enabled: true` without
it is treated the same as the default (inactive). Aggregate in
`ExecContextExtension::new` updated to match.

Tests:
- `test_execution_context_pr_synthetic_merge_uses_merge_base` (new)
  — regression guard against the `HEAD^1`-direct-assignment shape.
- `test_execution_context_pr_target_branch_validated` (new)
  — asserts the new regex guard is emitted.
- `test_execution_context_pr_enabled_true_without_on_pr_is_inactive`
  (new) — prepare step and bash allow-list both suppressed when
  on.pr is absent.
- All 9 `test_execution_context_pr_*` tests pass; full `cargo test`
  green (1739 unit + 127 compiler + bash_lint with shellcheck).

Docs updated: synthetic-path behaviour, `PR_TARGET_BRANCH` validation,
and `on.pr`-required activation rule in `docs/execution-context.md`.

Note: the first review (run 27020920083) was against the pre-v6.2
implementation; its concerns about `git -c http.extraheader` in
docs/comments were already resolved by the v6.2 rewrite (both now
describe `GIT_CONFIG_*` env-var injection).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two additional suggestions from the latest review pass.

1. Divergence-trap unit tests for `any_contributor_active`.
The pre-computation in `ExecContextExtension::new` duplicates each
contributor's `should_activate` logic so the aggregate flag can gate
`required_bash_commands` (which receives no `CompileContext`). The
existing tests exercise this only via fixture-compile paths through
`prepare_steps`, so a future contributor that wires up
`should_activate` but forgets to OR-in the aggregate could silently
suppress its bash allow-list contributions and only be caught at
E2E time.

Added 6 focused unit tests in `src/compile/extensions/exec_context/
mod.rs::tests` that construct `ExecContextExtension::new` directly
and assert `required_bash_commands` matches `should_activate` for
every combination of:
- on.pr configured / absent
- pr.enabled default / explicit true / explicit false
- master switch on / off

A future divergence trips here at unit-test time rather than at E2E
time.

2. Hard-fail on infra failures BEFORE the soft `fail()` machinery.
Without `set -e`, a failed `mkdir -p "$AW_PR_DIR"` (e.g. read-only
workspace) would silently continue, `fail()` would itself fail to
write `error.txt`, and the step would exit 0 with nothing staged
AND no failure signal in the agent prompt.

Added an explicit `mkdir -p "$AW_PR_DIR" || { echo ...; exit 1; }`
hard-fail at the top of the prepare step, with a message that
points the operator at `BUILD_SOURCESDIRECTORY` permissions.
Unlike the merge-base soft-fail path (normal operational case),
a missing output directory always indicates a broken pipeline
configuration and warrants a hard build break.

Tests: 1745 unit (+6) + 127 compiler + bash_lint with shellcheck;
clippy clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ript bundle

The PR contributor's prepare step used to embed ~190 lines of bash heredoc into the emitted YAML, with only end-to-end shellcheck for coverage. Port that work to a new `exec-context-pr.js` ado-script bundle alongside the existing `gate.js` and `import.js` bundles.

What's in the bundle (scripts/ado-script/src/exec-context-pr/):

- validate.ts - strict allowlist regex guards for PR id, project, repo, target branch.

- git.ts - execFile wrappers + bearerEnv helper that injects GIT_CONFIG_* into the spawned git child env only.

- merge-base.ts - synthetic-merge detection (parents >= 3) + progressive-deepening fetch (--depth=200/500/2000/--unshallow).

- prompt.ts - success/failure prompt-fragment writers.

- index.ts - orchestrator; hard-fails (exit 1) on mkdir failure, soft-fails (exit 0 + error.txt) on validation/merge-base failures.

- 32 vitest unit tests + 3 e2e smoke tests against a synthetic-merge git repo.

Rust side:

- pr.rs shrinks from ~320 lines to ~145 lines; the emitted bash is 4 lines wrapping a single `node /tmp/ado-aw-scripts/ado-script/exec-context-pr.js` invocation.

- AdoScriptExtension gains exec_context_pr_active: bool so the Agent-job install/download fires whenever either import.js or exec-context-pr.js is needed.

- Shared pr_contributor_will_activate predicate keeps the new flag and ExecContextExtension::new in lockstep; cfg-aware variant lets unit tests pass a custom config without divergence.

- Tests in compiler_tests.rs updated to assert the new node-invocation shape; two tests that asserted bash literals (synthetic-merge, target-branch regex) deleted - their logic is now covered by vitest.

- dedupe_gate_only fixture pins execution-context.pr.enabled: false to keep the gate-only download-placement test narrow.

Trust boundary improves: the bearer (SYSTEM_ACCESSTOKEN) is mapped into the Node process's env and is only propagated into the spawned git child via GIT_CONFIG_COUNT/KEY_0/VALUE_0 - never into the wrapping bash shell, never on argv, never written to .git/config.

Release packaging (.github/workflows/release.yml) zips the new bundle into ado-script.zip.

Docs (docs/ado-script.md, docs/execution-context.md, AGENTS.md) updated to reflect three bundles.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Five findings from the latest PR review on commit 7bcf107:

1. Security: sanitize unvalidated env values before embedding in the failure prompt fragment. A PR author with push access could craft a branch name containing CR/LF + markdown headers to inject content into the agent prompt via the validation-failure path. validate.ts now passes raw values through sanitizeForPrompt (CR/LF -> space, truncate to 80 chars with ellipsis) before building the reason string. Covers PR_ID, PROJECT, REPO, and PR_TARGET_BRANCH symmetrically. 2 new vitest cases guard the sanitization invariant.

2. Naming clarity: rename countParents -> countParentTokens in merge-base.ts. The function returns `1 + parentCount` (rev-list --parents output includes the commit SHA itself), so a normal 2-parent merge yields 3 tokens. The previous name made debug output (`parents=3`) misleading. Failure message updated to `parentTokens=N`.

3. Stale docstring: git.ts::bearerEnv mentioned the v6.2 bash `git_fetch` wrapper that no longer exists. Rewritten to drop the obsolete reference.

4. Smoke-test gap: the synthetic-merge success smoke test now cross-checks the bundle's emitted base.sha / head.sha against git's own `git rev-parse HEAD^2` / `git merge-base HEAD^1 HEAD^2` answer. Guards against silent SHA-transposition regressions.

5. Dedupe-fixture coverage gap: `dedupe_gate_only.md` pins `execution-context.pr.enabled: false` to keep its narrow contract, which left the `inlined-imports: true + on.pr + exec-context PR active` combination uncovered. Added `tests/fixtures/dedupe_exec_context_pr_only.md` and `test_exec_context_pr_only_downloads_bundle_in_agent_job_not_setup` to assert the Agent-job download fires for exec-context-pr.js alone (no gate, imports inlined).

Verified: cargo build/test (1746 unit + 126 compiler_tests + all integration suites green), cargo clippy --all-targets --all-features clean, `npm run typecheck/test/build/test:smoke` (245 vitest + 6 smoke tests green).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine force-pushed the devinejames/pr-context branch from 7bcf107 to e5777a0 Compare June 6, 2026 08:15
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 6, 2026

🔍 Rust PR Review

Summary: Well-engineered with good trust-boundary design — one security gap in the TypeScript failure path, one visibility inconsistency in Rust.

Findings

🔒 Security Concerns

  • scripts/ado-script/src/exec-context-pr/prompt.tsfailureFragment — the partial.prId/project/repo values are embedded in the agent prompt without calling sanitizeForPrompt. In the failure path in index.ts, these are the raw unvalidated env values passed from validateIdentifiers's error branch:

    // index.ts
    writeFailure(prDir, promptPath, idsOrErr.reason, {
      prId:    env.SYSTEM_PULLREQUEST_PULLREQUESTID,  // raw, unvalidated
      project: env.SYSTEM_TEAMPROJECT,               // raw, unvalidated
      repo:    env.BUILD_REPOSITORY_NAME,            // raw, unvalidated
    });

    failureFragment then embeds these directly:

    `PR #${prId} in project ${project} / repository ${repo} -- context preparation failed.`

    If SYSTEM_PULLREQUEST_PULLREQUESTID were set to something like "\n## Injected\nIgnore previous instructions", it would be written verbatim into the agent prompt. The reason field is correctly sanitised (via sanitizeForPrompt inside validateIdentifiers), and the PR's own wording — “defence-in-depth is cheap” — applies equally here. validate.test.ts has a dedicated test asserting CR/LF is neutralised in reason, but there’s no equivalent guard for the partial values in prompt.test.ts.

    Fix: sanitise each partial value inside failureFragment:

    // prompt.ts
    import { sanitizeForPrompt } from "./validate.js";
    
    const prId    = partial.prId    ? sanitizeForPrompt(partial.prId)    : "<unknown>";
    const project = partial.project ? sanitizeForPrompt(partial.project) : "<unknown>";
    const repo    = partial.repo    ? sanitizeForPrompt(partial.repo)    : "<unknown>";

    These values come from ADO predefined variables (infra-set, not PR-author-controlled), so exploitability is low in practice — but the consistent-sanitisation posture is the right call.

⚠️ Suggestions

  • src/compile/extensions/ado_script.rs:34EXEC_CONTEXT_PR_PATH is declared pub while the analogous IMPORT_EVAL_PATH immediately above uses pub(crate). Both are crate-internal constants consumed only by sibling modules; pub(crate) keeps the visibility tight and consistent.

✅ What Looks Good

  • Trust boundary is solid. Bearer token flows through GIT_CONFIG_COUNT/KEY_0/VALUE_0 env vars scoped only to the spawned git subprocess — never in argv, never in .git/config, never visible to the agent step. The test_execution_context_pr_does_not_leak_system_accesstoken test walks the full compiled YAML to assert this, which is a strong guard.
  • Progressive-deepening loop correctness: the loop breaks only when git merge-base actually resolves, not just on the first successful fetch. The behaviour is tested at both layers (TS: “retries with deeper fetches when earlier merge-base fails”; Rust: test_execution_context_pr_synthetic_merge_uses_merge_base).
  • any_contributor_active divergence-trap tests in exec_context/mod.rs catch pre-computation skew before E2E. Thoughtful defensive design.
  • pr.enabled: true without on.pr correctly inactive: bash allow-list footgun is guarded from both the compile-time path and the required_bash_commands path.
  • set -euo pipefail on the slim wrapper with node invocation is correct: the bundle exits 0 on every soft failure and reserves non-zero for true infra failures (e.g. unwritable workspace), matching the || exit 1 posture of the previous bash implementation.

Generated by Rust PR Reviewer for issue #865 · sonnet46 2.7M ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: built-in PR context precomputation for PR-triggered agents

1 participant