feat(compile): default executor to System.AccessToken and add always-on Azure CLI by jamesadevine · Pull Request #873 · githubnext/ado-aw

jamesadevine · 2026-06-05T21:00:30Z

Summary

Two coupled Stage 3 / sandbox-infra changes:

Part A — Stage 3 executor defaults to `$(System.AccessToken)`

Previously the compiler hard-failed when a write-bearing safe output
(e.g. create-pull-request, create-work-item, add-pr-comment) was
configured without permissions.write. That stance was questionable —
the pipeline's built-in OAuth token ($(System.AccessToken), running
as Project Collection Build Service) is sufficient for the vast
majority of write operations, and requiring an ARM service connection
added significant onboarding friction.

generate_executor_ado_env now always emits a non-empty env: block
for the Stage 3 executor step:
- SYSTEM_ACCESSTOKEN: $(System.AccessToken) by default
- SYSTEM_ACCESSTOKEN: $(SC_WRITE_TOKEN) when permissions.write is
  set (override for cross-org writes / named-identity attribution)
validate_write_permissions deleted along with its callsite — the
"requires write SC" gate no longer exists.
WRITE_REQUIRING_SAFE_OUTPUTS removed entirely (its only consumers
were validate_write_permissions and the tests that exercised it;
the per-tool REQUIRES_WRITE const on each *Result type already
provides the same information for any downstream tooling that needs
it). The remaining test_all_known_completeness was rewritten to
use a HashSet-based duplicate check.

Trust boundary preserved: the agent (Stage 1) still never maps
SYSTEM_ACCESSTOKEN. Only the Setup-job filter gate and the Stage 3
executor map it, both of which run outside the agent's container.

Part B — Always-on Azure CLI (`az`) inside the AWF sandbox

Mirrors gh-aw's "assume gh is on the runner" model. az is
pre-installed on Microsoft-hosted Ubuntu images and on most 1ES Ubuntu
images, so we expose it to the agent without an install step.

Architecture

New AzureCliExtension (Tool phase, always-on):
- Allow-lists 5 Azure auth/management hosts:
  login.microsoftonline.com, login.windows.net,
  management.azure.com, graph.microsoft.com, aka.ms.
- Adds az to every agent's bash allow-list.
- Emits a "Detect Azure CLI on host (for AWF mount)" prepare step
  in the Agent job (NOT a separate Setup job).
- No install step. No static AWF mounts. No new trait method.

Graceful runtime detection (addresses 1ES-breakage concern)

Static bind-mounts for /opt/az + /usr/bin/az would crash
docker run with "bind source path does not exist" on runners
without azure-cli pre-installed (notably some 1ES self-hosted
pools). The detection prepare step instead:

Tests for both /usr/bin/az (launcher) and /opt/az (venv) on the
host.
Present → emits
##vso[task.setvariable variable=AW_AZ_MOUNTS]--mount /opt/az:/opt/az:ro --mount /usr/bin/az:/usr/bin/az:ro.
Missing → emits an empty-value ##vso[task.setvariable variable=AW_AZ_MOUNTS]
followed by ##vso[task.logissue type=warning] explaining az
won't be available inside the sandbox.

Setting the variable to empty string in the missing-az branch is
load-bearing: ADO leaves an undefined $(VAR) as the literal string
$(VAR) in later bash steps, where bash would interpret it as a
$(...) command substitution, try to execute a program named
AW_AZ_MOUNTS, get exit 127, and the AWF invocation step would die
under set -e — the opposite of graceful degradation. Defining the
variable as empty makes ADO expand it to nothing, leaving a harmless
\-continuation. Locked in by a 2× count regression test in both
azure_cli.rs unit tests and compiler_tests.rs.

generate_awf_mounts appends one $(AW_AZ_MOUNTS) \ line to the
AWF --mount chain when the AzureCli extension is present.

Operator outcomes

Microsoft-hosted ubuntu-latest: detected → mounted → az
works inside the sandbox.
1ES self-hosted with azure-cli baked into the image: same.
1ES self-hosted without azure-cli: pipeline runs
successfully; warning surfaces in the ADO log; agents that invoke
az get command not found inside the sandbox (the standard
failure mode for a missing CLI). No more 1ES breakage.

Auth caveat (unchanged): AZURE_DEVOPS_EXT_PAT (set when
permissions.read is configured) authenticates only az devops
subcommands. General az / ARM / Graph commands need separate auth.

Live smoke pipeline

tests/safe-outputs/azure-cli.md — daily-scheduled Copilot pipeline
that runs az --version + az devops project list and emits one
noop. Pattern matches existing tests/safe-outputs/*.md smoke
pipelines.

Docs

docs/network.md: Permissions section rewritten for new default;
new "Always-on Azure CLI" section with the runtime-detection design
and operator implications; aka.ms added to default-hosts table.
docs/tools.md: new "Built-in CLIs" section covering both az
(with runtime detection + host-posture table) and gh.
docs/template-markers.md: {{ acquire_write_token }} and
{{ executor_ado_env }} entries refreshed.
docs/front-matter.md: permissions.write reworded as optional.
docs/safe-outputs.md: new "Executor authentication" subsection;
removed "requires write service connection" caveat from
upload-build-attachment.
docs/ado-aw-debug.md: example env: block updated.
prompts/create-ado-agentic-workflow.md,
prompts/update-ado-agentic-workflow.md: softened
"requires permissions.write" wording.
site/.../troubleshooting/common-issues.mdx: marked the old error
message as historical.
site/.../setup/service-connections.mdx: replaced obsolete
compilation-error row with a cross-project 403 troubleshooting row.

Test plan

cargo build — clean.
cargo test — 1748 unit + 119 compiler + all integration tests pass.
cargo clippy --all-targets --all-features — clean.
New tests added (Part A):
- test_executor_step_uses_system_access_token_by_default —
  verifies the new default env block.
- test_compile_succeeds_without_write_sc_for_write_requiring_safe_outputs
  — regression guard that the old hard-fail path is gone.
- test_agent_job_steps_do_not_map_system_access_token — YAML-aware
  agent-job-scoped leak guard.
New tests added (Part B):
- test_default_pipeline_mounts_az_and_allows_azure_hosts — asserts
  the detection prepare step, the $(AW_AZ_MOUNTS) \ injection,
  Azure hosts in --allow-domains, that NO static az mounts are
  emitted, that no install step is emitted, and that the
  setvariable for AW_AZ_MOUNTS appears exactly twice (one per
  branch of the if/else).
- azure_cli.rs unit tests covering the extension shape, detection
  conditional, pipeline-variable setvariable, empty-value
  setvariable in the missing-az branch, warning on missing host,
  and set -eo pipefail discipline.
Updated tests:
- test_engine_args_bash_disabled — now permits always-on
  shell(az) while still rejecting general shell(:*)/shell(*)/shell(bash).
- test_generate_awf_mounts_no_extensions — now expects the
  $(AW_AZ_MOUNTS) \ injection line in the always-on baseline.
- All collect_extensions count assertions bumped +1.
- test_all_known_completeness rewritten around HashSet duplicate
  check (now that WRITE_REQUIRING_SAFE_OUTPUTS is gone).
Deleted tests:
test_permissions_validation_fails_without_write_sc,
test_comment_on_work_item_requires_write_sc,
test_write_requiring_subset_of_all_known.

Lock-file regeneration

Only tests/safe-outputs/azure-cli.lock.yml is added by hand (it's a
new fixture). The other 26 tests/safe-outputs/*.lock.yml files will
be regenerated by the existing release-task workflow after this lands.

…on Azure CLI Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-05T21:07:57Z

🔍 Rust PR Review

Summary: Looks good overall — clean implementation with solid test coverage. One medium concern about the always-on AzureCliExtension breaking non-Microsoft-hosted runners, and a few minor nits.

Findings

⚠️ Suggestions

src/compile/extensions/azure_cli.rs — no opt-out for always-on mount
The AzureCliExtension unconditionally mounts /opt/az and /usr/bin/az into every AWF container. If the host doesn't have azure-cli installed, AWF will fail to start the Agent container (bind-mount error), making the entire pipeline hard-fail rather than gracefully skip az availability. This is a breaking change for 1ES self-hosted pools that don't bake in azure-cli, and there's no front-matter opt-out (e.g. tools: az: false). The docs call this out but don't provide an escape hatch. Given gh-aw's gh also has no opt-out, this may be an accepted design choice — but worth a deliberate call.
src/safeoutputs/mod.rs:30 — WRITE_REQUIRING_SAFE_OUTPUTS is now unreachable in-crate
#[allow(dead_code)] was added because validate_write_permissions (its only in-crate consumer) was deleted. If nothing in the crate (including audit/) reads this constant, the annotation is just noise and the constant could be removed. If the audit analyzers do read it, remove the attribute. A quick grep WRITE_REQUIRING_SAFE_OUTPUTS src/ would clarify.
tests/safe-outputs/azure-cli.md:11 — gpt-5-mini is not a real model name
The smoke test specifies model: gpt-5-mini. This will error at runtime when the pipeline actually queues a run — the Copilot or OpenAI API will reject an unknown model. Likely should be gpt-4o-mini or o4-mini.
PR description vs. code: login.microsoft.com is listed in the summary but absent from the allow-list
The PR description enumerates 5 hosts including login.microsoft.com, but azure_cli.rs correctly has aka.ms as the 5th instead. login.microsoft.com is the consumer MSA sign-in surface and not used by enterprise Azure CLI auth — so the code is right — but the description will cause confusion for anyone cross-checking. Minor; description-only fix.

✅ What Looks Good

test_agent_job_steps_do_not_map_system_access_token — excellent defense-in-depth. Parses the compiled YAML, scopes the check to the Agent job only, and precisely avoids false-positives from the Setup-job gate and Stage 3 executor. Exactly the right way to guard the cross-stage trust boundary.
AWF mounts are AwfMountMode::ReadOnly — the agent has no business writing to az's install; this is enforced at mount level and tested.
validate_write_permissions removal is clean: the regression guard (test_compile_succeeds_without_write_sc_for_write_requiring_safe_outputs) checks the positive path (compile succeeds + $(System.AccessToken) is present) and the absence of $(SC_WRITE_TOKEN) in the same pass.
test_engine_args_bash_disabled update is correct: the old !params.contains("shell(") would have fired on the always-on shell(az) injection. The new form narrows to the actually-forbidden forms (shell(:*), shell(*), shell(bash)) while asserting the expected shell(az) is present — a faithful expression of the policy.
No install step is the right call. Matches gh-aw's model and avoids minutes of container-time overhead per run.

Generated by Rust PR Reviewer for issue #873 · sonnet46 1.5M · ◷

… longer crashes 1ES Reviewer-requested fix: static AWF bind-mounts for /opt/az and /usr/bin/az would break `docker run` on runners without azure-cli pre-installed (notably some 1ES self-hosted pools), failing the pipeline before the agent ever started. Replace the static mounts with a runtime detection prepare step that sets the ADO pipeline variable AW_AZ_MOUNTS via `##vso[task.setvariable]` when both /usr/bin/az and /opt/az exist on the host, or emits a `task.logissue` warning and leaves the variable unset otherwise. The AWF invocation in the compiled YAML now includes a single `$(AW_AZ_MOUNTS) \` line in the --mount chain. ADO interpolates the variable at step start: present -> the two --mount args appear; absent -> the line collapses to whitespace. No new trait method is added; only the existing `prepare_steps` hook is used. - AzureCliExtension: required_awf_mounts() now returns []; prepare_steps() emits the detection bash step - generate_awf_mounts: appends `$(AW_AZ_MOUNTS) \` when AzureCli is present - Tests: rewrite static-mount assertions to assert the detection step + the pipeline variable injection, plus a regression guard that no static az mount is emitted - Docs: docs/network.md and docs/tools.md updated with the runtime-detection design and operator implications Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-05T21:38:00Z

🔍 Rust PR Review

Summary: Needs one fix — a silent breakage on runners without az installed that defeats the PR's primary graceful-degradation goal.

Findings

🐛 Bugs / Logic Issues

src/compile/extensions/azure_cli.rs:119-130 — AW_AZ_MOUNTS never set in the else branch

The detection step only calls ##vso[task.setvariable variable=AW_AZ_MOUNTS] in the if branch (az present). When az is not detected, the variable is simply never defined in ADO's pipeline variable store.

In ADO, an undefined $(VARNAME) in a subsequent bash step is not expanded to empty string — it is left as the literal string $(VARNAME). The AWF invocation step already has set -eo pipefail active ("pipefail (set above)" comment confirms this). Bash sees $(AW_AZ_MOUNTS) as a command substitution, tries to execute a program named AW_AZ_MOUNTS, gets exit 127, and set -e kills the step.

The PR description says "absent → the line collapses to whitespace + a \ continuation, which is a no-op" — but this only holds if the variable is explicitly set to empty string. An undefined variable does not collapse; it leaves the literal $(AW_AZ_MOUNTS) for bash to evaluate.

This means every compiled pipeline running on a 1ES self-hosted pool without azure-cli pre-installed would fail at the AWF invocation step — the opposite of the "graceful degradation" the PR promises.

Fix — add one line to the else branch in prepare_steps:
```
let step = r###"- bash: |
    set -eo pipefail
    if [ -f /usr/bin/az ] && [ -d /opt/az ]; then
      echo "##vso[task.setvariable variable=AW_AZ_MOUNTS]--mount /opt/az:/opt/az:ro --mount /usr/bin/az:/usr/bin/az:ro"
      echo "Azure CLI detected on host; mounting /opt/az and /usr/bin/az into AWF sandbox."
    else
      echo "##vso[task.setvariable variable=AW_AZ_MOUNTS]"
      echo "##vso[task.logissue type=warning]Azure CLI not detected ..."
    fi
  displayName: "Detect Azure CLI on host (for AWF mount)"
"###;
```
With AW_AZ_MOUNTS always defined (even as empty string), ADO expands $(AW_AZ_MOUNTS) to nothing and the \ continuation line becomes a harmless no-op exactly as described.

The existing test test_azure_cli_prepare_steps_warns_when_az_missing should also assert setvariable is emitted in the else branch to lock this in.

⚠️ Suggestions

src/compile/extensions/azure_cli.rs:69 — login.microsoft.com absent from required_hosts()

The PR description lists it ("Allow-lists 5 Azure auth hosts: ... login.microsoft.com ...") but the code has only these 5: login.microsoftonline.com, login.windows.net, management.azure.com, graph.microsoft.com, aka.ms. If you intentionally excluded login.microsoft.com (personal MSA accounts, rarely needed for enterprise az usage), the description should drop it.

✅ What Looks Good

Trust boundary test is excellent: test_agent_job_steps_do_not_map_system_access_token parses the compiled YAML, scopes to the Agent job only, and asserts SYSTEM_ACCESSTOKEN is absent from every step's env: — much stronger than a text search that would false-positive on the legitimate executor and filter-gate mappings.
Removing validate_write_permissions is correct: the built-in token is standard ADO practice; forcing an ARM SC was unnecessary friction. The regression test that flips the old hard-fail to a success assertion is clean.
Graceful detection approach: avoiding static bind-mounts in required_awf_mounts() is the right call — the static-mount crash on 1ES pools is a real failure mode.
Test coverage is thorough across both parts; the smoke pipeline azure-cli.md provides end-to-end validation on real infra.

Generated by Rust PR Reviewer for issue #873 · sonnet46 2.4M · ◷

Address two findings from the Rust PR Reviewer bot on PR #873. 1. CRITICAL — AW_AZ_MOUNTS undefined when az is missing: The runtime-detection step in AzureCliExtension only set the AW_AZ_MOUNTS pipeline variable in the detected branch. In the missing-az branch the variable was left undefined. ADO leaves an undefined $(VAR) as the LITERAL STRING "$(VAR)" in subsequent bash steps (it does NOT expand to empty). Bash sees $(AW_AZ_MOUNTS), interprets it as a $(...) command substitution, tries to execute a program named AW_AZ_MOUNTS, gets exit 127, and the AWF invocation step dies under `set -e` — the exact 1ES failure mode the refactor set out to prevent. Fix: always emit `##vso[task.setvariable variable=AW_AZ_MOUNTS]`, with an empty value in the missing branch. ADO then expands $(AW_AZ_MOUNTS) to nothing and the trailing `\` line becomes a harmless continuation no-op. Regression guards (both lock this in): - azure_cli.rs::test_azure_cli_prepare_steps_defines_aw_az_mounts_in_else_branch counts `setvariable` occurrences (must be 2) and asserts the else block contains an empty-value setvariable line. - compiler_tests.rs::test_default_pipeline_mounts_az_and_allows_azure_hosts asserts the same 2× count on the compiled lock.yml. 2. Cleanup — delete WRITE_REQUIRING_SAFE_OUTPUTS: The const was retained with #[allow(dead_code)] after the removal of `validate_write_permissions`, but its only consumers left were the two tests that exercised the const itself. Each `*Result` type already carries `REQUIRES_WRITE: bool` for any caller (compiler, audit, runtime) that needs the same info. Deleting the const removes a dead-code annotation and one otherwise-purposeless list to maintain when adding new tools. Test cleanup: removed `test_write_requiring_subset_of_all_known` (purely exercised the deleted const) and rewrote `test_all_known_completeness` to use a HashSet-based duplicate check on ALL_KNOWN_SAFE_OUTPUTS plus the ALWAYS_ON/NON_MCP disjointness check (preserves the meaningful invariants). Validation: - cargo build: clean - cargo test: 1748 unit + 119 compiler + all integration pass - cargo clippy --all-targets --all-features -- -D warnings: clean - tests/safe-outputs/azure-cli.lock.yml regenerated (+1 line) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-06T08:23:58Z

🔍 Rust PR Review

Summary: Looks good — both parts are well-reasoned and well-tested, with a couple of design-level suggestions.

Findings

⚠️ Suggestions

generate_awf_mounts tight coupling to AzureCliExtension variant
[src/compile/common.rs, generate_awf_mounts]

The inject_az_var check names a concrete enum variant inside an otherwise fully trait-generic function:

let inject_az_var = extensions
    .iter()
    .any(|ext| matches!(ext, super::extensions::Extension::AzureCli(_)));

This is the only place in common.rs that names a specific extension variant. If AzureCli is renamed, or a second extension ever needs the same pipeline-variable-mount pattern, it'll need updating in two places.

A fn awf_pipeline_var_mount(&self) -> Option<&str> method on CompilerExtension (returning Some("$(AW_AZ_MOUNTS) \\") for AzureCliExtension, None for all others) would keep generate_awf_mounts fully generic. Structural preference — the current code is correct.

test_all_known_completeness loses exhaustiveness guarantee
[src/safeoutputs/mod.rs]

The old test enforced ALL_KNOWN = WRITE_REQUIRING + ALWAYS_ON + NON_MCP via a count assertion. The new test only checks for duplicates and ALWAYS_ON/NON_MCP disjointness. With WRITE_REQUIRING_SAFE_OUTPUTS gone, there's no test that catches a new write tool (added via tool_result! { write = true, ... }) that a developer forgets to include in ALL_KNOWN_SAFE_OUTPUTS. The duplicate check catches over-inclusion; nothing catches under-inclusion.

A missing tool produces silent routing failures at runtime. A minimal guard would help:

// Bump this constant when adding new tools.
const MIN_KNOWN_COUNT: usize = 26; // count as of this PR
assert!(
    ALL_KNOWN_SAFE_OUTPUTS.len() >= MIN_KNOWN_COUNT,
    "ALL_KNOWN_SAFE_OUTPUTS shrank — did you forget to register a new tool? (got {})",
    ALL_KNOWN_SAFE_OUTPUTS.len()
);

🔒 Security Concerns

Unconditional Azure endpoint allowlist expansion
[src/compile/extensions/azure_cli.rs, required_hosts]

login.microsoftonline.com, login.windows.net, management.azure.com, graph.microsoft.com, and aka.ms are injected into every compiled pipeline's --allow-domains list — no opt-out. The PR rationale ("inert when az isn't present") is sound for availability, but from a security posture standpoint this expands the prompt-injection blast radius: a compromised agent in any pipeline can now attempt to reach Azure AD / ARM / Graph regardless of whether the operator intended to use az.

Not a blocker — the L7 proxy is the enforcement point and the docs/network.md update acknowledges this — but worth making explicit that this is an unconditional policy change for all pipelines, not just those that use az.

✅ What Looks Good

Both-branch setvariable regression test: locking in the 2× occurrence count for ##vso[task.setvariable variable=AW_AZ_MOUNTS] (in both azure_cli.rs unit tests and compiler_tests.rs) is exactly right. That's the critical invariant for graceful degradation and it's well-guarded.
YAML-aware, agent-job-scoped trust boundary test (test_agent_job_steps_do_not_map_system_access_token): parsing the compiled YAML and walking only the Agent job's env blocks avoids false-positives from the Setup-gate and Stage 3 legitimate mappings. Clean approach.
Empty static mount list with explanatory doc comment: the required_awf_mounts() no-op body and its detailed comment on why it must stay empty is the right call — it will prevent well-intentioned future regressions.
test_compile_succeeds_without_write_sc_for_write_requiring_safe_outputs: validates the actual compiled YAML content (not just exit code), so the default $(System.AccessToken) mapping is confirmed end-to-end.

Generated by Rust PR Reviewer for issue #873 · sonnet46 3.1M · ◷

feat(compile): default executor to System.AccessToken and add always-…

1b641ee

…on Azure CLI Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(compile): default executor to System.AccessToken and add always-on Azure CLI#873

feat(compile): default executor to System.AccessToken and add always-on Azure CLI#873
jamesadevine wants to merge 3 commits into
mainfrom
devinejames/wholesale-access-token

jamesadevine commented Jun 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jamesadevine commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Part A — Stage 3 executor defaults to $(System.AccessToken)

Part B — Always-on Azure CLI (az) inside the AWF sandbox

Architecture

Graceful runtime detection (addresses 1ES-breakage concern)

Operator outcomes

Live smoke pipeline

Docs

Test plan

Lock-file regeneration

Uh oh!

github-actions Bot commented Jun 5, 2026

🔍 Rust PR Review

Findings

⚠️ Suggestions

✅ What Looks Good

Uh oh!

github-actions Bot commented Jun 5, 2026

🔍 Rust PR Review

Findings

🐛 Bugs / Logic Issues

⚠️ Suggestions

✅ What Looks Good

Uh oh!

github-actions Bot commented Jun 6, 2026

🔍 Rust PR Review

Findings

⚠️ Suggestions

🔒 Security Concerns

✅ What Looks Good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jamesadevine commented Jun 5, 2026 •

edited

Loading

Part A — Stage 3 executor defaults to `$(System.AccessToken)`

Part B — Always-on Azure CLI (`az`) inside the AWF sandbox