feat(compile): default executor to System.AccessToken and add always-on Azure CLI#873
feat(compile): default executor to System.AccessToken and add always-on Azure CLI#873jamesadevine wants to merge 3 commits into
Conversation
…on Azure CLI Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR ReviewSummary: Looks good overall — clean implementation with solid test coverage. One medium concern about the always-on Findings
|
… longer crashes 1ES Reviewer-requested fix: static AWF bind-mounts for /opt/az and /usr/bin/az would break `docker run` on runners without azure-cli pre-installed (notably some 1ES self-hosted pools), failing the pipeline before the agent ever started. Replace the static mounts with a runtime detection prepare step that sets the ADO pipeline variable AW_AZ_MOUNTS via `##vso[task.setvariable]` when both /usr/bin/az and /opt/az exist on the host, or emits a `task.logissue` warning and leaves the variable unset otherwise. The AWF invocation in the compiled YAML now includes a single `$(AW_AZ_MOUNTS) \` line in the --mount chain. ADO interpolates the variable at step start: present -> the two --mount args appear; absent -> the line collapses to whitespace. No new trait method is added; only the existing `prepare_steps` hook is used. - AzureCliExtension: required_awf_mounts() now returns []; prepare_steps() emits the detection bash step - generate_awf_mounts: appends `$(AW_AZ_MOUNTS) \` when AzureCli is present - Tests: rewrite static-mount assertions to assert the detection step + the pipeline variable injection, plus a regression guard that no static az mount is emitted - Docs: docs/network.md and docs/tools.md updated with the runtime-detection design and operator implications Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR ReviewSummary: Needs one fix — a silent breakage on runners without Findings🐛 Bugs / Logic Issues
|
Address two findings from the Rust PR Reviewer bot on PR #873. 1. CRITICAL — AW_AZ_MOUNTS undefined when az is missing: The runtime-detection step in AzureCliExtension only set the AW_AZ_MOUNTS pipeline variable in the detected branch. In the missing-az branch the variable was left undefined. ADO leaves an undefined $(VAR) as the LITERAL STRING "$(VAR)" in subsequent bash steps (it does NOT expand to empty). Bash sees $(AW_AZ_MOUNTS), interprets it as a $(...) command substitution, tries to execute a program named AW_AZ_MOUNTS, gets exit 127, and the AWF invocation step dies under `set -e` — the exact 1ES failure mode the refactor set out to prevent. Fix: always emit `##vso[task.setvariable variable=AW_AZ_MOUNTS]`, with an empty value in the missing branch. ADO then expands $(AW_AZ_MOUNTS) to nothing and the trailing `\` line becomes a harmless continuation no-op. Regression guards (both lock this in): - azure_cli.rs::test_azure_cli_prepare_steps_defines_aw_az_mounts_in_else_branch counts `setvariable` occurrences (must be 2) and asserts the else block contains an empty-value setvariable line. - compiler_tests.rs::test_default_pipeline_mounts_az_and_allows_azure_hosts asserts the same 2× count on the compiled lock.yml. 2. Cleanup — delete WRITE_REQUIRING_SAFE_OUTPUTS: The const was retained with #[allow(dead_code)] after the removal of `validate_write_permissions`, but its only consumers left were the two tests that exercised the const itself. Each `*Result` type already carries `REQUIRES_WRITE: bool` for any caller (compiler, audit, runtime) that needs the same info. Deleting the const removes a dead-code annotation and one otherwise-purposeless list to maintain when adding new tools. Test cleanup: removed `test_write_requiring_subset_of_all_known` (purely exercised the deleted const) and rewrote `test_all_known_completeness` to use a HashSet-based duplicate check on ALL_KNOWN_SAFE_OUTPUTS plus the ALWAYS_ON/NON_MCP disjointness check (preserves the meaningful invariants). Validation: - cargo build: clean - cargo test: 1748 unit + 119 compiler + all integration pass - cargo clippy --all-targets --all-features -- -D warnings: clean - tests/safe-outputs/azure-cli.lock.yml regenerated (+1 line) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Rust PR ReviewSummary: Looks good — both parts are well-reasoned and well-tested, with a couple of design-level suggestions. Findings
|
Summary
Two coupled Stage 3 / sandbox-infra changes:
Part A — Stage 3 executor defaults to
$(System.AccessToken)Previously the compiler hard-failed when a write-bearing safe output
(e.g.
create-pull-request,create-work-item,add-pr-comment) wasconfigured without
permissions.write. That stance was questionable —the pipeline's built-in OAuth token (
$(System.AccessToken), runningas Project Collection Build Service) is sufficient for the vast
majority of write operations, and requiring an ARM service connection
added significant onboarding friction.
generate_executor_ado_envnow always emits a non-emptyenv:blockfor the Stage 3 executor step:
SYSTEM_ACCESSTOKEN: $(System.AccessToken)by defaultSYSTEM_ACCESSTOKEN: $(SC_WRITE_TOKEN)whenpermissions.writeisset (override for cross-org writes / named-identity attribution)
validate_write_permissionsdeleted along with its callsite — the"requires write SC" gate no longer exists.
WRITE_REQUIRING_SAFE_OUTPUTSremoved entirely (its only consumerswere
validate_write_permissionsand the tests that exercised it;the per-tool
REQUIRES_WRITEconst on each*Resulttype alreadyprovides the same information for any downstream tooling that needs
it). The remaining
test_all_known_completenesswas rewritten touse a HashSet-based duplicate check.
Trust boundary preserved: the agent (Stage 1) still never maps
SYSTEM_ACCESSTOKEN. Only the Setup-job filter gate and the Stage 3executor map it, both of which run outside the agent's container.
Part B — Always-on Azure CLI (
az) inside the AWF sandboxMirrors gh-aw's "assume
ghis on the runner" model.azispre-installed on Microsoft-hosted Ubuntu images and on most 1ES Ubuntu
images, so we expose it to the agent without an install step.
Architecture
AzureCliExtension(Tool phase, always-on):login.microsoftonline.com,login.windows.net,management.azure.com,graph.microsoft.com,aka.ms.azto every agent's bash allow-list.in the Agent job (NOT a separate Setup job).
Graceful runtime detection (addresses 1ES-breakage concern)
Static bind-mounts for
/opt/az+/usr/bin/azwould crashdocker runwith "bind source path does not exist" on runnerswithout
azure-clipre-installed (notably some 1ES self-hostedpools). The detection prepare step instead:
/usr/bin/az(launcher) and/opt/az(venv) on thehost.
##vso[task.setvariable variable=AW_AZ_MOUNTS]--mount /opt/az:/opt/az:ro --mount /usr/bin/az:/usr/bin/az:ro.##vso[task.setvariable variable=AW_AZ_MOUNTS]followed by
##vso[task.logissue type=warning]explainingazwon't be available inside the sandbox.
Setting the variable to empty string in the missing-
azbranch isload-bearing: ADO leaves an undefined
$(VAR)as the literal string$(VAR)in later bash steps, where bash would interpret it as a$(...)command substitution, try to execute a program namedAW_AZ_MOUNTS, get exit 127, and the AWF invocation step would dieunder
set -e— the opposite of graceful degradation. Defining thevariable as empty makes ADO expand it to nothing, leaving a harmless
\-continuation. Locked in by a 2× count regression test in bothazure_cli.rsunit tests andcompiler_tests.rs.generate_awf_mountsappends one$(AW_AZ_MOUNTS) \line to theAWF
--mountchain when the AzureCli extension is present.Operator outcomes
ubuntu-latest: detected → mounted →azworks inside the sandbox.
azure-clibaked into the image: same.azure-cli: pipeline runssuccessfully; warning surfaces in the ADO log; agents that invoke
azgetcommand not foundinside the sandbox (the standardfailure mode for a missing CLI). No more 1ES breakage.
Auth caveat (unchanged):
AZURE_DEVOPS_EXT_PAT(set whenpermissions.readis configured) authenticates onlyaz devopssubcommands. General
az/ ARM / Graph commands need separate auth.Live smoke pipeline
tests/safe-outputs/azure-cli.md— daily-scheduled Copilot pipelinethat runs
az --version+az devops project listand emits onenoop. Pattern matches existingtests/safe-outputs/*.mdsmokepipelines.
Docs
docs/network.md: Permissions section rewritten for new default;new "Always-on Azure CLI" section with the runtime-detection design
and operator implications;
aka.msadded to default-hosts table.docs/tools.md: new "Built-in CLIs" section covering bothaz(with runtime detection + host-posture table) and
gh.docs/template-markers.md:{{ acquire_write_token }}and{{ executor_ado_env }}entries refreshed.docs/front-matter.md:permissions.writereworded as optional.docs/safe-outputs.md: new "Executor authentication" subsection;removed "requires write service connection" caveat from
upload-build-attachment.docs/ado-aw-debug.md: exampleenv:block updated.prompts/create-ado-agentic-workflow.md,prompts/update-ado-agentic-workflow.md: softened"requires
permissions.write" wording.site/.../troubleshooting/common-issues.mdx: marked the old errormessage as historical.
site/.../setup/service-connections.mdx: replaced obsoletecompilation-error row with a cross-project 403 troubleshooting row.
Test plan
cargo build— clean.cargo test— 1748 unit + 119 compiler + all integration tests pass.cargo clippy --all-targets --all-features— clean.test_executor_step_uses_system_access_token_by_default—verifies the new default env block.
test_compile_succeeds_without_write_sc_for_write_requiring_safe_outputs— regression guard that the old hard-fail path is gone.
test_agent_job_steps_do_not_map_system_access_token— YAML-awareagent-job-scoped leak guard.
test_default_pipeline_mounts_az_and_allows_azure_hosts— assertsthe detection prepare step, the
$(AW_AZ_MOUNTS) \injection,Azure hosts in
--allow-domains, that NO static az mounts areemitted, that no install step is emitted, and that the
setvariableforAW_AZ_MOUNTSappears exactly twice (one perbranch of the if/else).
azure_cli.rsunit tests covering the extension shape, detectionconditional, pipeline-variable setvariable, empty-value
setvariable in the missing-
azbranch, warning on missing host,and
set -eo pipefaildiscipline.test_engine_args_bash_disabled— now permits always-onshell(az)while still rejecting generalshell(:*)/shell(*)/shell(bash).test_generate_awf_mounts_no_extensions— now expects the$(AW_AZ_MOUNTS) \injection line in the always-on baseline.collect_extensionscount assertions bumped +1.test_all_known_completenessrewritten around HashSet duplicatecheck (now that
WRITE_REQUIRING_SAFE_OUTPUTSis gone).test_permissions_validation_fails_without_write_sc,test_comment_on_work_item_requires_write_sc,test_write_requiring_subset_of_all_known.Lock-file regeneration
Only
tests/safe-outputs/azure-cli.lock.ymlis added by hand (it's anew fixture). The other 26
tests/safe-outputs/*.lock.ymlfiles willbe regenerated by the existing release-task workflow after this lands.