
Seed-aware reproducibility caveat: pin RNG via config + 3-state seed classification (#251 follow-up) #339

Description

@stanlrt

Issue #251 follow-up #3: seed-aware reproducibility caveat

Problem

The core of #251 ships an unconditional stochastic caveat: whenever a run
contains >= 1 stochastic method, the report banner, REPRODUCIBILITY.md, and the
CLI warning all fire, worded "results are not bit-reproducible unless seeds are
pinned." RAITAP captures no seed and pins no RNG, so the caveat cannot soften.

Two gaps follow:

  1. No way to pin randomness. A user cannot make a stochastic run reproducible
    through RAITAP; they must reach into library internals.
  2. All-or-nothing wording. Even when most methods would be reproducible under
    a pinned seed, the caveat names the whole set and says "not reproducible." The
    user cannot tell which methods are actually still non-reproducible.

This is the deferred point #3 of #251 ("if a global/per-method seed was set,
soften the wording").

Scope

Pin a run's RNG via config, and partition the stochastic methods so each surface
names only the methods that remain non-reproducible after seeding.

In scope:

  • A 3-state per-algorithm seed classification, replacing the stochastic: bool.
  • A full audit: every entry in every registry gets an explicit classification.
  • A top-level seed config that pins the process-global RNGs at run start and is
    recorded in run metadata.
  • Partition + reworded caveat on all three existing surfaces.

Out of scope (explicit non-goals, possible follow-ups):

  • Forwarding the seed into a method's own seed parameter (e.g. passing
    seed=42 into torchattacks AutoAttack). RAITAP's global seed cannot reach a
    self-seeding method; making those reproducible needs per-algorithm param
    plumbing, a tier-C effort that overlaps the registry work of #266 ("Expand
    algorithm registries to full library coverage (torchattacks / foolbox /
    imagecorruptions)"). Such methods stay warned.
  • A pure "silence the warning" opt-out flag. The seed-aware path delivers the UX
    honestly; a lie-switch is not added.
  • Per-method seed overrides distinct from the global seed.

Design

1. Classification: replace the boolean with a 3-state marker

The per-algorithm specs carry stochastic: bool today
(ExplainerAlgorithmSpec, transparency/contracts.py; AssessorAlgorithmSpec,
robustness/semantics.py). Replace it with:

Seeding = Literal["deterministic", "global_rng", "self_seeded"]

value          meaning                                                                example
deterministic  no RNG; bit-reproducible always                                        Integrated Gradients, Saliency
global_rng     draws from the process-global torch/numpy/random RNG                   PGD (random_start), SHAP Gradient
self_seeded    owns its RNG / a seed param that time-defaults; global seed misses it  torchattacks AutoAttack (seed=None)

The Seeding Literal is declared once in its owning module and imported
elsewhere (no redeclaration at the schema/semantics layers).

Back-compat shim. stochastic becomes a derived read-only property:
stochastic == (seeding != "deterministic"). The ~28-file consumer surface that
reads .stochastic (semantics flow, reproducibility.py, tests) keeps working
unchanged; only reproducibility.py's partition reads the new seeding value.
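
A minimal sketch of the shim, assuming the specs are frozen dataclasses. `AlgorithmSpec` is a hypothetical stand-in for ExplainerAlgorithmSpec / AssessorAlgorithmSpec, whose real fields are not shown in this issue:

```python
from dataclasses import dataclass
from typing import Literal

# Declared once in its owning module; other layers would import it.
Seeding = Literal["deterministic", "global_rng", "self_seeded"]


@dataclass(frozen=True)
class AlgorithmSpec:
    """Hypothetical stand-in for the real per-algorithm spec classes."""

    name: str
    seeding: Seeding

    @property
    def stochastic(self) -> bool:
        # Derived, read-only: existing consumers of .stochastic keep working.
        return self.seeding != "deterministic"


ig = AlgorithmSpec("Integrated Gradients", "deterministic")
pgd = AlgorithmSpec("PGD", "global_rng")
auto_attack = AlgorithmSpec("AutoAttack", "self_seeded")
```

Because `stochastic` has no setter, any code that still tries to assign it fails loudly instead of silently diverging from `seeding`.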

2. Full audit

Every algorithm in every registry receives an explicit seeding=, verified
against the installed library's default RNG behavior (same verification bar #251
used for the boolean):

  • transparency: SHAP, captum (incl. NoiseTunnel), tree/SHAP TreeExplainer
  • robustness: torchattacks, foolbox, imagecorruptions, auto_LiRPA, Marabou

Default for an unclassified/unknown entry is the conservative self_seeded
(always warns) so a missed audit never over-claims reproducibility. The audit's
job is to promote entries to their true state.
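
One way the conservative default and the audit check could look. Both helper names and the plain name-to-value mapping are assumptions for illustration, not the actual registry shape:

```python
from typing import List, Literal, Mapping, Optional, get_args

Seeding = Literal["deterministic", "global_rng", "self_seeded"]
_VALID = get_args(Seeding)  # ("deterministic", "global_rng", "self_seeded")


def effective_seeding(declared: Optional[str]) -> str:
    """Return the declared state, or the conservative one if missing/unknown."""
    if declared in _VALID:
        return declared
    return "self_seeded"  # always warns, so a missed audit never over-claims


def unaudited_entries(registry: Mapping[str, Optional[str]]) -> List[str]:
    """Names of entries that still lack a valid explicit seeding value."""
    return sorted(name for name, value in registry.items() if value not in _VALID)


# Hypothetical registry: algorithm name -> declared seeding (None = not audited).
registry = {"PGD": "global_rng", "Saliency": "deterministic", "NewAttack": None}
```

A per-adapter test could then assert that `unaudited_entries(...)` is empty for every registry.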

3. Flow onto results

seeding rides onto each result's semantics exactly as stochastic does today
(assessor_semantics / explainer_is_stochastic paths). The result object
carries the 3-state value so reproducibility.py reads it post-run, not from
live adapter hints.

4. Config: seed

New optional top-level field seed: int | None = None in the run config schema
(configs/schema.py). When set, the orchestrator pins the global RNGs once at
run start, before run_without_tracking:

  • torch.manual_seed(seed)
  • torch.cuda.manual_seed_all(seed) (guarded on CUDA availability)
  • numpy.random.seed(seed)
  • random.seed(seed)

The applied seed is recorded in the run's metadata.json. Seeding is a run-level
concern, independent of reporting_enabled (the run dir and console exist
regardless), matching how the existing caveat surfaces are gated.

Seeding logic lives in a small pure helper (e.g. pin_global_seed(seed)) so it is
unit-testable without a full run.
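
A sketch of what the helper might look like. The import guards exist only so the sketch runs where torch or numpy is not installed, and `record_seed` plus the `"seed"` metadata key are assumptions, since the metadata.json layout is not specified here:

```python
import json
import random
from pathlib import Path
from typing import Optional


def pin_global_seed(seed: Optional[int]) -> None:
    """Pin the process-global RNGs; a None seed leaves them untouched."""
    if seed is None:
        return
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        pass
    else:
        np.random.seed(seed)  # legacy global RNG; needs 0 <= seed < 2**32
    try:
        import torch
    except ImportError:
        pass
    else:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)


def record_seed(metadata_path: Path, seed: Optional[int]) -> None:
    """Merge the applied seed into metadata.json, keeping existing keys."""
    data = json.loads(metadata_path.read_text()) if metadata_path.exists() else {}
    data["seed"] = seed
    metadata_path.write_text(json.dumps(data, indent=2))
```

Pinning only covers code that draws from these global generators, which is exactly why self_seeded methods stay warned.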

5. Partition + wording

reproducibility.py partitions the run's methods using seeding and whether
seed was set:

seeding        seed set  seed unset
deterministic  silent    silent
global_rng     silent    warned
self_seeded    warned    warned

stochastic_methods() (or a successor) returns the methods that land in a
warned cell, each tagged with why. reproducibility_caveat() words the result
around the partition. Sketches:

  • seed set, only global_rng present -> no caveat (run reproducible).
  • seed=42, mixed:

    Run seed=42. Reproducible under this seed: PGD, SHAP Gradient.
    NOT reproducible — these self-seed; pass each method's own seed param:
    AutoAttack.

  • seed unset, any stochastic present: current-style wording, but the named set is
    the union of global_rng + self_seeded (i.e. preserves today's behavior when
    no seed is given).
  • fully deterministic run: no caveat (unchanged).

Empty warned set on any path -> no banner, no REPRODUCIBILITY.md, no CLI warn.
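
The table and the sketched wording could be implemented roughly as follows. `partition_methods` and `caveat_text` are hypothetical names with assumed signatures, and the message strings are illustrative rather than final copy:

```python
from typing import List, Mapping, Optional, Tuple


def is_warned(seeding: str, seed: Optional[int]) -> bool:
    """One cell of the partition table."""
    if seeding == "deterministic":
        return False
    if seeding == "global_rng":
        return seed is None
    return True  # self_seeded, and anything unrecognised


def partition_methods(
    methods: Mapping[str, str], seed: Optional[int]
) -> Tuple[List[str], List[str]]:
    """Return (reproducible under the seed, warned) method names."""
    reproducible = [
        name for name, s in methods.items() if s == "global_rng" and seed is not None
    ]
    warned = [name for name, s in methods.items() if is_warned(s, seed)]
    return reproducible, warned


def caveat_text(methods: Mapping[str, str], seed: Optional[int]) -> Optional[str]:
    """Return the caveat, or None when the warned set is empty."""
    reproducible, warned = partition_methods(methods, seed)
    if not warned:
        return None  # no banner, no REPRODUCIBILITY.md, no CLI warning
    if seed is None:
        return (
            "Results are not bit-reproducible unless seeds are pinned: "
            + ", ".join(warned) + "."
        )
    first = f"Run seed={seed}."
    if reproducible:
        first += " Reproducible under this seed: " + ", ".join(reproducible) + "."
    second = (
        "NOT reproducible (these self-seed; pass each method's own seed param): "
        + ", ".join(warned) + "."
    )
    return first + "\n" + second
```

Returning None for an empty warned set lets all three surfaces share one check instead of each re-deriving the partition.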

6. Surfaces

Unchanged in number and location — report banner (reporting/builder.py
_reproducibility_banner), REPRODUCIBILITY.md (write_reproducibility_md), and
the CLI raitap_log.warn in the orchestrator. All read the new partition; the
REPRODUCIBILITY.md body lists the warned methods and, when a seed is set, the
reproducible-under-seed set for completeness.

7. Documentation

  • seed config -> user docs config reference + any user config walkthrough.
  • seeding field -> contributor docs: adding-a-backend.md registry section,
    architecture.md reproducibility note, src/raitap/models/README.md if it
    documents the spec fields.
  • Sweep docs/modules/**/*.md and src/**/README.md for any mention of the old
    stochastic boolean and the unconditional caveat wording.

Acceptance criteria

  • A run with seed set and only global_rng stochastic methods emits no
    caveat on any surface.
  • A run with seed set and a self_seeded method warns, naming that method and
    instructing the user to pass its own seed param; a co-present global_rng
    method is reported as reproducible, not warned.
  • A run with no seed set behaves as today: the caveat names every stochastic
    method.
  • A fully deterministic run emits no caveat.
  • Setting seed pins torch/numpy/random globals and records the seed in
    metadata.json.
  • Every registry entry across all adapters declares an explicit seeding;
    .stochastic still resolves correctly via the derived property.
  • Per-artefact partitioning is derived from semantics, not hard-coded in the
    reporting layer.

Tests

  • Unit: Seeding partition — each cell of the table above, including empty
    warned-set -> no surfaces.
  • Unit: pin_global_seed sets each RNG; metadata records the seed.
  • Per-adapter: every registry entry has a seeding; spot-check known values
    (PGD global_rng, AutoAttack self_seeded, IG deterministic).
  • Property back-compat: .stochastic equals seeding != "deterministic".
  • Orchestrator: seed set -> globals pinned + metadata written.
  • Report/CLI: wording for seed-set-reproducible, seed-set-mixed, seed-unset, and
    deterministic cases.
