Issue #251 follow-up #3: seed-aware reproducibility caveat
Problem
The core of #251 ships an unconditional stochastic caveat: whenever a run
contains >= 1 stochastic method, the report banner, REPRODUCIBILITY.md, and the
CLI warning all fire, worded "results are not bit-reproducible unless seeds are
pinned." RAITAP captures no seed and pins no RNG, so the caveat cannot soften.
Two gaps follow:
- No way to pin randomness. A user cannot make a stochastic run reproducible through RAITAP; they must reach into library internals.
- All-or-nothing wording. Even when most methods would be reproducible under a pinned seed, the caveat names the whole set and says "not reproducible." The user cannot tell which methods are actually still non-reproducible.
This is the deferred point #3 of #251 ("if a global/per-method seed was set,
soften the wording").
Scope
Pin a run's RNG via config, and partition the stochastic methods so each surface
names only the methods that remain non-reproducible after seeding.
In scope:
- A 3-state per-algorithm seed classification, replacing the stochastic: bool.
- A full audit: every entry in every registry gets an explicit classification.
- A top-level seed config that pins the process-global RNGs at run start and is recorded in run metadata.
- Partition + reworded caveat on all three existing surfaces.
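Out of scope (explicit non-goals, possible follow-ups):
- Per-method seed plumbing (e.g. passing seed=42 into torchattacks AutoAttack). RAITAP's global seed cannot reach a self-seeding method; making those reproducible needs per-algorithm param plumbing (a tier-C effort, overlaps the registry work of "Expand algorithm registries to full library coverage (torchattacks / foolbox / imagecorruptions)", #266). Such methods stay warned.
- [...] honestly; a lie-switch is not added.
Design
1. Classification: replace the boolean with a 3-state marker
The per-algorithm specs carry stochastic: bool today (ExplainerAlgorithmSpec, transparency/contracts.py; AssessorAlgorithmSpec, robustness/semantics.py). Replace it with:
- deterministic: not stochastic (e.g. IG).
- global_rng: draws from the process-global torch/numpy/random RNG. Examples: PGD (random_start), SHAP Gradient.
- self_seeded: owns its RNG / a seed param that time-defaults; global seed misses it. Example: torchattacks AutoAttack (seed=None).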
The Seeding Literal is declared once in its owning module and imported
elsewhere (no redeclaration at the schema/semantics layers).
Back-compat shim. stochastic becomes a derived read-only property: stochastic == (seeding != "deterministic"). The ~28-file consumer surface that
reads .stochastic (semantics flow, reproducibility.py, tests) keeps working
unchanged; only reproducibility.py's partition reads the new seeding value.
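The 3-state marker and the derived stochastic property could be sketched as follows. This is a minimal illustration, not RAITAP's actual code: the Seeding alias name and the AlgorithmSpec stand-in class are assumptions, used here in place of the real ExplainerAlgorithmSpec / AssessorAlgorithmSpec.

```python
from dataclasses import dataclass
from typing import Literal

# Assumed alias name; the proposal only requires that it is declared once
# in its owning module and imported elsewhere.
Seeding = Literal["deterministic", "global_rng", "self_seeded"]


@dataclass(frozen=True)
class AlgorithmSpec:
    """Hypothetical stand-in for the real per-algorithm spec classes."""

    name: str
    seeding: Seeding

    @property
    def stochastic(self) -> bool:
        # Derived and read-only, so existing `.stochastic` readers keep working.
        return self.seeding != "deterministic"


pgd = AlgorithmSpec("PGD", seeding="global_rng")
ig = AlgorithmSpec("IG", seeding="deterministic")
```

Because the property has no setter, consumers can still read .stochastic but can no longer set it independently of seeding, which keeps the two values from drifting apart.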
2. Full audit
Every algorithm in every registry receives an explicit seeding=, verified
against the installed library's default RNG behavior (same verification bar #251
used for the boolean):
Default for an unclassified/unknown entry is the conservative self_seeded
(always warns) so a missed audit never over-claims reproducibility. The audit's
job is to promote entries to their true state.
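One way to picture the conservative default and the audit check is the sketch below. The AUDITED mapping, seeding_for, and unaudited are hypothetical names for illustration; only the three classifications shown are taken from this issue's spot-check list.

```python
from typing import Iterable, Literal

Seeding = Literal["deterministic", "global_rng", "self_seeded"]

# Hypothetical excerpt of audited classifications (values from the
# spot-check list in this issue; the real registries are much larger).
AUDITED: dict[str, Seeding] = {
    "PGD": "global_rng",
    "AutoAttack": "self_seeded",
    "IG": "deterministic",
}


def seeding_for(name: str, audited: dict[str, Seeding] = AUDITED) -> Seeding:
    """Return the audited state, or the conservative default if unaudited."""
    # An unknown entry falls back to self_seeded, which always warns,
    # so a missed audit can never over-claim reproducibility.
    return audited.get(name, "self_seeded")


def unaudited(registry_names: Iterable[str],
              audited: dict[str, Seeding] = AUDITED) -> list[str]:
    """List registry entries that still lack an explicit classification."""
    return sorted(n for n in registry_names if n not in audited)
```

A per-adapter test could then assert that unaudited(...) is empty for each registry, so the fallback only ever acts as a safety net.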
3. Flow onto results
seeding rides onto each result's semantics exactly as stochastic does today
(assessor_semantics / explainer_is_stochastic paths). The result object
carries the 3-state value so reproducibility.py reads it post-run, not from
live adapter hints.
4. Config: seed
New optional top-level field seed: int | None = None in the run config schema
(configs/schema.py). When set, the orchestrator pins the global RNGs once at
run start, before run_without_tracking:
- torch.manual_seed(seed)
- torch.cuda.manual_seed_all(seed) (guarded on CUDA availability)
- numpy.random.seed(seed)
- random.seed(seed)
The applied seed is recorded in the run's metadata.json. Seeding is a run-level
concern, independent of reporting_enabled (the run dir and console exist
regardless), matching how the existing caveat surfaces are gated.
Seeding logic lives in a small pure helper (e.g. pin_global_seed(seed)) so it is
unit-testable without a full run.
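A minimal sketch of such a helper is shown below. The optional imports and the returned dict shape are assumptions made so the example runs standalone; inside RAITAP, torch and numpy would presumably be hard dependencies, and the metadata.json layout is not specified by this issue.

```python
import random


def pin_global_seed(seed: int) -> dict:
    """Pin the process-global RNGs; return a record suitable for run metadata.

    Note: numpy.random.seed only accepts integers in [0, 2**32 - 1].
    """
    random.seed(seed)
    pinned = ["random"]

    try:
        import numpy as np
    except ImportError:  # assumption: tolerate a missing numpy in this sketch
        pass
    else:
        np.random.seed(seed)
        pinned.append("numpy")

    try:
        import torch
    except ImportError:  # assumption: tolerate a missing torch in this sketch
        pass
    else:
        torch.manual_seed(seed)
        if torch.cuda.is_available():  # guard on CUDA availability
            torch.cuda.manual_seed_all(seed)
        pinned.append("torch")

    # Hypothetical metadata shape; the issue only says the seed is recorded.
    return {"seed": seed, "pinned": pinned}
```

Returning the record rather than writing metadata.json directly keeps the helper free of file I/O, so a unit test only needs to re-pin the seed and compare draws.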
5. Partition + wording
reproducibility.py partitions the run's methods using seeding and whether seed was set:
| seeding | seed set | seed unset |
| --- | --- | --- |
| deterministic | silent | silent |
| global_rng | silent | warned |
| self_seeded | warned | warned |
stochastic_methods() (or a successor) returns the methods that land in a warned cell, each tagged with why. reproducibility_caveat() words the result
around the partition. Sketches:
- seed set, only global_rng present -> no caveat (run reproducible).
- seed=42, mixed:

      Run seed=42. Reproducible under this seed: PGD, SHAP Gradient.
      NOT reproducible - these self-seed; pass each method's own seed param:
      AutoAttack.

- seed unset, any stochastic present: current-style wording, but the named set is the union of global_rng + self_seeded (i.e. preserves today's behavior when no seed is given).
- fully deterministic run: no caveat (unchanged).

Empty warned set on any path -> no banner, no REPRODUCIBILITY.md, no CLI warn.
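The partition and the wording rules can be sketched as below. The function signatures and the exact message strings are illustrative assumptions, and the "tagged with why" detail is omitted for brevity; the real stochastic_methods() / reproducibility_caveat() may look different.

```python
from typing import Literal, Optional

Seeding = Literal["deterministic", "global_rng", "self_seeded"]


def partition(methods: dict[str, Seeding],
              seed: Optional[int]) -> tuple[list[str], list[str]]:
    """Split methods into (reproducible_under_seed, warned)."""
    reproducible: list[str] = []
    warned: list[str] = []
    for name, seeding in methods.items():
        if seeding == "deterministic":
            continue  # silent whether or not a seed is set
        if seeding == "global_rng" and seed is not None:
            reproducible.append(name)  # pinned by the run-level seed
        else:
            warned.append(name)  # self_seeded, or global_rng without a seed
    return reproducible, warned


def reproducibility_caveat(methods: dict[str, Seeding],
                           seed: Optional[int]) -> Optional[str]:
    """Return the caveat text, or None when nothing needs a warning."""
    reproducible, warned = partition(methods, seed)
    if not warned:
        return None  # no banner, no REPRODUCIBILITY.md, no CLI warning
    if seed is None:
        # Current-style wording over every stochastic method.
        return ("Results are not bit-reproducible unless seeds are pinned: "
                + ", ".join(warned) + ".")
    head = f"Run seed={seed}."
    if reproducible:
        head += " Reproducible under this seed: " + ", ".join(reproducible) + "."
    tail = ("NOT reproducible - these self-seed; pass each method's own "
            "seed param: " + ", ".join(warned) + ".")
    return head + "\n" + tail
```

Keeping partition separate from the wording lets all three surfaces share one source of truth for which methods are warned.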
6. Surfaces
Unchanged in number and location: report banner (reporting/builder.py _reproducibility_banner), REPRODUCIBILITY.md (write_reproducibility_md), and
the CLI raitap_log.warn in the orchestrator. All read the new partition; the REPRODUCIBILITY.md body lists the warned methods and, when a seed is set, the
reproducible-under-seed set for completeness.
7. Documentation
- seed config -> user docs config reference + any user config walkthrough.
- seeding field -> contributor docs: adding-a-backend.md registry section, architecture.md reproducibility note, src/raitap/models/README.md if it documents the spec fields.
- Sweep docs/modules/**/*.md and src/**/README.md for any mention of the old stochastic boolean and the unconditional caveat wording.
Acceptance criteria
- A run with seed set and only global_rng stochastic methods emits no caveat on any surface.
- A run with seed set and a self_seeded method warns, naming that method and instructing the user to pass its own seed param; a co-present global_rng method is reported as reproducible, not warned.
- A run with no seed set behaves as today: the caveat names every stochastic method.
- A fully deterministic run emits no caveat.
- Setting seed pins torch/numpy/random globals and records the seed in metadata.json.
- Every registry entry across all adapters declares an explicit seeding; .stochastic still resolves correctly via the derived property.
- Per-artefact partitioning is derived from semantics, not hard-coded in the reporting layer.
Tests
- Unit: Seeding partition, covering each cell of the table above, including empty warned-set -> no surfaces.
- Unit: pin_global_seed sets each RNG; metadata records the seed.
- Per-adapter: every registry entry has a seeding; spot-check known values (PGD global_rng, AutoAttack self_seeded, IG deterministic).
- .stochastic equals seeding != "deterministic".
- seed set -> globals pinned + metadata written.
- [...] deterministic cases.