This repository contains the model-agnostic reference pieces for the paper "Inductive Biases for Machine-Discoverable Concepts":
- Discoverability Composite Score (DCS) normalization and retention rules.
- Locality/stability, compositional workspace, and redundancy-reduction losses.
- A lightweight auxiliary concept interface over frozen hidden states.
- Metadata-free test-time candidate selection.
- A CSV CLI for scoring candidate-level measurements.
It is not a full reproduction package for the paper's Qwen3-VL, Phi, and Gemma experiments. Full reproduction also requires backbone-specific hidden-state collection, hook registration, SAE/readout candidate extraction, task datasets, candidate refresh schedules, and intervention evaluation loops.

Install in editable mode:

    python -m pip install -e .

or install the direct requirements:

    python -m pip install -r requirements.txt

Then score the example candidate metrics:

    python scripts/evaluate_dcs.py \
      --input examples/candidate_metrics.csv \
      --output outputs/scored_candidates.csv \
      --summary outputs/dcs_summary.csv \
      --all-summary outputs/dcs_summary_all.csv \
      --json-summary outputs/dcs_summary.json \
      --group-by model regime

By default, `--summary` reports DCS over the retained concept set C*, matching the regime-level DCS definition in the paper. Use `--summary-scope all` for diagnostic summaries over every candidate.

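
The difference between the retained-only and all-candidate scopes can be illustrated with a small Python sketch. It is not the CLI's implementation: the field names `dcs` and `retained`, the helper `summarize`, the use of a plain mean as the aggregator, and the example rows are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows, group_by, scope="retained"):
    """Average a per-candidate score within each group.

    `dcs` and `retained` are hypothetical field names, and the mean is an
    assumed aggregator; the actual CLI may compute its summaries differently.
    """
    groups = defaultdict(list)
    for row in rows:
        # In the retained scope, candidates outside C* are skipped.
        if scope == "retained" and not row["retained"]:
            continue
        key = tuple(row[name] for name in group_by)
        groups[key].append(row["dcs"])
    return {key: mean(scores) for key, scores in groups.items()}


# Made-up rows, only to show how the two scopes diverge.
rows = [
    {"model": "m1", "regime": "r1", "dcs": 0.8, "retained": True},
    {"model": "m1", "regime": "r1", "dcs": 0.2, "retained": False},
    {"model": "m1", "regime": "r2", "dcs": 0.6, "retained": True},
]
retained_summary = summarize(rows, ["model", "regime"])
all_summary = summarize(rows, ["model", "regime"], scope="all")
```

In this sketch the non-retained candidate lowers the `("m1", "r1")` average only in the `all` scope, which is why that scope is better suited to diagnostics than to reporting.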
The scoring CLI expects one row per candidate concept or candidate set. Required columns are:

| Column | Meaning |
|---|---|
| `stability_match` | Seed/split/prompt recovery match rate. |
| `active_family_count` | Number of task families where the candidate is active. |
| `target_gain_pp` | Target behavior gain in percentage points. |
| `off_target_drift_pp` | Off-target behavior drift in percentage points. |
| `sufficiency_gain_pp` | Expected-direction behavior change in percentage points. |
| `pairwise_synergy_pp` | Pairwise composition synergy in percentage points. |

Optional grouping columns such as `model`, `regime`, `task_family`, and `candidate_type` are preserved and can be passed to `--group-by`.
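
As a rough illustration of this input contract, the following Python sketch checks a CSV header against the six required columns. The helper name `missing_columns` and the sample row are hypothetical and not part of the repository's API; the column names are the ones listed in the table.

```python
import csv
import io

# Required columns, taken from the table above.
REQUIRED_COLUMNS = (
    "stability_match",
    "active_family_count",
    "target_gain_pp",
    "off_target_drift_pp",
    "sufficiency_gain_pp",
    "pairwise_synergy_pp",
)


def missing_columns(fieldnames):
    """Return the required columns absent from a CSV header, in table order."""
    present = set(fieldnames or [])
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Illustrative single-row CSV; the values are made up for this sketch.
sample = io.StringIO(
    "model,regime,stability_match,active_family_count,target_gain_pp,"
    "off_target_drift_pp,sufficiency_gain_pp,pairwise_synergy_pp\n"
    "model_a,baseline,0.61,3,4.2,1.1,3.0,0.8\n"
)
reader = csv.DictReader(sample)
missing = missing_columns(reader.fieldnames)  # empty when the header is complete
rows = list(reader)
```

Running a check like this before scoring makes a missing column fail early with a clear message instead of surfacing later as a parsing error.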

- `discoverability.dcs`: DCS component normalization, retention filtering, and retained/all-candidate summaries.
- `discoverability.interface`: concept readouts, normalized intervention directions, slot assignments, matched-norm interventions, and simple grouping.
- `discoverability.regime_losses`: objective terms for the auxiliary interface.
- `discoverability.policy`: metadata-free selection from retained candidates.

Run the tests with:

    python -m unittest discover -s tests

The tests cover the scoring contract, retained-only summaries, policy behavior, loss functions, and intervention interface shapes.

The default DCS configuration mirrors Appendix Table A14:
- stability range `[0.30, 0.82]`, retention threshold `>= 0.50`
- reuse range `[1, 4]`, retention threshold `>= 2` task families
- locality range `[0.40, 0.86]`, retention thresholds `target_gain >= 2.0pp` and `off_target_drift <= 5.0pp`
- sufficiency range `[1.0, 12.0]`, retention threshold `>= 2.0pp`
- compositionality range `[-0.5, 5.5]`, retention threshold `>= 0.5pp`

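
One way to read the ranges and thresholds above is sketched below in Python. This is a sketch under stated assumptions, not the code in `discoverability.dcs`: it assumes normalization is clipped min-max scaling over each range, that retention is a conjunction of all listed thresholds, and that the thresholds apply to the CSV columns with the matching names. How the raw locality score is derived from target gain and off-target drift is not specified here, so the sketch does not compute it. The helper names `normalize` and `is_retained` and the example candidate are hypothetical.

```python
# Ranges copied from the list above (Appendix Table A14).
RANGES = {
    "stability": (0.30, 0.82),
    "reuse": (1.0, 4.0),
    "locality": (0.40, 0.86),
    "sufficiency": (1.0, 12.0),
    "compositionality": (-0.5, 5.5),
}


def normalize(value, component):
    """Min-max scale into [0, 1] with clipping (an assumed normalization)."""
    lo, hi = RANGES[component]
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def is_retained(row):
    """Assumed conjunctive rule: every listed threshold must hold."""
    return (
        float(row["stability_match"]) >= 0.50
        and float(row["active_family_count"]) >= 2
        and float(row["target_gain_pp"]) >= 2.0
        and float(row["off_target_drift_pp"]) <= 5.0
        and float(row["sufficiency_gain_pp"]) >= 2.0
        and float(row["pairwise_synergy_pp"]) >= 0.5
    )


# Made-up candidate used only to exercise the two helpers.
candidate = {
    "stability_match": "0.61",
    "active_family_count": "3",
    "target_gain_pp": "4.2",
    "off_target_drift_pp": "1.1",
    "sufficiency_gain_pp": "6.5",
    "pairwise_synergy_pp": "0.8",
}
kept = is_retained(candidate)
sufficiency_score = normalize(float(candidate["sufficiency_gain_pp"]), "sufficiency")
```

Under these assumptions a value below a range's lower bound maps to 0 and a value above its upper bound maps to 1, so a single outlier cannot dominate a composite score.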
Validation-selected ranges and thresholds should be fixed before test evaluation. Test metadata should not be used by the selection policy.