Machine-Discoverable Concepts Reference Code

This repository contains the model-agnostic reference pieces for the paper "Inductive Biases for Machine-Discoverable Concepts":

Discoverability Composite Score (DCS) normalization and retention rules.
Locality/stability, compositional workspace, and redundancy-reduction losses.
A lightweight auxiliary concept interface over frozen hidden states.
Metadata-free test-time candidate selection.
A CSV CLI for scoring candidate-level measurements.

It is not a full reproduction package for the paper's Qwen3-VL, Phi, and Gemma experiments. Full reproduction also requires backbone-specific hidden-state collection, hook registration, SAE/readout candidate extraction, task datasets, candidate refresh schedules, and intervention evaluation loops.

Install

python -m pip install -e .

or install the direct requirements:

python -m pip install -r requirements.txt

Quick Check

python scripts/evaluate_dcs.py \
  --input examples/candidate_metrics.csv \
  --output outputs/scored_candidates.csv \
  --summary outputs/dcs_summary.csv \
  --all-summary outputs/dcs_summary_all.csv \
  --json-summary outputs/dcs_summary.json \
  --group-by model regime

By default, --summary reports DCS over the retained concept set C*, matching the regime-level DCS definition in the paper. Use --summary-scope all for diagnostic summaries over every candidate.
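The difference between the two summary scopes can be illustrated with a small sketch. It assumes each scored row already carries a composite score and a retention flag. The key names `dcs` and `retained`, the function `summarize`, and the unweighted mean are hypothetical choices for illustration; they are not taken from the CLI's actual output format or from the paper's aggregation rule.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows, group_by, scope="retained"):
    """Average a per-candidate score within each group.

    ASSUMPTION: rows are dicts with a float "dcs" score and a boolean
    "retained" flag. Both key names are hypothetical.
    scope="retained" keeps only retained candidates (the set C*);
    scope="all" keeps every candidate.
    """
    groups = defaultdict(list)
    for row in rows:
        if scope == "retained" and not row["retained"]:
            continue
        key = tuple(row[column] for column in group_by)
        groups[key].append(row["dcs"])
    return {key: mean(values) for key, values in groups.items()}


# Made-up rows, for illustration only.
rows = [
    {"model": "model_a", "regime": "r1", "dcs": 0.5, "retained": True},
    {"model": "model_a", "regime": "r1", "dcs": 0.75, "retained": True},
    {"model": "model_a", "regime": "r1", "dcs": 0.25, "retained": False},
    {"model": "model_a", "regime": "r2", "dcs": 0.5, "retained": False},
]

retained_summary = summarize(rows, ["model", "regime"], scope="retained")
all_summary = summarize(rows, ["model", "regime"], scope="all")
```

In this sketch a group with no retained candidates simply does not appear in the retained-scope summary, which is one reason an all-candidate summary can be useful for diagnostics.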

Candidate Metrics Schema

The scoring CLI expects one row per candidate concept or candidate set. Required columns are:

| Column | Meaning |
| --- | --- |
| `stability_match` | Seed/split/prompt recovery match rate. |
| `active_family_count` | Number of task families where the candidate is active. |
| `target_gain_pp` | Target behavior gain in percentage points. |
| `off_target_drift_pp` | Off-target behavior drift in percentage points. |
| `sufficiency_gain_pp` | Expected-direction behavior change in percentage points. |
| `pairwise_synergy_pp` | Pairwise composition synergy in percentage points. |

Optional grouping columns such as model, regime, task_family, and candidate_type are preserved and can be passed to --group-by.
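Before running the CLI on your own measurements, it can help to confirm that the file header contains every required column. The following is a minimal standalone sketch using only the standard library. The helper name `missing_columns` and the sample rows are hypothetical and are not part of this package.

```python
import csv
import io

# Required columns, copied from the schema above.
REQUIRED_COLUMNS = (
    "stability_match",
    "active_family_count",
    "target_gain_pp",
    "off_target_drift_pp",
    "sufficiency_gain_pp",
    "pairwise_synergy_pp",
)


def missing_columns(csv_text):
    """Return the required columns absent from the CSV header, in schema order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Made-up example content; the values carry no meaning.
sample = (
    "model,regime,stability_match,active_family_count,target_gain_pp,"
    "off_target_drift_pp,sufficiency_gain_pp,pairwise_synergy_pp\n"
    "model_a,r1,0.61,3,4.2,1.1,3.0,0.8\n"
)

incomplete = "model,stability_match\nmodel_a,0.61\n"

print(missing_columns(sample))
print(missing_columns(incomplete))
```

For a file on disk, the same check applies after reading its text, for example with `pathlib.Path(path).read_text()`.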

Main Modules

discoverability.dcs: DCS component normalization, retention filtering, and retained/all-candidate summaries.
discoverability.interface: concept readouts, normalized intervention directions, slot assignments, matched-norm interventions, and simple grouping.
discoverability.regime_losses: objective terms for the auxiliary interface.
discoverability.policy: metadata-free selection from retained candidates.

Tests

python -m unittest discover -s tests

The tests cover the scoring contract, retained-only summaries, policy behavior, loss functions, and intervention interface shapes.

Paper Alignment

The default DCS configuration mirrors Appendix Table A14:

stability range [0.30, 0.82], retention threshold >= 0.50
reuse range [1, 4], retention threshold >= 2 task families
locality range [0.40, 0.86], retention thresholds target_gain >= 2.0pp and off_target_drift <= 5.0pp
sufficiency range [1.0, 12.0], retention threshold >= 2.0pp
compositionality range [-0.5, 5.5], retention threshold >= 0.5pp

Validation-selected ranges and thresholds should be fixed before test evaluation. Test metadata should not be used by the selection policy.
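The defaults listed above can be written down as a small sketch. Two things here are assumptions rather than statements about `discoverability.dcs`: that each "range" acts as the lower and upper bound of a clipped min-max normalization, and that a candidate is retained only when every threshold holds at once. The names `RANGES`, `clip_normalize`, and `is_retained` are hypothetical.

```python
# Ranges copied from the list above (Appendix Table A14 defaults).
RANGES = {
    "stability": (0.30, 0.82),
    "reuse": (1.0, 4.0),
    "locality": (0.40, 0.86),
    "sufficiency": (1.0, 12.0),
    "compositionality": (-0.5, 5.5),
}


def clip_normalize(value, low, high):
    """ASSUMPTION: min-max scaling onto [0, 1], clipped at both ends."""
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def is_retained(row):
    """ASSUMPTION: retention is the conjunction of all listed thresholds.

    `row` maps the required schema columns to numeric values.
    """
    return (
        row["stability_match"] >= 0.50
        and row["active_family_count"] >= 2
        and row["target_gain_pp"] >= 2.0
        and row["off_target_drift_pp"] <= 5.0
        and row["sufficiency_gain_pp"] >= 2.0
        and row["pairwise_synergy_pp"] >= 0.5
    )


# Made-up candidate, for illustration only.
candidate = {
    "stability_match": 0.61,
    "active_family_count": 3,
    "target_gain_pp": 4.2,
    "off_target_drift_pp": 1.1,
    "sufficiency_gain_pp": 3.0,
    "pairwise_synergy_pp": 0.8,
}

reuse_score = clip_normalize(candidate["active_family_count"], *RANGES["reuse"])
kept = is_retained(candidate)
```

Keeping the ranges and thresholds in one place like this makes it easier to freeze them after validation and reuse them unchanged at test time.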
