results+paper(z-gap): Strategy E — multi-model P3 probing closes M5 by heznpc · Pull Request #5 · heznpc/z-gap

heznpc · 2026-05-20T17:52:32Z

P3 cross-lingual probing run on the same 7-model set as Strategy D, with a
per-cell binomial test against chance. New runner:
experiments/scripts/run_strategy_e_multimodel_probing.py

Results (probe trained on English, mean over 4 non-English langs):

Model                   cat_en  cat_xfer  op_en  op_xfer
────────────────────────────────────────────────────────
UniXcoder (code)        0.990   0.670     1.000  0.175
MiniLM-L12 (NL)         1.000   0.900     1.000  0.858
Nomic v1.5 (NL+code)    1.000   0.625     1.000  0.225
E5-small (NL)           1.000   0.985     1.000  0.892
E5-base (NL)            1.000   0.978     1.000  0.958
E5-large (NL)           1.000   0.995     1.000  0.978
BGE-M3 (NL+code)        1.000   0.990     1.000  0.975

All non-trivial non-en cells: p < 1e-25 vs chance (binomial).
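
For readers who want to reproduce the significance check, here is a minimal sketch of a one-sided exact binomial test against chance. It is not taken from the runner: the function name `binomial_p_value`, the uniform chance level of 1 / n_classes, the one-sided alternative, and the item and class counts in the usage line are all assumptions made for illustration, since the PR text does not state them.

```python
from fractions import Fraction
from math import comb


def binomial_p_value(correct: int, total: int, n_classes: int) -> float:
    """One-sided exact binomial test against uniform chance.

    Returns P(X >= correct) for X ~ Binomial(total, 1 / n_classes).
    Assumption: chance level is 1 / n_classes (balanced classes).
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must lie in [0, total]")
    chance = Fraction(1, n_classes)
    # Exact rational arithmetic avoids float overflow in comb() and
    # underflow in the tiny tail probabilities that probing produces.
    tail = sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )
    return float(tail)


if __name__ == "__main__":
    # Hypothetical cell: 180 of 200 items correct on a 5-way task.
    print(f"{binomial_p_value(correct=180, total=200, n_classes=5):.3g}")
```

Exact rational arithmetic is slower than a library routine but keeps very small tail probabilities representable; a statistics package would be the natural choice for larger item counts.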

Findings:

- P3 is supported in multilingual NL models (MiniLM, E5 family, BGE-M3) but
  is MODEL-CLASS DEPENDENT. Code-trained (UniXcoder) and mixed NL+code (Nomic)
  models reach near-perfect English training accuracy but collapse on
  cross-lingual transfer (0.625-0.670 cat, 0.175-0.225 op). This refines the
  paper's original P3 claim: cross-lingual Z_sem separability is a property
  of the multilingual NL training distribution, not an intrinsic property of
  every embedding space with R_code > 1.
- E5 family P3 scale-convergence (echo of the Strategy D pattern under a
  fixed architecture/training recipe): operation transfer 0.89 (384d) ->
  0.96 (768d) -> 0.98 (1024d).
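
The evaluation protocol behind the `*_en` and `*_xfer` columns (fit a probe on English, score it on each other language, average the non-English scores) can be sketched as follows. This is an illustration only: a nearest-centroid classifier stands in for whatever probe the runner actually uses, and the function names, language keys, labels, and 2-d toy vectors are hypothetical.

```python
from statistics import mean


def fit_centroids(vectors, labels):
    """Average the training vectors of each label into one centroid."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [mean(dim) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


def accuracy(centroids, vectors, labels):
    hits = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return hits / len(labels)


def transfer_scores(data, source="en"):
    """Fit on `source`; return (source accuracy, mean accuracy elsewhere)."""
    src_vecs, src_labels = data[source]
    centroids = fit_centroids(src_vecs, src_labels)
    in_lang = accuracy(centroids, src_vecs, src_labels)
    others = [
        accuracy(centroids, vecs, labels)
        for lang, (vecs, labels) in data.items()
        if lang != source
    ]
    return in_lang, mean(others)


# Toy embeddings (hypothetical): lang_a aligns with English, lang_b does not.
toy = {
    "en": ([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]],
           ["cat_a", "cat_a", "cat_b", "cat_b"]),
    "lang_a": ([[1.0, 0.5], [3.0, 0.5]], ["cat_a", "cat_b"]),
    "lang_b": ([[3.0, 0.5], [1.0, 0.5]], ["cat_a", "cat_b"]),
}
```

On this toy data `transfer_scores(toy)` gives perfect English accuracy and a transfer mean of 0.5, the same shape of result as the code-trained rows above: a probe can fit English perfectly while the other languages land in different regions of the space.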

Paper:

- §5.5 P3 Results table: 1 row (MiniLM only) -> 7 rows. Body text rewritten
  to surface the model-class dependence finding + E5 family P3
  scale-convergence echo.
- Limitations "Z stratification" bullet: "not validated across model
  families" replaced with "supported on 7 models with model-class
  dependence"; remaining work narrowed to decoder-only LLM hidden states +
  tier2/tier3 OOD stimuli.

Decisions log:

- planning/decisions.md: 2026-05-21 Strategy E entry covering the design
  rationale, the model-class dependence finding, and the scope of remaining
  P3 work after this PR.

Closes M5 from the 2026-05-21 pre-experiment review. The deferred portion of
C1 (contamination), to be addressed via OOD stimuli, is the next follow-up.


heznpc merged commit 72cd8ea into main May 20, 2026
1 check passed

heznpc deleted the chore/strategy-e-multimodel-probing-2026-05-21 branch May 20, 2026 17:52

heznpc merged 1 commit into main from chore/strategy-e-multimodel-probing-2026-05-21
