records(non-record-16mb): JEPA-on-LM 14-run ablation (negative result) #2142
Open
eren23 wants to merge 1 commit into
Comprehensive ablation showing JEPA auxiliary objectives do not improve val_bpb on parameter-golf at 17M / sp1024 / FineWeb scale. Cleanest recipe (alpha=0.001, VAR_WEIGHT=0, MSE-only Path A) ties baseline exactly at val_bpb=1.2311 (step 50K, promotion preset, 7200s wallclock).

- 14 runs at the same N (17.06M / 17.13M with predictor MLP, +0.4%).
- Two-seed paired baselines (1337, 42) -> 0.0022 noise floor.
- lambda sweep across 4 orders of magnitude (1e-4..0.2).
- Path A / Path B / injection / V-JEPA covariance ablation.

Refines findings from PR openai#896 (Manav Pandey) and PR openai#1330 (luciobaiocchi). Architecture and full finding doc published at https://github.com/eren23/crucible-community-tap (architectures/jepa_lm, findings/parameter-golf-jepa-ablation).
Summary
Non-record submission documenting a comprehensive negative result: JEPA auxiliary objectives do not improve `val_bpb` on parameter-golf at the 17.06M-param / sp1024 / FineWeb scale. The cleanest recipe ties baseline exactly. Submitting to formalize the negative finding so future JEPA submitters don't re-run the same grid.
Cleanest recipe (jepa-var-zero, alpha=0.001, VAR_WEIGHT=0): `val_bpb` = 1.2311 at step 50K, an exact tie with the same-seed baseline.
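As a rough illustration of how the tie can be read against the 0.0022 two-seed noise floor from the commit description, the sketch below classifies a run by its `val_bpb` delta. The function name `verdict` is hypothetical, and using the noise floor as a plain absolute threshold is an assumption, not necessarily the rule applied in this ablation.

```python
# Minimal sketch, not the author's evaluation code.
# Assumption: the noise floor is applied as an absolute threshold on the
# val_bpb difference between a JEPA run and its same-seed baseline.
NOISE_FLOOR = 0.0022  # two-seed paired-baseline noise floor quoted in the PR


def verdict(candidate_bpb: float, baseline_bpb: float,
            noise_floor: float = NOISE_FLOOR) -> str:
    """Classify a run relative to its baseline (lower val_bpb is better)."""
    delta = candidate_bpb - baseline_bpb
    if abs(delta) <= noise_floor:
        return "tie"
    return "improvement" if delta < 0 else "regression"


# Cleanest recipe vs. same-seed baseline, both 1.2311 at step 50K per the PR.
print(verdict(1.2311, 1.2311))  # prints "tie"
```

Under this reading, a variant would need to beat its baseline by more than 0.0022 `val_bpb` before the difference counts as signal.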
What's new (param-count clean)
All 14 variants share one architectural backbone: 17,059,912-param BaselineGPT (9L, 512d, KV4, MLP_MULT=2, sp1024, relu_sq, tied embeds). JEPA variants add a single 65,536-param predictor MLP (model_dim → 64 → model_dim, zero-init on output) — total 17,125,448 params (+0.4%). No model-shape changes across the grid; only loss weights differ.
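The parameter accounting above can be checked with a few lines of arithmetic. This sketch assumes the predictor is a bias-free two-layer MLP (512 → 64 → 512), which is the only configuration consistent with the quoted 65,536 figure. The helper name `predictor_param_count` is hypothetical.

```python
# Minimal sketch of the parameter-count arithmetic, not the author's code.
# Assumption: the predictor MLP has no bias terms (2 * 512 * 64 = 65,536).
def predictor_param_count(model_dim: int, bottleneck: int,
                          bias: bool = False) -> int:
    """Weights in a model_dim -> bottleneck -> model_dim MLP."""
    down = model_dim * bottleneck          # first projection
    up = bottleneck * model_dim            # second (zero-initialised) projection
    biases = (bottleneck + model_dim) if bias else 0
    return down + up + biases


BASELINE_PARAMS = 17_059_912               # BaselineGPT size quoted in the PR
predictor = predictor_param_count(model_dim=512, bottleneck=64)
total = BASELINE_PARAMS + predictor
overhead_pct = 100.0 * predictor / BASELINE_PARAMS

print(predictor, total, f"{overhead_pct:.2f}%")  # 65536 17125448 0.38%
```

The overhead of about 0.38% rounds to the +0.4% quoted above.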
14-run table — final val_bpb @ step 50K
* wallclock cap hit before step 50K on slower hardware; column shows actual.
Three findings
Reproducibility
Track and quant note
Track: `non-record-unlimited-compute-16mb`. The model artifact was not int8+zlib quantized for this submission, since we're submitting an ablation finding, not a leaderboard ranking candidate. The `val_bpb` reported is the pre-quant running value at step 50K. The `bytes_total` / `bytes_model_int8_zlib` fields in `submission.json` are null. If a finding-style submission is preferred under a different track, happy to relabel.
Refines
Test plan
Generated with Claude Code and Crucible plugin-based ML research platform.