Non-Record: Add Novel SemanticEngine SSM submission #2122
Conversation
Covers system naming (SemanticEngine / CareSSM / ChaosSsm / SemanticOptimizer), file structure, train_gpt.py section breakdown, new chaoscontrol public/ module, training/eval prequential contract, and implementation task order.
Architecture note for reviewers: This submission is a language model built around a recurrent SSM, not a transformer with attention layers. The novel part is a separate online episodic-memory system that prepares small residual tensors for the SSM to consume. Think of it as an asynchronous memory channel into the recurrent model, not a post-processing cache or a second-pass scorer. The hardware split is part of the method.
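A minimal sketch of what such a memory channel could look like, assuming a dense soft-lookup table; the class name `MemoryChannel`, the `read`/`write` interface, and the ring-buffer write policy are illustrative assumptions, not the submission's actual API:

```python
import torch

class MemoryChannel:
    """Illustrative episodic store (hypothetical, not the submission's API):
    keys index past hidden-state evidence, values hold the small residual
    tensors injected into the SSM trunk's recurrence."""

    def __init__(self, d_model: int, slots: int = 256):
        self.keys = torch.zeros(slots, d_model)    # episodic keys
        self.values = torch.zeros(slots, d_model)  # residuals to inject
        self.next_slot = 0                         # ring-buffer cursor

    def read(self, h: torch.Tensor) -> torch.Tensor:
        # Similarity-weighted residual for the trunk to add to its state.
        scores = torch.softmax(self.keys @ h, dim=0)  # (slots,)
        return self.values.T @ scores                 # (d_model,)

    def write(self, h: torch.Tensor, residual: torch.Tensor) -> None:
        # Simplest write policy: overwrite the oldest slot. detach() keeps
        # the memory out of the gradient path in this sketch, so the trunk
        # stays on its throughput path.
        i = self.next_slot % self.keys.shape[0]
        self.keys[i] = h.detach()
        self.values[i] = residual.detach()
        self.next_slot += 1
```

In this picture the trunk calls `read(h)` each step and adds the result into its recurrence, while writes can run asynchronously on separate hardware, which is the split referred to above.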
The eval path is prequential. For each chunk, the model first scores the tokens using only the checkpoint plus memory built from earlier chunks. Only after that chunk's loss is fixed can its hidden-state evidence update the memory table for future chunks. There is no train-on-validation-before-scoring step, no rescoring of the same chunk, and no best-of-multiple-passes selection. So the core claim is: a fast SSM trunk can stay on its throughput path while a separate memory subsystem continuously prepares causal residual information for it. The memory is online during eval, but it only learns from tokens after their score has already been counted.
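Spelled out as a loop, the contract looks like the following minimal sketch; `model.loss` and `memory.update` are assumed interfaces standing in for the submission's actual code:

```python
def prequential_eval(model, memory, chunks):
    """Score-before-write: each chunk is scored using memory built strictly
    from earlier chunks, and only then written back for future chunks."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        # 1) Score with checkpoint weights + memory from EARLIER chunks only.
        loss, hidden = model.loss(chunk, memory)
        total_loss += loss * len(chunk)
        total_tokens += len(chunk)
        # 2) This chunk's loss is now fixed; its hidden-state evidence may
        #    update the memory for FUTURE chunks. No rescoring, no best-of.
        memory.update(hidden, chunk)
    return total_loss / total_tokens
```

One pass, in order, no second scoring of any chunk: the memory is online, but causally behind the loss.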
Summary
Adds the SemanticEngine CareSSM submission for track_10min_16mb. This is a pure SSM trunk with live episodic memory during both training and legal prequential eval.

Results
Full 50k FineWeb validation docs, packet-online cache, score-before-write eval:
Std: val_bpb_std=0.02333959, val_loss_std=0.05789620.

Architecture
Verification
```
python -m json.tool records/track_10min_16mb/2026-05-01_SemanticEngine_CareSSM/submission.json
python -m py_compile records/track_10min_16mb/2026-05-01_SemanticEngine_CareSSM/train_gpt.py
python -m pytest tests/submission/test_train_gpt_hyperparams.py -q   # -> 9 passed
```