Skip to content

Non-Record: Add Novel SemanticEngine SSM submission #2122

Open
KenMalloy wants to merge 4 commits into openai:main from KenMalloy:feature/semanticengine-submission

Conversation

@KenMalloy

Summary

Adds the SemanticEngine CareSSM submission for track_10min_16mb. It is a pure SSM trunk with live episodic memory active during both training and a rules-legal prequential eval.

Results

Full 50k FineWeb validation docs, packet-online cache, score-before-write eval:

| Seed   | val_bpb    | val_loss   | Train steps | Train time | Eval time |
|--------|------------|------------|-------------|------------|-----------|
| 42     | 1.64076237 | 4.07007627 | 1692        | 595.97s    | 347.0s    |
| 1337   | 1.66718946 | 4.13563133 | 1692        | 594.15s    | 349.5s    |
| 294924 | 1.62065301 | 4.02019298 | 1688        | 594.27s    | 364.8s    |
| Mean   | 1.64286828 | 4.07530019 | 1690.67     | 594.78s    | 353.77s   |

Std: val_bpb_std=0.02333959, val_loss_std=0.05789620.

Architecture

  • GPUs 0-5: CareSSM trunk training
  • GPU 6: episodic residual packet-serving rank
  • GPU 7: memory maintenance rank (a sketch of this rank split follows the list)
  • Eval is prequential: each chunk is scored before its evidence can update the cache for future chunks
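
A minimal sketch of that rank split, assuming a single 8-GPU node; the constant names and the role_for_rank helper are illustrative, not taken from the submission's train_gpt.py:

```python
# Illustrative rank-to-role mapping for the 8-GPU layout described above.
TRUNK_RANKS = set(range(6))   # GPUs 0-5: CareSSM trunk training/eval
PACKET_RANK = 6               # GPU 6: serves episodic residual packets
MAINTENANCE_RANK = 7          # GPU 7: maintains the memory table

def role_for_rank(rank: int) -> str:
    """Map a global rank to its role in the hypothetical 8-GPU layout."""
    if rank in TRUNK_RANKS:
        return "trunk"
    if rank == PACKET_RANK:
        return "packet_server"
    if rank == MAINTENANCE_RANK:
        return "memory_maintenance"
    raise ValueError(f"unexpected rank {rank} for an 8-GPU node")

if __name__ == "__main__":
    for r in range(8):
        print(r, role_for_rank(r))
```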

Verification

  • python -m json.tool records/track_10min_16mb/2026-05-01_SemanticEngine_CareSSM/submission.json
  • python -m py_compile records/track_10min_16mb/2026-05-01_SemanticEngine_CareSSM/train_gpt.py
  • python -m pytest tests/submission/test_train_gpt_hyperparams.py -q -> 9 passed

KenMalloy added 3 commits May 1, 2026 04:19
Covers system naming (SemanticEngine / CareSSM / ChaosSsm /
SemanticOptimizer), file structure, train_gpt.py section breakdown,
new chaoscontrol public/ module, training/eval prequential contract,
and implementation task order.
@KenMalloy
Author

KenMalloy commented May 1, 2026

Architecture note for reviewers:

This submission is a language model built around a recurrent SSM, not a transformer with attention layers. The novel part is that the model has a separate online episodic-memory system that prepares small residual tensors for the SSM to consume. Think of it as an asynchronous memory channel into the recurrent model, not a post-processing cache or a second-pass scorer.

The hardware split is part of the method:

  • GPUs 0-5 run the main SSM training/eval path. They keep moving even if memory has nothing ready.
  • GPU 6 serves memory packets: compact residual tensors computed from the current memory table and published latest-complete to the main SSM. If no fresh packet is available, the residual is zero, so the trunk never blocks (a buffer sketch follows this list).
  • GPU 7 maintains the memory table: it decides which hidden-state evidence should be kept, refreshed, or retired, and sends committed memory updates to the packet-serving rank.
  • The CPU schedules this work and records telemetry. It coordinates the memory system; it is not where the main model FLOPs happen.
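
For concreteness, here is a minimal sketch of a latest-complete, non-blocking packet exchange. It uses an in-process lock as a stand-in for whatever inter-rank transport the submission actually uses, and the class and method names are hypothetical:

```python
import threading
from typing import Optional

import torch

class LatestCompletePacketBuffer:
    """Single-slot buffer: the packet-serving rank publishes finished
    residual packets, and the trunk reads whatever is newest without
    waiting. If nothing fresh has been published, the trunk receives a
    zero residual, so its step never blocks on memory."""

    def __init__(self, hidden_dim: int):
        self._lock = threading.Lock()
        self._packet: Optional[torch.Tensor] = None
        self._zero = torch.zeros(hidden_dim)

    def publish(self, packet: torch.Tensor) -> None:
        # Called by the packet-serving rank once a residual is complete;
        # a newer packet simply overwrites the older one.
        with self._lock:
            self._packet = packet

    def read_latest(self) -> torch.Tensor:
        # Called by the trunk each step; returns immediately.
        with self._lock:
            return self._packet if self._packet is not None else self._zero
```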

The eval path is prequential. For each chunk, the model first scores the tokens using only the checkpoint plus memory built from earlier chunks. Only after that chunk's loss is fixed can its hidden-state evidence update the memory table for future chunks. There is no train-on-validation-before-scoring step, no rescoring of the same chunk, and no best-of-multiple-passes selection.
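
A minimal sketch of that score-before-write contract, assuming hypothetical model.score and memory read/write methods rather than the submission's actual API:

```python
def prequential_eval(model, memory, chunks):
    """Prequential evaluation sketch: every chunk is scored before its
    own evidence is allowed to update the memory table."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        # 1) Score this chunk using only the checkpoint plus memory
        #    built from earlier chunks.
        loss, hidden_evidence, n_tokens = model.score(chunk, memory.read_packet())
        total_loss += loss * n_tokens
        total_tokens += n_tokens
        # 2) Only after the loss is counted may this chunk's hidden-state
        #    evidence be written into the memory table; no rescoring.
        memory.write(hidden_evidence)
    return total_loss / total_tokens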

So the core claim is: a fast SSM trunk can stay on its throughput path while a separate memory subsystem continuously prepares causal residual information for it. The memory is online during eval, but it only learns from tokens after their score has already been counted.

@KenMalloy KenMalloy changed the title Add SemanticEngine CareSSM submission Non-Record: Add Novel SemanticEngine CareSSM submission May 1, 2026
@KenMalloy KenMalloy changed the title Non-Record: Add Novel SemanticEngine CareSSM submission Non-Record: Add Novel SemanticEngine SSM submission May 1, 2026
