Skip to content

Non-Record: Add Novel SemanticEngine SSM submission #2122

Open
KenMalloy wants to merge 4 commits into openai:main from KenMalloy:feature/semanticengine-submission

Conversation

@KenMalloy

Summary

Adds the SemanticEngine CareSSM submission for track_10min_16mb. It is a pure SSM trunk with live episodic memory active during both training and a rules-legal prequential eval.

Results

Full 50k FineWeb validation docs, packet-online cache, score-before-write eval:

| Seed   | val_bpb    | val_loss   | Train steps | Train time | Eval time |
|--------|------------|------------|-------------|------------|-----------|
| 42     | 1.64076237 | 4.07007627 | 1692        | 595.97s    | 347.0s    |
| 1337   | 1.66718946 | 4.13563133 | 1692        | 594.15s    | 349.5s    |
| 294924 | 1.62065301 | 4.02019298 | 1688        | 594.27s    | 364.8s    |
| Mean   | 1.64286828 | 4.07530019 | 1690.67     | 594.78s    | 353.77s   |

Std: val_bpb_std=0.02333959, val_loss_std=0.05789620.

Architecture

  • GPUs 0-5: CareSSM trunk training
  • GPU 6: episodic residual packet-serving rank
  • GPU 7: memory maintenance rank (a sketch of this rank split follows the list)
  • Eval is prequential: each chunk is scored before its evidence can update the cache for future chunks
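
A minimal sketch of that rank split, assuming a single 8-GPU node; the constant names and the role_for_rank helper are illustrative, not taken from the submission's train_gpt.py:

```python
# Illustrative rank-to-role mapping for the 8-GPU layout described above.
TRUNK_RANKS = set(range(6))   # GPUs 0-5: CareSSM trunk training/eval
PACKET_RANK = 6               # GPU 6: serves episodic residual packets
MAINTENANCE_RANK = 7          # GPU 7: maintains the memory table

def role_for_rank(rank: int) -> str:
    """Map a global rank to its role in the hypothetical 8-GPU layout."""
    if rank in TRUNK_RANKS:
        return "trunk"
    if rank == PACKET_RANK:
        return "packet_server"
    if rank == MAINTENANCE_RANK:
        return "memory_maintenance"
    raise ValueError(f"unexpected rank {rank} for an 8-GPU node")

if __name__ == "__main__":
    for r in range(8):
        print(r, role_for_rank(r))
```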

Verification

  • python -m json.tool records/track_10min_16mb/2026-05-01_SemanticEngine_CareSSM/submission.json
  • python -m py_compile records/track_10min_16mb/2026-05-01_SemanticEngine_CareSSM/train_gpt.py
  • python -m pytest tests/submission/test_train_gpt_hyperparams.py -q -> 9 passed

KenMalloy added 3 commits May 1, 2026 04:19
Covers system naming (SemanticEngine / CareSSM / ChaosSsm /
SemanticOptimizer), file structure, train_gpt.py section breakdown,
new chaoscontrol public/ module, training/eval prequential contract,
and implementation task order.
@KenMalloy
Author

KenMalloy commented May 1, 2026

Architecture note for reviewers:

This submission is a language model built around a recurrent SSM, not a transformer with attention layers. The novel part is that the model has a separate online episodic-memory system that prepares small residual tensors for the SSM to consume. Think of it as an asynchronous memory channel into the recurrent model, not a post-processing cache or a second-pass scorer.

The hardware split is part of the method:

  • GPUs 0-5 run the main SSM training/eval path. They keep moving even if memory has nothing ready.
  • GPU 6 serves memory packets: compact residual tensors computed from the current memory table and published latest-complete to the main SSM. If no fresh packet is available, the residual is zero, so the trunk never blocks (a buffer sketch follows this list).
  • GPU 7 maintains the memory table: it decides which hidden-state evidence should be kept, refreshed, or retired, and sends committed memory updates to the packet-serving rank.
  • The CPU schedules this work and records telemetry. It coordinates the memory system; it is not where the main model FLOPs happen.
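
For concreteness, here is a minimal sketch of a latest-complete, non-blocking packet exchange. It uses an in-process lock as a stand-in for whatever inter-rank transport the submission actually uses, and the class and method names are hypothetical:

```python
import threading
from typing import Optional

import torch

class LatestCompletePacketBuffer:
    """Single-slot buffer: the packet-serving rank publishes finished
    residual packets, and the trunk reads whatever is newest without
    waiting. If nothing fresh has been published, the trunk receives a
    zero residual, so its step never blocks on memory."""

    def __init__(self, hidden_dim: int):
        self._lock = threading.Lock()
        self._packet: Optional[torch.Tensor] = None
        self._zero = torch.zeros(hidden_dim)

    def publish(self, packet: torch.Tensor) -> None:
        # Called by the packet-serving rank once a residual is complete;
        # a newer packet simply overwrites the older one.
        with self._lock:
            self._packet = packet

    def read_latest(self) -> torch.Tensor:
        # Called by the trunk each step; returns immediately.
        with self._lock:
            return self._packet if self._packet is not None else self._zero
```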

The eval path is prequential. For each chunk, the model first scores the tokens using only the checkpoint plus memory built from earlier chunks. Only after that chunk's loss is fixed can its hidden-state evidence update the memory table for future chunks. There is no train-on-validation-before-scoring step, no rescoring of the same chunk, and no best-of-multiple-passes selection.
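
A minimal sketch of that score-before-write contract, assuming hypothetical model.score and memory read/write methods rather than the submission's actual API:

```python
def prequential_eval(model, memory, chunks):
    """Prequential evaluation sketch: every chunk is scored before its
    own evidence is allowed to update the memory table."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        # 1) Score this chunk using only the checkpoint plus memory
        #    built from earlier chunks.
        loss, hidden_evidence, n_tokens = model.score(chunk, memory.read_packet())
        total_loss += loss * n_tokens
        total_tokens += n_tokens
        # 2) Only after the loss is counted may this chunk's hidden-state
        #    evidence be written into the memory table; no rescoring.
        memory.write(hidden_evidence)
    return total_loss / total_tokens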

So the core claim is: a fast SSM trunk can stay on its throughput path while a separate memory subsystem continuously prepares causal residual information for it. The memory is online during eval, but it only learns from tokens after their score has already been counted.

@KenMalloy KenMalloy changed the title Add SemanticEngine CareSSM submission Non-Record: Add Novel SemanticEngine CareSSM submission May 1, 2026
@KenMalloy KenMalloy changed the title Non-Record: Add Novel SemanticEngine CareSSM submission Non-Record: Add Novel SemanticEngine SSM submission May 1, 2026
