feat: add windowed attention in action tokenizer encoder #25
Open
tashapais wants to merge 1 commit into
Conversation
Adds `WindowedFrameAttention`: for each pair of consecutive frames (t, t+1), it concatenates their P patch embeddings into a 2P-token sequence, applies self-attention, and mean-pools to a single embedding. Compared to mean pool + concat, patches from both frames can interact before pooling, giving a richer inter-frame signal for action inference. Controlled by `use_windowed_attention` (default False) on `LatentActionModel` and `LatentActionsConfig`. The action head input shrinks from `embed_dim * 2` to `embed_dim`.
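A minimal sketch of what such a module could look like, based on the description above (PyTorch; the head count and the use of `nn.MultiheadAttention` are assumptions, not necessarily what the PR implements):

```python
import torch
import torch.nn as nn

class WindowedFrameAttention(nn.Module):
    """Self-attention over the concatenated patches of consecutive frame pairs.

    Sketch from the PR description; num_heads and the choice of
    nn.MultiheadAttention are assumptions, not the PR's actual internals.
    """

    def __init__(self, embed_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, P, E) -- T frames, P patch embeddings of dim E each
        B, T, P, E = frames.shape
        # Window each pair (t, t+1) into a 2P-token sequence: (B, T-1, 2P, E)
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)
        x = pairs.reshape(B * (T - 1), 2 * P, E)
        # Patches from both frames attend to each other before pooling
        out, _ = self.attn(x, x, x)
        # Mean-pool the 2P tokens to one E-dim embedding per frame pair
        return out.mean(dim=1).reshape(B, T - 1, E)

# Example: 4 frames of 16 patches, embed_dim 64 -> 3 pairwise embeddings
# attn = WindowedFrameAttention(embed_dim=64)
# attn(torch.randn(2, 4, 16, 64)).shape  # torch.Size([2, 3, 64])
```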
Summary
- Adds `WindowedFrameAttention` to `models/latent_actions.py`: for each pair of consecutive frames (t, t+1), concatenates all patch embeddings from both frames into a 2P sequence, applies self-attention, then mean-pools to a single E-dimensional vector.
- `LatentActionsEncoder` gains `use_windowed_attention=False` (default). When enabled, the old mean pool + concat path is replaced by `WindowedFrameAttention` plus a simpler head (E -> action_dim).
- The flag is plumbed through `LatentActionModel`, `LatentActionsConfig`, `training.yaml`, and `train_latent_actions.py`.

Why windowed attention over mean pool + concat:
Mean pooling discards patch-level spatial structure before frames are combined. With windowed attention, patches from frame t and frame t+1 can cross-attend before pooling, allowing the model to focus on the specific regions that changed (e.g., a moving sprite, a door opening) rather than averaging everything uniformly.
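For concreteness, the two encoder paths differ roughly as below (a sketch with illustrative attribute names; the PR's actual encoder code may be organized differently):

```python
# Inside a hypothetical LatentActionsEncoder.forward; names are illustrative.
# patch_embeddings: (B, T, P, E)
if self.use_windowed_attention:
    # Patches from frames t and t+1 interact before pooling
    pair_emb = self.windowed_attention(patch_embeddings)   # (B, T-1, E)
    actions = self.action_head(pair_emb)                   # Linear(E, action_dim)
else:
    # Baseline: pool each frame independently, then concatenate pairs
    pooled = patch_embeddings.mean(dim=2)                  # (B, T, E)
    pair_emb = torch.cat([pooled[:, :-1], pooled[:, 1:]], dim=-1)  # (B, T-1, 2E)
    actions = self.action_head(pair_emb)                   # Linear(2E, action_dim)
```

This branch structure is why the action head input shrinks from `embed_dim * 2` to `embed_dim` when the flag is on.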
Test plan
- Set `use_windowed_attention: true` in `training.yaml` and compare action codebook diversity against the baseline.
- Verify `use_windowed_attention: false` (default) produces behavior identical to main.
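A rough shape of the backward-compatibility check in the second item (constructor signatures and input shapes here are assumptions, not the repo's actual API):

```python
import torch
from models.latent_actions import LatentActionModel, LatentActionsConfig  # per PR

# With the flag off (the default), the encoder should take the same forward
# path as main, so fixed-seed outputs ought to match exactly.
cfg = LatentActionsConfig(use_windowed_attention=False)  # signature assumed
model = LatentActionModel(cfg)
model.eval()

frames = torch.randn(2, 8, 3, 64, 64)  # (B, T, C, H, W); shapes illustrative
with torch.no_grad():
    out = model(frames)
# Compare `out` (and state_dict keys) against the same run on main.
```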