
feat: add windowed attention in action tokenizer encoder #25

Open

tashapais wants to merge 1 commit into AlmondGod:main from tashapais:feat/windowed-action-attention

Conversation

@tashapais

Summary

  • Adds WindowedFrameAttention to models/latent_actions.py: for each pair of consecutive frames (t, t+1), it concatenates all patch embeddings from both frames into a 2P-length sequence, applies self-attention, then mean-pools to a single E-dimensional vector (see the sketch after this list).
  • LatentActionsEncoder gains use_windowed_attention=False (default). When enabled, the old mean-pool + concat path is replaced by WindowedFrameAttention plus a simpler head (E -> action_dim).
  • The flag is threaded through LatentActionModel, LatentActionsConfig, training.yaml, and train_latent_actions.py.
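As a rough illustration of the mechanism described above, here is a minimal PyTorch sketch of a windowed-attention module. It is not the PR's actual code: the class name matches the PR, but num_heads, the single attention block, and the residual + LayerNorm details are assumptions.

```python
import torch
import torch.nn as nn

class WindowedFrameAttention(nn.Module):
    # Hypothetical sketch of the PR's module in models/latent_actions.py.
    # num_heads and the single-block depth are assumptions; embed_dim must
    # be divisible by num_heads.
    def __init__(self, embed_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, P, E) patch embeddings for T frames of P patches each.
        B, T, P, E = frames.shape
        # Build the (t, t+1) windows: (B, T-1, 2P, E).
        windows = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)
        x = windows.reshape(B * (T - 1), 2 * P, E)
        # Self-attention lets patches from frame t and frame t+1 interact
        # before any pooling happens.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm(x + attn_out)
        # Mean-pool the 2P sequence to one E-dimensional vector per window.
        pooled = x.mean(dim=1)
        return pooled.reshape(B, T - 1, E)
```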

Why windowed attention over mean pool + concat:
Mean pooling discards patch-level spatial structure before frames are combined. With windowed attention, patches from frame t and frame t+1 can cross-attend before pooling, allowing the model to focus on the specific regions that changed (e.g., a moving sprite, a door opening) rather than averaging everything uniformly.

Test plan

  • Train with use_windowed_attention: true in training.yaml (see the excerpt after this list) and compare action codebook diversity against the mean-pool + concat baseline.
  • Verify that use_windowed_attention: false (the default) produces behavior identical to main.
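For reference, a training.yaml excerpt might look like the following. Only the use_windowed_attention key comes from this PR; the surrounding keys and values are illustrative assumptions about the config layout.

```yaml
# Hypothetical excerpt — only use_windowed_attention is added by this PR;
# the other keys and values are assumed for illustration.
latent_actions:
  embed_dim: 256
  action_dim: 32
  use_windowed_attention: true   # false (the default) matches main
```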

Adds WindowedFrameAttention: for each pair of consecutive frames (t, t+1), it
concatenates their P patch embeddings into a 2P sequence, applies self-attention,
and mean-pools to a single embedding. Compared to mean pool + concat, patches
from both frames can interact before pooling, giving a richer inter-frame signal
for action inference.

Controlled by use_windowed_attention (default False) on LatentActionModel and
LatentActionsConfig. The action head input shrinks from embed_dim*2 to embed_dim;
both paths are sketched below.
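Here is a hypothetical encoder skeleton showing how the flag could select between the two paths. Only use_windowed_attention and the head input sizes (E vs. 2E) come from this PR; all other names are illustrative, and it reuses the WindowedFrameAttention sketch above.

```python
import torch
import torch.nn as nn

class LatentActionsEncoderSketch(nn.Module):
    # Hypothetical skeleton: only use_windowed_attention and the head input
    # sizes (E vs. 2E) come from this PR; other names are illustrative.
    def __init__(self, embed_dim: int, action_dim: int,
                 use_windowed_attention: bool = False):
        super().__init__()
        self.use_windowed_attention = use_windowed_attention
        if use_windowed_attention:
            self.windowed_attention = WindowedFrameAttention(embed_dim)
            self.action_head = nn.Linear(embed_dim, action_dim)      # E -> action_dim
        else:
            self.action_head = nn.Linear(embed_dim * 2, action_dim)  # 2E -> action_dim

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        # patch_embeds: (B, T, P, E) patch embeddings for T frames.
        if self.use_windowed_attention:
            # Patches of frames t and t+1 interact before pooling: (B, T-1, E).
            fused = self.windowed_attention(patch_embeds)
        else:
            # Baseline: mean-pool each frame, then concat consecutive frames.
            pooled = patch_embeds.mean(dim=2)                           # (B, T, E)
            fused = torch.cat([pooled[:, :-1], pooled[:, 1:]], dim=-1)  # (B, T-1, 2E)
        return self.action_head(fused)                                  # (B, T-1, action_dim)
```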
