feat: add AdaLN-Zero conditioning as alternative to FiLM #24

Open

tashapais wants to merge 1 commit into AlmondGod:main from tashapais:feat/adaln-zero

Conversation

@tashapais

Summary

  • Adds AdaLNZeroNorm to models/norms.py: a pre-norm module that produces (scale, shift, gate) from a zero-initialized MLP. The gate starts at zero so all residual paths are identity at initialization, matching the DiT paper's stabilization trick.
  • Threads use_adaln_zero through SpatialAttention, TemporalAttention, SwiGLUFFN, MoESwiGLUFFN, STTransformerBlock, STTransformer, and all three model classes (VideoTokenizer, LatentActionModel, DynamicsModel).
  • When enabled, each sublayer's forward switches from the post-residual FiLM pattern norm(x + sublayer(x), cond) to a pre-norm gated residual: x + gate * sublayer(adaln(x, cond)).
  • use_adaln_zero=False by default, preserving all existing behavior and checkpoints.
  • Added to all config dataclasses and YAML configs.
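A minimal sketch of what such a module might look like, assuming a PyTorch codebase; the class name AdaLNZeroNorm matches the PR, but the constructor signature, the SiLU activation, and the cond_dim parameter are assumptions rather than the PR's actual code:

```python
import torch
import torch.nn as nn

class AdaLNZeroNorm(nn.Module):
    """Pre-norm that modulates LayerNorm output with (scale, shift) and
    returns a gate for the residual branch, DiT-style (sketch)."""

    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        # No learnable affine: scale/shift come entirely from conditioning.
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # MLP producing (scale, shift, gate); the final Linear is
        # zero-initialized, so at init scale = shift = gate = 0 and every
        # gated residual branch reduces to the identity.
        self.mlp = nn.Sequential(nn.SiLU(), nn.Linear(cond_dim, 3 * dim))
        nn.init.zeros_(self.mlp[-1].weight)
        nn.init.zeros_(self.mlp[-1].bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor):
        scale, shift, gate = self.mlp(cond).chunk(3, dim=-1)
        return self.norm(x) * (1 + scale) + shift, gate
```

At initialization the modulated output is exactly norm(x) and the gate is zero, which is the stabilization trick the summary describes.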

Test plan

  • Train with use_adaln_zero: true in training.yaml and verify loss decreases normally
  • Verify use_adaln_zero: false (default) produces identical results to main
  • Run inference with a checkpoint trained with AdaLN-Zero (use_adaln_zero: true in inference.yaml)
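For reference, the flag as it might appear in training.yaml; the surrounding key structure is an assumption, only the use_adaln_zero key itself comes from the PR:

```yaml
# training.yaml (sketch; nesting under `model` is assumed)
model:
  use_adaln_zero: true   # false (default) keeps FiLM behavior and checkpoint compatibility
```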

Adds AdaLNZeroNorm to norms.py: pre-norm with zero-initialized MLP that
produces (scale, shift, gate). Gate starts at zero so residual paths are
identity at init, stabilizing early training (DiT-style).

Each sublayer (SpatialAttention, TemporalAttention, SwiGLUFFN, MoESwiGLUFFN)
gains a use_adaln_zero flag. When enabled, the forward switches from
post-residual FiLM norm to pre-norm + gated residual:
  x + gate * sublayer(adaln(x, conditioning))
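The two residual patterns can be contrasted in a small sketch; the helper names and the callable arguments are illustrative, not the PR's actual interfaces (the real code threads this through the attention/FFN classes listed above):

```python
def film_block(x, cond, sublayer, film_norm):
    # Post-residual FiLM: add the residual first, then condition via the norm.
    return film_norm(x + sublayer(x), cond)

def adaln_zero_block(x, cond, sublayer, adaln):
    # Pre-norm + gated residual: adaln returns the modulated input and a gate.
    # With a zero-initialized gate, this block is the identity at init.
    h, gate = adaln(x, cond)
    return x + gate * sublayer(h)
```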

use_adaln_zero=False by default, preserving all existing behavior.
Wired through STTransformer, all three model classes, training scripts,
and configs.
