Reducing architecture complexity with M-RoPE #3

@NilanEkanayake

Nice work!
Have you tried M-RoPE (multimodal rotary position embedding)? It lets patches and 1D tokens share a single positional encoding space. In my small-scale, limited testing, it works very well for 1D tokenizers with dynamic resolution and token count, with RoPE's extrapolation ability as a bonus. I've also found RoPE to be more stable under GAN training than learned positional embeddings.
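
To make the "unified positional encoding space" idea concrete, here is a minimal pure-Python sketch. It assumes one common M-RoPE layout: each token gets a (t, h, w) position triple, patches use their grid coordinates, and 1D tokens use the same index on all three axes, starting past the largest patch coordinate. The function names, the `sections` split, and the frequency schedule are illustrative assumptions, not taken from this repository or the linked codebase.

```python
import math

def mrope_position_ids(grid_h, grid_w, num_1d_tokens):
    """Hypothetical helper: one (t, h, w) triple per token, patches first."""
    ids = []
    for h in range(grid_h):
        for w in range(grid_w):
            ids.append((0, h, w))  # single frame, so t = 0 for every patch
    # Assumption: 1D tokens continue after the largest patch coordinate
    # and share one index across all three axes.
    start = max(grid_h, grid_w)
    for i in range(num_1d_tokens):
        p = start + i
        ids.append((p, p, p))
    return ids

def mrope_rotate(vec, pos, sections, base=10000.0):
    """Rotate consecutive pairs of `vec`.

    `sections` gives how many rotary pairs are driven by the t, h and w
    components of `pos`; the pairs are assigned to axes in that order.
    """
    assert len(vec) == 2 * sum(sections)
    n_pairs = len(vec) // 2
    out = []
    pair = 0
    for axis, n in enumerate(sections):
        for _ in range(n):
            theta = pos[axis] * base ** (-pair / n_pairs)
            c, s = math.cos(theta), math.sin(theta)
            x, y = vec[2 * pair], vec[2 * pair + 1]
            out += [x * c - y * s, x * s + y * c]
            pair += 1
    return out

# Usage: a 2x3 patch grid followed by two 1D tokens.
ids = mrope_position_ids(2, 3, 2)
rotated = mrope_rotate([1.0, 0.0, 0.5, 2.0, -1.0, 1.0], ids[5], (1, 1, 1))
```

Because every token is rotated by the same rule, the attention score between any two tokens depends only on their per-axis position differences, so patches and 1D tokens can share one attention stack without separate embedding tables.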

I have a reference video-tokenizer codebase here that uses sample packing for dynamic-resolution training. It combines M-RoPE with GQA (grouped-query attention) to get adaptability and low compute cost together, so the architecture remains a mostly standard ViT.
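
For readers unfamiliar with sample packing, a minimal sketch of the idea: samples of different lengths are concatenated into one sequence, and a block-diagonal mask keeps each token attending only within its own sample. The helper names are hypothetical, and the referenced codebase may implement this differently (for example with variable-length attention kernels rather than an explicit mask).

```python
def packed_segment_ids(lengths):
    """Hypothetical helper: label each packed token with its sample index."""
    ids = []
    for seg, n in enumerate(lengths):
        ids.extend([seg] * n)
    return ids

def block_diagonal_mask(segment_ids):
    """mask[i][j] is True when tokens i and j belong to the same sample."""
    return [[a == b for b in segment_ids] for a in segment_ids]

# Usage: pack a 2-token sample and a 3-token sample into one sequence.
seg = packed_segment_ids([2, 3])
mask = block_diagonal_mask(seg)
```

With M-RoPE, positions would restart for each packed sample, so no padding to a fixed resolution is needed.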
