
Add MeMViT model #20545

@fcakyon


Model description

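MeMViT (Memory-Augmented Multiscale Vision Transformer, CVPR 2022), released by Meta AI, is a highly efficient transformer-based model for long-term video understanding. Its online attention mechanism caches compressed keys and values from previously processed clips as "memory". This gives it a temporal support roughly 30 times longer than existing models for only about 4.5% more compute.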
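The memory idea behind MeMViT's online attention can be sketched as follows. This is a simplified pure-Python illustration, not the reference implementation. Each attention layer keeps the keys and values of previously processed clips, and queries from the current clip attend over that cache plus the current clip's own keys and values. The hypothetical `MemoryAttention` class and its `max_memory_clips` and `stride` parameters are assumptions made for illustration. So is the use of plain average pooling, which stands in for MeMViT's learned memory compression.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention over plain lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def compress(tokens: List[Vector], stride: int) -> List[Vector]:
    """Average-pool groups of `stride` consecutive tokens.

    Assumption: a stand-in for MeMViT's learned compression module.
    """
    groups = (tokens[i:i + stride] for i in range(0, len(tokens), stride))
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]


class MemoryAttention:
    """Hypothetical memory-augmented attention layer (illustration only)."""

    def __init__(self, max_memory_clips: int = 2, stride: int = 2):
        self.max_memory_clips = max_memory_clips
        self.stride = stride
        self.memory_keys: List[List[Vector]] = []    # one compressed key list per past clip
        self.memory_values: List[List[Vector]] = []  # one compressed value list per past clip

    def __call__(self, queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
        # Current-clip queries attend to cached memory plus the current clip
        all_keys = [k for clip in self.memory_keys for k in clip] + keys
        all_values = [v for clip in self.memory_values for v in clip] + values
        out = attend(queries, all_keys, all_values)
        # Cache this clip's keys/values in compressed form for later clips
        self.memory_keys.append(compress(keys, self.stride))
        self.memory_values.append(compress(values, self.stride))
        # Keep only the most recent `max_memory_clips` clips of memory
        self.memory_keys = self.memory_keys[-self.max_memory_clips:]
        self.memory_values = self.memory_values[-self.max_memory_clips:]
        return out


if __name__ == "__main__":
    layer = MemoryAttention(max_memory_clips=2, stride=2)
    clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for _ in range(3):  # process three consecutive clips online
        out = layer(clip, clip, clip)
    print(len(out), sum(len(c) for c in layer.memory_keys))
```

Because the cached memory is pooled by `stride`, each extra clip of context adds only a fraction of the current clip's key count to the attention cost. This is the intuition behind extending temporal support cheaply instead of feeding longer raw clips.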

It would be an excellent addition to the Transformers library, since it is the current state of the art on AVA, EPIC-Kitchens-100 action classification, and EPIC-Kitchens-100 action anticipation.

Your contribution

I would like to work on adding this architecture to Hugging Face Transformers.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Source code: https://github.com/facebookresearch/MeMViT
Weight files: https://github.com/facebookresearch/MeMViT#model-checkpoints

cc: @NielsRogge @alaradirik
