Add MeMViT model #20545

### Model description

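[MeMViT, CVPR 2022](https://arxiv.org/abs/2201.08383) (Memory-Augmented Multiscale Vision Transformer) is a transformer-based model for efficient long-term video recognition, released by Meta AI. Instead of feeding more frames into the model at once, it processes a video online, clip by clip, and caches keys and values from earlier iterations as "memory" that the current clip can attend to. The paper reports that this gives 30x longer temporal support than existing models with only 4.5% more compute.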
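To make the online-memory idea concrete for reviewers, here is a minimal pure-Python sketch of memory-augmented attention. It is an illustration under stated assumptions, not the MeMViT implementation: it uses a single head, identity Q/K/V projections, and a hypothetical `MemoryAttention` class that keeps a bounded FIFO of cached keys/values. The paper additionally compresses the cached memory and does not backpropagate into it; both are omitted here.

```python
import math
from collections import deque


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    # Single-head scaled dot-product attention on lists of vectors.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


class MemoryAttention:
    """Hypothetical layer: attends over the current clip plus cached K/V of earlier clips."""

    def __init__(self, max_memory_clips=2):
        # Each entry is (keys, values) from one earlier clip; oldest entries are dropped.
        self.memory = deque(maxlen=max_memory_clips)

    def __call__(self, tokens):
        # Assumption: identity projections; a real layer uses learned Q/K/V weights.
        q, k, v = tokens, tokens, tokens
        mem_k = [x for keys, _ in self.memory for x in keys]
        mem_v = [x for _, vals in self.memory for x in vals]
        # Queries come only from the current clip; keys/values also include memory.
        out = attend(q, mem_k + k, mem_v + v)
        # Cache this clip's K/V for later clips (MeMViT also compresses it; omitted).
        self.memory.append((k, v))
        return out


# Usage: feed a video as a sequence of short clips instead of one long input.
layer = MemoryAttention(max_memory_clips=2)
for clip in ([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]], [[0.2, 0.8]]):
    features = layer(clip)
```

Because queries are drawn only from the current clip and the memory is bounded, the cost per clip stays roughly constant while each clip can still look back over several earlier ones.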

It would be an excellent addition to the `transformers` library, since it reported state-of-the-art results on AVA, EPIC-Kitchens-100 action classification, and action anticipation at the time of publication.

### Your contribution

I would like to work on adding this architecture to the Hugging Face `transformers` library.

### Open source status
- [x] The model implementation is available
- [x] The model weights are available

### Provide useful links for the implementation

Source code: https://github.com/facebookresearch/MeMViT
Weight files: https://github.com/facebookresearch/MeMViT#model-checkpoints

cc: @NielsRogge @alaradirik 
