Dimensionality of `LanguageBind_Video_Huge_V1.5_FT` #72

Hi,

While inspecting the video model `LanguageBind_Video_Huge_V1.5_FT`, I noticed that its video projection outputs `1024`-dimensional embeddings, whereas the other projections in the printout below output `768`. I couldn’t find a language backbone with a matching `1024`-dimensional embedding size. Could you clarify which language model should be paired with this video model so that both embed into the same space? Thank you.

```
  (modality_proj): ModuleDict(
    (video): Linear(in_features=1280, out_features=1024, bias=False)
    (image): Linear(in_features=1024, out_features=768, bias=False)
    (language): Linear(in_features=768, out_features=768, bias=False)
  )
```
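To make the mismatch explicit, here is a minimal sketch that parses the printout above and lists which modalities share an output width. It only reads the pasted text and does not load any checkpoint; the helper names `projection_dims` and `comparable_with` are made up for this illustration and are not part of LanguageBind.

```python
import re

# Module printout copied from this issue (not loaded from a real checkpoint).
MODALITY_PROJ_REPR = """
(modality_proj): ModuleDict(
  (video): Linear(in_features=1280, out_features=1024, bias=False)
  (image): Linear(in_features=1024, out_features=768, bias=False)
  (language): Linear(in_features=768, out_features=768, bias=False)
)
"""

# Matches lines such as "(video): Linear(in_features=1280, out_features=1024, ...".
LINEAR_LINE = re.compile(
    r"\((\w+)\): Linear\(in_features=(\d+), out_features=(\d+)"
)


def projection_dims(repr_text):
    """Map each modality name to its (in_features, out_features) pair."""
    return {
        name: (int(n_in), int(n_out))
        for name, n_in, n_out in LINEAR_LINE.findall(repr_text)
    }


def comparable_with(dims, modality):
    """Return the other modalities whose output width equals this one's."""
    width = dims[modality][1]
    return sorted(
        name
        for name, (_, out) in dims.items()
        if out == width and name != modality
    )


dims = projection_dims(MODALITY_PROJ_REPR)
print(dims)
print("video is comparable with:", comparable_with(dims, "video"))
print("language is comparable with:", comparable_with(dims, "language"))
```

With the printout as pasted, no other projection matches the video width, which is why a dot product or cosine similarity between the video and language embeddings is not defined without a matching text projection.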
