Dimensionality of LanguageBind_Video_Huge_V1.5_FT #72

@pizhn

Hi,

Upon checking the video model LanguageBind_Video_Huge_V1.5_FT, I noticed that its video embedding dimension is 1024, whereas the image and language projections in the same checkpoint output 768 (see the printout below). I couldn’t find a language backbone with a matching 1024-dimensional embedding. Could you clarify which language model should be used to match this embedding space? Thank you.

  (modality_proj): ModuleDict(
    (video): Linear(in_features=1280, out_features=1024, bias=False)
    (image): Linear(in_features=1024, out_features=768, bias=False)
    (language): Linear(in_features=768, out_features=768, bias=False)
  )
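
For reference, here is a minimal sketch of how the mismatch can be checked from the printout alone, without loading any weights. It only parses the text above; the helper names `projection_dims` and `mismatched_with` are hypothetical and are not part of LanguageBind.

```python
import re

# Text copied from the module printout above.
MODALITY_PROJ_REPR = """
(modality_proj): ModuleDict(
  (video): Linear(in_features=1280, out_features=1024, bias=False)
  (image): Linear(in_features=1024, out_features=768, bias=False)
  (language): Linear(in_features=768, out_features=768, bias=False)
)
"""

# Matches lines such as "(video): Linear(in_features=1280, out_features=1024".
LINEAR_RE = re.compile(r"\((\w+)\): Linear\(in_features=(\d+), out_features=(\d+)")


def projection_dims(repr_text):
    """Map each modality name to its (in_features, out_features) pair."""
    return {name: (int(i), int(o)) for name, i, o in LINEAR_RE.findall(repr_text)}


def mismatched_with(dims, reference="language"):
    """Return modalities whose output size differs from the reference modality."""
    ref_out = dims[reference][1]
    return sorted(name for name, (_, out) in dims.items() if out != ref_out)


dims = projection_dims(MODALITY_PROJ_REPR)
print(dims)
print(mismatched_with(dims))  # modalities that do not share the language embedding size
```

With the printout above, only the video projection differs from the language projection, which is why video and text embeddings from this checkpoint cannot be compared directly.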
