OpenVid-1M 200K training subset list/sampling protocol and OpenVid30 test split details #48

@jeongyh98

Thank you for sharing the STAR work and code. I am trying to reproduce the training and evaluation settings described in the paper, and I have two questions about the OpenVid splits.

  • OpenVid-1M 200K training subset
    In the paper, STAR is trained on “a subset of OpenVid-1M with 200K text-video pairs.”
    Could you please clarify how this 200K subset is defined?

      • Is there a released ID list or file list (e.g., txt/csv/json) for the 200K samples?
      • If it is sampled dynamically, could you share the exact sampling procedure, including any filtering rules and the random seed used?

  • OpenVid30 test set
    The paper mentions that OpenVid30 is separated from OpenVid-1M with no overlap with training data, and consists of the first ~100 frames of each video.

      • Could you share the exact video IDs included in OpenVid30 (or a list file)?
      • How was “no overlap” ensured in practice (e.g., by video ID, URL, hash, etc.)?
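
To make both questions concrete, here is a minimal sketch of the kind of protocol I have in mind. It is only my guess, not something taken from the paper or the repository: the function names, the ID format, the seed, and the toy sizes are all hypothetical placeholders.

```python
import random


def make_splits(video_ids, train_size, test_size, seed):
    """Deterministically draw disjoint train/test ID lists from one pool.

    Hypothetical helper: the actual STAR sampling procedure is unknown to me.
    """
    # Deduplicate and sort so the result does not depend on input order.
    pool = sorted(set(video_ids))
    if train_size + test_size > len(pool):
        raise ValueError("not enough unique IDs for the requested split sizes")
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    rng = random.Random(seed)
    rng.shuffle(pool)
    # Slicing one shuffled pool guarantees the two lists cannot share an ID.
    test_ids = pool[:test_size]
    train_ids = pool[test_size:test_size + train_size]
    return train_ids, test_ids


def overlap(train_ids, test_ids):
    """Return the sorted IDs that appear in both lists (empty if disjoint)."""
    return sorted(set(train_ids) & set(test_ids))


# Toy stand-in for the OpenVid-1M ID column; real IDs and sizes will differ.
ids = [f"video_{i:07d}" for i in range(1000)]
train_ids, test_ids = make_splits(ids, train_size=200, test_size=30, seed=0)
shared = overlap(train_ids, test_ids)
```

If the real splits were produced in a similar way, the seed plus the sorted source ID list would be enough for me to regenerate them. If not, the two list files themselves would be ideal.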

If these split files already exist in the repository, please point me to the paths. Any guidance would be greatly appreciated.
