
Inquiry regarding Grounding DINO checkpoints for target_image generation (LIBERO/CALVIN) #9

@wangerforcs

Hi ReconVLA Team,

Thank you for open-sourcing this impressive work! The reconstructive paradigm for VLA models is a very creative approach to tackling fine-grained visual attention in robotics.

In Step 2 (Generate target_image), the documentation mentions using object detection and grounding methods such as Grounding DINO to extract gaze regions for datasets like LIBERO and CALVIN. I was wondering: do you provide official Grounding DINO checkpoints that have been fine-tuned or specifically configured for the LIBERO, CALVIN, or BridgeData environments?

Having access to these weights or a more detailed processing script would be extremely helpful for reproducing your results.

Thank you for your time and for this great contribution to the field!

Best regards.