Hi ReconVLA Team,
Thank you for open-sourcing this impressive work! The reconstructive paradigm for VLA models is a very creative approach to tackling fine-grained visual attention in robotics.
In Step 2 (Generate target_image), the documentation mentions using object detection and grounding methods like Grounding DINO to extract gaze regions for datasets like LIBERO and CALVIN. I was wondering: do you provide official checkpoints for Grounding DINO that have been fine-tuned or specifically configured for the LIBERO, CALVIN, or BridgeData environments?
Having access to these weights or a more detailed processing script would be extremely helpful for reproducing your results.
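For context, here is roughly how I am currently turning a stock Grounding DINO detection into a gaze-region crop. The padding margin and the function itself are my own guesses, not taken from your repo, so I would love to know if your preprocessing differs:

```python
import numpy as np

def crop_gaze_region(image, box, margin=0.1):
    """Crop a padded gaze region from an HxWxC image array.

    `box` is (x1, y1, x2, y2) in pixel coordinates, e.g. from a
    Grounding DINO detection. The `margin` (fraction of box size
    added on each side) is a guess on my part, not a value from
    the ReconVLA pipeline.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    # Expand the box by the margin, clamped to the image bounds.
    x1 = max(0, int(round(x1 - pad_x)))
    y1 = max(0, int(round(y1 - pad_y)))
    x2 = min(w, int(round(x2 + pad_x)))
    y2 = min(h, int(round(y2 + pad_y)))
    return image[y1:y2, x1:x2]
```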
Thank you for your time and for this great contribution to the field!
Best regards.