First, I would like to express my sincere appreciation for your outstanding work. Pink has demonstrated strong performance on the RefCOCO dataset, with Pink-G in particular achieving the best results.
However, the repository does not seem to include the implementation details or code for Pink-G. Is the primary modification in Pink-G simply a change of the visual encoder? If so, does it use a visual encoder from the OpenCLIP suite (BobMcDear/open-clip-jax-vit-huge-patch14-laion2b-s32b-b79k)?
I look forward to your response and would be grateful for any clarification on this matter.