Hello MarrLab team,
Thank you for releasing the HistoGPT codebase. While trying to reproduce the experiments from the paper, I noticed that the repository appears to include the main HistoGPT pipeline (MIL pre-training and autoregressive image-text fine-tuning), but I could not find the implementation or training scripts for the intermediate contrastive baselines HistoCLIP and HistoSigLIP described in the Methods.
In the paper you mention:
- “For HistoCLIP we used the same loss as for CLIP. For HistoSigLIP we used the loss proposed in SigLIP.”
- “We froze the vision encoder during training (locked-image text tuning).”
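For concreteness, here is the minimal sketch I am currently working from for the two losses, using the standard CLIP/SigLIP formulations in PyTorch. The function names, argument layout, and reductions are my own assumptions, not taken from your code:

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, logit_scale):
    # CLIP objective: L2-normalize both towers, scale the cosine-similarity
    # matrix, and apply a symmetric cross-entropy over in-batch negatives.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = logit_scale * img_emb @ txt_emb.t()  # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def siglip_loss(img_emb, txt_emb, t, b):
    # SigLIP objective: an independent sigmoid per image-text pair, with
    # label +1 on the diagonal (matching pairs) and -1 everywhere else.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = t * img_emb @ txt_emb.t() + b        # (B, B)
    labels = 2.0 * torch.eye(logits.size(0), device=logits.device) - 1.0
    # the SigLIP paper sums over all B^2 pairs and divides by the batch size B
    return -F.logsigmoid(labels * logits).sum() / logits.size(0)
```

Per the locked-image text tuning note, I currently freeze the vision encoder (setting requires_grad = False on all its parameters) and train only the text side, but I am unsure whether the slide-level resampler should be trained or frozen as well.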
Could you please clarify:
- Are the HistoCLIP/HistoSigLIP training scripts available elsewhere (e.g., in another branch or repository), or are they planned for release?
- If they are not planned for release, could you share the key implementation details needed for reproduction, e.g., the exact image/text embeddings used, whether the resampler is trained or frozen, batch construction and negatives, temperature/logit_scale handling, and any loss weighting or normalization? My current assumptions are in the sketches above and below.
- If possible, could you point to the commit/PR that contains these baselines?
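On the temperature/logit_scale point, this is how I am currently initializing the parameters, reusing the loss functions from the sketch above. The log(1/0.07) initialization and max-100 clamp follow the original CLIP release, and t' = log(10), b = -10 follow the SigLIP paper; the embedding dimension and batch size below are arbitrary, and whether HistoCLIP/HistoSigLIP do the same is exactly what I would like to confirm:

```python
import math
import torch

# CLIP learns a log-temperature initialized to log(1/0.07); the original
# implementation clamps the exponentiated scale to at most 100.
logit_scale = torch.nn.Parameter(torch.tensor(math.log(1 / 0.07)))
clip_scale = logit_scale.exp().clamp(max=100.0)

# SigLIP learns a log-temperature t' and a bias b, initialized to
# log(10) and -10 respectively.
t_prime = torch.nn.Parameter(torch.tensor(math.log(10.0)))
bias = torch.nn.Parameter(torch.tensor(-10.0))

# Smoke test with random 512-d embeddings for a batch of 8 pairs,
# using clip_loss/siglip_loss as defined above.
img, txt = torch.randn(8, 512), torch.randn(8, 512)
print(clip_loss(img, txt, clip_scale).item())
print(siglip_loss(img, txt, t_prime.exp(), bias).item())
```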
Thanks in advance for your help.
Best regards,
Xiuju Du