Hi, I ran into a problem when trying some examples by running caption_anything/model.py.
The dense captions are all filtered out by min_ppl_score and min_clip_score in parse_dense_caption(). I noticed that ppl_score is always -100.0 and clip_score is always 0.0, for example:
{'generated_captions': {'raw_caption': 'there is a girl holding a cat and a dog in her arms'}, 'crop_save_path': 'result/crop_1699081274.5122437.png', 'mask_save_path': 'result/mask_1699081274.5082428.png', 'mask': <PIL.Image.Image image mode=RGB size=512x320 at 0x1D552934B80>, 'bbox': array([ 0, 1, 511, 317]), 'area': 81409, 'context_captions': [], 'ppl_score': -100.0, 'clip_score': 0.0},...}
I think the caption is reasonable and should not score that low. Have you run into a similar problem, or do you have any guess about the cause?
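
To make the symptom concrete, here is a minimal sketch of the filtering as I understand it. It is not the actual code in parse_dense_caption(): the helper name filter_captions and the threshold values are my own assumptions, used only for illustration. It shows that a caption with ppl_score of -100.0 and clip_score of 0.0 is dropped by any threshold above those values.

```python
# Hypothetical stand-in for the score filter; NOT the real parse_dense_caption().
# The default thresholds below are assumed values chosen for illustration only.
def filter_captions(captions, min_ppl_score=-1.8, min_clip_score=0.3):
    """Keep only captions whose scores meet both thresholds."""
    return [
        c for c in captions
        if c["ppl_score"] >= min_ppl_score and c["clip_score"] >= min_clip_score
    ]


# Scores copied from the output shown in this post.
captions = [
    {
        "raw_caption": "there is a girl holding a cat and a dog in her arms",
        "ppl_score": -100.0,
        "clip_score": 0.0,
    },
]

kept = filter_captions(captions)
print(len(kept), "caption(s) kept out of", len(captions))
```

Since both values look like fixed placeholders rather than computed scores, it may be that the scoring step is not actually running in my setup, but that is only a guess.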