Using the pre-trained model provided in this repository, I ran the following commands to evaluate the model on the test data.
python test.py --cfg configs/dynamic/dynamic.yaml --visualize --snapshot 9000 --gpu 0
python evaluate.py \
--results_dir ./experiments/dynamic/test_output/captions \
--anno ./data/total_change_captions_reformat.json \
--type_file ./data/type_mapping.json
The calculated scores are as follows:
| BLEU-4 | CIDEr | METEOR | SPICE |
| --- | --- | --- | --- |
| 0.5355 | 1.1496 | 0.3794 | 0.3128 |
These scores are slightly higher than those reported in the paper. Is the model provided in this repository different from the model used in the paper?