Hello, thank you for releasing your impressive work!
I'm currently trying to reproduce the results reported in Table 1, Table 4, and Table A7 of the paper.
However, I found that the repository only provides the inference.py script, and there are no detailed instructions for evaluating on benchmark datasets.
I saw that LAVE is mentioned as the evaluator, and I checked its README, but I'm still unsure about how to properly run the evaluation.
In particular, I’d like to ask:
- How should the evaluation process be performed with LAVE to reproduce your reported results?
- How is the `auto-vocabulary.json` file generated or saved before running `lave.py`?
- Are there any specific configurations or dataset formats required for evaluation?
Any brief guideline or example command would be greatly appreciated.
Thank you very much for your time and support!