Clarification on reproducing results (Table 1, Table 4, Table A7) and using LAVE evaluator #1

Hello, thank you for releasing your impressive work!

I'm currently trying to reproduce the results reported in Table 1, Table 4, and Table A7 of the paper.  
However, I found that the repository only provides the `inference.py` script, and there are no detailed instructions for evaluating on benchmark datasets.

I saw that LAVE is mentioned as the evaluator and checked its README, but I'm still unsure how to run the evaluation properly.  
In particular, I'd like to ask:

1. How should the evaluation process be performed with LAVE to reproduce your reported results?  
2. How is the `auto-vocabulary.json` file generated or saved before running `lave.py`?  
3. Are there any specific configurations or dataset formats required for evaluation?

A brief guide or an example command would be greatly appreciated.  
Thank you very much for your time and support!

