|
@rays1024 Any updates on this? btw, the title for your PR is kind of confusing. If you are working on two different things (length calculator and ds-1000 dataset), you should have two separate branches and two separate PRs accordingly. |
Sorry, I have been pretty busy with projects and assignments, but I will work on the issue and try to have an update by Friday. I'll also make a separate PR for length_calculator. Thanks for the reminder! |
|
I see, thanks for the update. Let me know if anything changes. |
…t llama-based model uses empty string as tokenizer_eos_token
update branch to latest main branch
|
Updating this pull request's name to "DS1000 dataset" to avoid confusion. Updates on length_calculator will be made on another branch/PR |
niansong1996 left a comment
Haven't tested the actual functionality but the executors and datasets look good in general.
I made some comments; can you fix the issues raised?
Thanks!
DS_1000/ds1000.jsonl
please put this file in the data directory
We don't want DS-1000's license and readme files in our repo
I'll delete the license, but the readme file contains the instructions for evaluation and for creating the jsonl file.
Is this part of the testing harness for DS-1000?
Yes, this script and the zip file are both needed for evaluation. The DS-1000 evaluation requires the original data so that the problems can be loaded into a DS1000Problem class. This class is then used for evaluation.
Zip files should not be checked in (i.e., committed to the online repo) unless there is a strong reason for doing so.
I can work on removing the need for the original problem data, but for now the data in the zip file is needed for evaluation.
DS_1000/preprocess.py
Why is this appending lines (i.e., using open mode 'a')?
Each iteration of the for loop will append one dictionary as a new line to the jsonl file, which is why I used 'a' here. If other ways are commonly used, I can definitely change that!
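For reference, one common alternative to append mode is to open the file once in 'w' mode and write every record inside a single with block, so that re-running the script does not duplicate lines. The sketch below is only an illustration of that pattern: write_jsonl, read_jsonl, and the record fields are hypothetical names and are not taken from preprocess.py.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write each dict in `records` as one JSON line, replacing any existing file."""
    # Mode "w" truncates the file, so running this twice gives the same result.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; the real DS-1000 fields may differ.
example_records = [
    {"id": 0, "prompt": "first problem"},
    {"id": 1, "prompt": "second problem"},
]

example_path = os.path.join(tempfile.mkdtemp(), "example.jsonl")
write_jsonl(example_records, example_path)
write_jsonl(example_records, example_path)  # second run overwrites, no duplicates
loaded_records = read_jsonl(example_path)
```

With mode 'a', the second call would have left four lines in the file instead of two, which is the usual reason reviewers ask about append mode in preprocessing scripts.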
We should probably put this file in analysis or utils or preprocessing
…to rui/ds-1000 Merging with Yilun's branch to run evaluation on DS1000
|
Using jsonargparse==4.15.0 would resolve the previous problems. Still working on fixing the directory problem when evaluating 208 or more problems. |
|
A torch tensor error occurred when evaluating incoder-1b, the same as in this discussion. I added incoder-1b to the if statement at line 159 of seq2seq_model.py to fix this issue. |
|
Fixed the result-saving issue by adding a safe-execution wrapper to the DS1000Executor class. The wrapper uses execute in safe_execution_util as a template.
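As a rough illustration of what a safe-execution wrapper can look like, the sketch below runs a code string in a separate Python process with a timeout and always returns a result dict, so a crash or hang in one problem cannot prevent the results from being saved. It is not the execute function from safe_execution_util or the actual DS1000Executor code; safe_execute and the keys of the returned dict are assumptions made for this example.

```python
import subprocess
import sys


def safe_execute(code: str, timeout: float = 5.0) -> dict:
    """Run `code` in a fresh Python subprocess and always return a result dict."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return {"passed": False, "error": "timeout", "stdout": ""}
    if proc.returncode != 0:
        # A non-zero exit code covers uncaught exceptions and sys.exit(n != 0).
        return {"passed": False, "error": proc.stderr.strip(), "stdout": proc.stdout}
    return {"passed": True, "error": None, "stdout": proc.stdout}


ok_result = safe_execute("print(1 + 1)")
failed_result = safe_execute("raise ValueError('bad input')")
timeout_result = safe_execute("import time; time.sleep(5)", timeout=0.5)
```

Because every outcome is reduced to a plain dict, the caller can write one result per problem regardless of whether the generated code passed, raised, or timed out.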