There is a problem with the PTB dataset downloaded using the script in the manual: it does not pass the assert ntokens == 10000 check in test_ptb_dataset. The total number of unique words is 8481, which is less than 10000 (I believe my code is correct). When I download the data from this url instead, the test above passes. I then compared the datasets (train.txt, test.txt, valid.txt) from the two sources using diff, and they do differ in train.txt.
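For anyone who wants to reproduce the count, here is a minimal sketch of how the unique-word total could be checked for each copy of the data. It is not the code used by test_ptb_dataset: the helper names `vocabulary` and `compare_vocabularies` are hypothetical, tokens are assumed to be whitespace-separated, and whether an end-of-sentence marker such as `<eos>` should be counted as an extra token is an assumption exposed through the `eos_token` parameter.

```python
from pathlib import Path


def vocabulary(paths, eos_token=None):
    """Return the set of whitespace-separated tokens found in the given files.

    eos_token is an assumption: pass a marker such as "<eos>" if the test
    counts an end-of-sentence token as part of the vocabulary.
    """
    vocab = set()
    for path in paths:
        with Path(path).open(encoding="utf-8") as handle:
            for line in handle:
                vocab.update(line.split())
    if eos_token is not None:
        vocab.add(eos_token)
    return vocab


def compare_vocabularies(paths_a, paths_b, eos_token=None):
    """Return both vocabulary sizes and the tokens unique to each source."""
    vocab_a = vocabulary(paths_a, eos_token)
    vocab_b = vocabulary(paths_b, eos_token)
    only_a = sorted(vocab_a - vocab_b)
    only_b = sorted(vocab_b - vocab_a)
    return len(vocab_a), len(vocab_b), only_a, only_b
```

Calling `compare_vocabularies` with the three files from each download (for example two hypothetical directories `script_data/` and `url_data/`, each holding train.txt, valid.txt and test.txt) would show both totals side by side, along with the words that appear in only one of the two copies.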