Saving records to multiple .pkl files#38

Open
arinaruck wants to merge 19 commits into hotpotqa:master from arinaruck:master


@arinaruck

Adds a --num_files argument to main.py. With 1 (the default), all datapoints are saved object-wise to a single .pkl file; with -1, one .pkl file is written per object; with n > 1, the datapoints are split across n files of almost equal size (the last one can differ in size).
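The three --num_files modes can be sketched roughly as follows. This is an illustration, not the code in this PR: the function name split_records is hypothetical, and it assumes that for n > 1 each file gets len(records) // n objects with the last file absorbing the remainder, which is one reading of "the last one can differ in size".

```python
def split_records(records, num_files=1):
    """Split a list of records into chunks, one chunk per output file.

    Hypothetical helper; the chunk-size rule is an assumption.
    """
    if num_files == 0 or num_files < -1:
        raise ValueError("num_files must be -1 or a positive integer")
    if not records:
        return []
    if num_files == -1:
        # One file per object.
        return [[record] for record in records]
    # Never create more files than there are records.
    n = min(num_files, len(records))
    size = len(records) // n
    # The first n - 1 chunks have equal size.
    chunks = [records[i * size:(i + 1) * size] for i in range(n - 1)]
    # The last chunk takes whatever is left, so it may be larger.
    chunks.append(records[(n - 1) * size:])
    return chunks
```

With 10 records and num_files=4, this sketch yields chunks of sizes 2, 2, 2 and 4.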
This makes the dataset compatible with a custom PyTorch Dataset and a PyTorch data generator (see https://stackoverflow.com/questions/54571377/how-to-create-a-custom-pytorch-dataset-when-the-order-and-the-total-number-of-tr/54572327#54572327).
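As a rough sketch of why per-file records help, a map-style dataset can load one .pkl file per index instead of holding everything in memory. The class name PickleFileDataset is hypothetical and not part of this PR. It only implements __len__ and __getitem__, which is the interface a map-style PyTorch dataset needs; in a real project it would typically subclass torch.utils.data.Dataset, which is omitted here to keep the example free of dependencies.

```python
import pickle


class PickleFileDataset:
    """Lazily loads one pickled file per index (hypothetical example class)."""

    def __init__(self, paths):
        # Paths to the .pkl files, in the order they should be served.
        self.paths = list(paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Only the requested file is read from disk.
        with open(self.paths[index], "rb") as f:
            return pickle.load(f)
```

With --num_files -1 each item is a single datapoint; with n > 1 each item would be a whole batch of datapoints, so the consumer has to handle that case itself.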
The files are saved to the data_split directory ("train" or "dev"), which is created if it does not already exist. Filenames are unchanged except for the batch number appended at the end (e.g. train/train_record_50.pkl).
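A minimal sketch of this naming scheme, under stated assumptions: the function save_chunks and its parameters are hypothetical, and batch numbers are assumed to start at 0 (the description does not say where numbering starts).

```python
import os
import pickle


def save_chunks(chunks, data_split, filename, root="."):
    """Write each chunk to <root>/<data_split>/<stem>_<batch><ext>.

    Hypothetical helper; zero-based batch numbering is an assumption.
    """
    out_dir = os.path.join(root, data_split)
    # Create the split directory if it is not there already.
    os.makedirs(out_dir, exist_ok=True)
    stem, ext = os.path.splitext(filename)
    paths = []
    for batch, chunk in enumerate(chunks):
        path = os.path.join(out_dir, f"{stem}_{batch}{ext}")
        with open(path, "wb") as f:
            pickle.dump(chunk, f)
        paths.append(path)
    return paths
```

For example, save_chunks(chunks, "train", "train_record.pkl") would write train/train_record_0.pkl, train/train_record_1.pkl, and so on.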
