cache directory bug #7

@dvolgyes

User home directories, where the dataset cache goes by default, are not a good place to store data.
The cache location is configurable through the Hugging Face datasets interface, for example:

import datasets
datasets.load_dataset("sjyhne/mapai_training_data", cache_dir="/mnt/experiment-3/huggingface")

However, there seems to be a bug in mapai_training_data.py: it cannot properly handle a non-default cache directory.
I would suggest adding an optional cache_dir argument to the create_dataset function, which is passed on
to load_dataset, and also fixing the bug in the Hugging Face dataset script.
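
To make the suggestion concrete, here is a minimal sketch of how create_dataset could forward cache_dir. The signature shown is an assumption, since the real create_dataset may take other arguments, and the loader parameter is purely hypothetical, added so the sketch can be exercised without downloading anything.

```python
from typing import Any, Callable, Optional


def create_dataset(
    name: str = "sjyhne/mapai_training_data",
    cache_dir: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,  # hypothetical hook, for testing only
) -> Any:
    """Load the dataset, forwarding cache_dir only when the caller sets it.

    Assumed signature: the real create_dataset may differ.
    """
    if loader is None:
        # Imported lazily so the sketch can be read without the library installed.
        from datasets import load_dataset as loader

    kwargs = {}
    if cache_dir is not None:
        # Leaving the key out keeps the library's default cache location.
        kwargs["cache_dir"] = cache_dir
    return loader(name, **kwargs)
```

With this in place, create_dataset(cache_dir="/mnt/experiment-3/huggingface") would behave like the load_dataset call above, and existing callers that pass nothing keep the default cache.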

It can be worked around, but especially in the long term, after the competition, it would be nice to have a conformant dataset
that can easily be used in the future.

Reproduction:

  • execute the above lines

Possible cause:

  • very likely, in the _split_generators function the current working directory is not the one that is assumed,
    so the os.makedirs calls refer to the wrong location.
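
If that is indeed the cause, one possible fix is to build every output path from an explicit absolute base instead of relying on the current working directory. The sketch below is not taken from mapai_training_data.py; make_output_dir and base_dir are hypothetical names, with base_dir standing in for whatever absolute path the loading script actually has available.

```python
import os


def make_output_dir(base_dir: str, relative: str) -> str:
    """Create `relative` under `base_dir`, independent of the current working directory.

    Hypothetical helper: base_dir is assumed to be a path the script already knows.
    """
    # Anchoring to an absolute base means a later os.chdir cannot redirect the path.
    target = os.path.join(os.path.abspath(base_dir), relative)
    # exist_ok=True makes repeated calls harmless.
    os.makedirs(target, exist_ok=True)
    return target
```

A bare os.makedirs("data/train") is resolved against wherever the process happens to be running, which is why it can land outside a non-default cache_dir.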
