Tokenization currently runs one example at a time instead of in batches, which makes it very slow. We could change the dataloader helpers to tokenize examples in batches, which should speed things up considerably. At a minimum, we'd need to change the code here: https://github.com/jscuds/rf-bert/blob/master/dataloaders/helpers.py#L58-L62
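A rough sketch of what batched tokenization could look like, not the actual code in `helpers.py`. The names `tokenize_in_batches`, `tokenize_batch`, and `toy_tokenize_batch` are hypothetical, and the sketch assumes the tokenizer can take a list of strings and return one list of token ids per string. If the project uses a Hugging Face tokenizer, something like `lambda batch: tokenizer(batch)["input_ids"]` would probably fit as `tokenize_batch`, but that is an assumption about the setup.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def tokenize_in_batches(
    texts: Sequence[str],
    tokenize_batch: Callable[[List[str]], List[List[int]]],
    batch_size: int = 1000,
) -> List[List[int]]:
    """Tokenize `texts` with one tokenizer call per batch instead of one per example.

    `tokenize_batch` is a hypothetical callable standing in for the real tokenizer:
    it takes a list of strings and returns one list of token ids per string.
    """
    token_ids: List[List[int]] = []
    for chunk in batched(texts, batch_size):
        token_ids.extend(tokenize_batch(list(chunk)))
    return token_ids


def toy_tokenize_batch(batch: List[str]) -> List[List[int]]:
    """Toy stand-in tokenizer (an assumption for illustration): word lengths as ids."""
    return [[len(word) for word in text.split()] for text in batch]
```

Usage would be along the lines of `tokenize_in_batches(sentences, toy_tokenize_batch, batch_size=2)`, swapping in the real tokenizer call. Output order matches input order, so the rest of the dataloader should not need to change how it indexes examples.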