Description
Calling image_dataset_from_dataframe() (which in turn calls image_dataset_from_paths_and_labels()) with label_mode="categorical" on a dataframe whose label column contains string labels (e.g. ['apple', 'orange', 'kiwi']) raises the following error:
UFuncTypeError: ufunc 'maximum' did not contain a loop with signature matching types (dtype('<U25'), dtype('<U25')) -> None
This traces back to:
File working_dir.venv/lib/python3.10/site-packages/imflow/imflow.py:549, in image_dataset_from_paths_and_labels(image_paths, labels, label_mode, color_mode, batch_size, image_size, shuffle, seed, validation_split, subset, interpolation, resize_with_pad)
547 num_classes = 2
548 if label_mode in ('int', 'categorical'):
--> 549 num_classes = np.max(labels) + 1
550 if label_mode == 'multi_class':
551 num_classes = labels.shape[1]
The failing call is np.max(labels): labels contains strings, and NumPy's maximum ufunc has no loop for string dtypes.
Maybe handling string 'categorical' labels as a separate case, e.g. with num_classes = np.unique(labels).shape[0], would solve this?
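A minimal sketch of that idea, in case it helps. The helper name encode_string_labels is hypothetical and not part of imflow. It assumes that the code after the num_classes computation expects integer class indices, which I have not verified. np.unique with return_inverse=True gives both the class count and an integer encoding in one call:

```python
import numpy as np

def encode_string_labels(labels):
    """Hypothetical helper (not part of imflow): map labels to ids 0..K-1.

    Returns (class_names, int_labels, num_classes). class_names is sorted,
    and int_labels[i] is the index of labels[i] within class_names.
    """
    labels = np.asarray(labels)
    class_names, int_labels = np.unique(labels, return_inverse=True)
    # Flatten defensively so the result is 1-D regardless of input shape.
    int_labels = int_labels.reshape(-1)
    return class_names, int_labels, class_names.shape[0]

class_names, int_labels, num_classes = encode_string_labels(
    ['apple', 'orange', 'kiwi', 'apple']
)

# One-hot encode from the integer ids, as a "categorical" mode might do.
one_hot = np.eye(num_classes, dtype='float32')[int_labels]
```

One caveat: for integer labels with gaps (e.g. [0, 2]), np.unique(labels).shape[0] gives 2 while np.max(labels) + 1 gives 3, so the two definitions of num_classes only agree when the ids are contiguous from 0.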
P.S. It also looks like the validation check on line 491 does not allow multi_class as a label mode:
if label_mode not in {'int', 'categorical', 'multi_label', 'binary', None}:
but other parts of the code (e.g. line 542, if label_mode in ('multi_class', 'multi_label'):, and line 550 in the traceback above) seem to allow for it.
(If you like, I could make this a separate issue for tracking.)
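To illustrate the mismatch, here is a small sketch. The names ALLOWED_LABEL_MODES and validate_label_mode are hypothetical, and I am assuming the check on line 491 raises a ValueError, which the snippet above does not show. With the allowed set as quoted, 'multi_class' is rejected up front, so any later branch that tests for it can never run:

```python
# Allowed set copied from the check quoted above (line 491).
ALLOWED_LABEL_MODES = frozenset({'int', 'categorical', 'multi_label', 'binary', None})

def validate_label_mode(label_mode):
    """Hypothetical stand-in for the line-491 check; assumed to raise ValueError."""
    if label_mode not in ALLOWED_LABEL_MODES:
        raise ValueError(f"Unsupported label_mode: {label_mode!r}")
    return label_mode

def reaches_later_branches(label_mode):
    """Return True if label_mode passes validation and so can reach later code."""
    try:
        validate_label_mode(label_mode)
    except ValueError:
        return False
    return True

multi_label_ok = reaches_later_branches('multi_label')
multi_class_ok = reaches_later_branches('multi_class')
```

Keeping the allowed modes in one shared constant that both the validation and the later branches refer to would make this kind of drift harder to introduce, whichever of the two spellings is the intended one.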