Non-integer categorical labels produce error. #1

@jcausey-astate

When image_dataset_from_dataframe() (which calls image_dataset_from_paths_and_labels()) is given a dataframe whose label column contains string categorical labels (e.g. ['apple', 'orange', 'kiwi']) and label_mode is "categorical", the following error results:

UFuncTypeError: ufunc 'maximum' did not contain a loop with signature matching types (dtype('<U25'), dtype('<U25')) -> None

This traces back to:

File working_dir.venv/lib/python3.10/site-packages/imflow/imflow.py:549, in image_dataset_from_paths_and_labels(image_paths, labels, label_mode, color_mode, batch_size, image_size, shuffle, seed, validation_split, subset, interpolation, resize_with_pad)
    547    num_classes = 2
    548  if label_mode in ('int', 'categorical'):
--> 549    num_classes = np.max(labels) + 1
    550  if label_mode == 'multi_class':
    551    num_classes = labels.shape[1]

Here np.max(labels) is being called, but labels contains strings, so there is no maximum to compute.

Maybe adding a separate case for string 'categorical' labels, e.g. num_classes = np.unique(labels).shape[0], would solve this?
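To make the suggestion concrete, here is a minimal sketch of how string labels could be mapped to integer ids before the class count is computed. The helper name encode_labels is hypothetical and not part of imflow; it only uses np.unique with return_inverse=True, and how the real function would integrate this is an assumption on my part.

```python
import numpy as np


def encode_labels(labels):
    """Map arbitrary labels (e.g. strings) to integer ids 0..K-1.

    Hypothetical helper, not part of imflow.
    """
    labels = np.asarray(labels).ravel()
    # np.unique returns the sorted distinct values; return_inverse gives,
    # for each input label, its index into that sorted array.
    class_names, encoded = np.unique(labels, return_inverse=True)
    encoded = encoded.reshape(-1)
    num_classes = class_names.shape[0]
    return class_names, encoded, num_classes


class_names, encoded, num_classes = encode_labels(
    ['apple', 'orange', 'kiwi', 'apple']
)

# One-hot rows for the "categorical" label mode.
one_hot = np.eye(num_classes, dtype='float32')[encoded]
```

One caveat: for integer labels, np.unique(labels).shape[0] and np.max(labels) + 1 only agree when the ids are contiguous and start at 0, so the existing integer path may be worth keeping as it is.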

P.S. It also looks like there is a validation check that would not allow multi_class as a label mode (line 491):

if label_mode not in {'int', 'categorical', 'multi_label', 'binary', None}:

but other parts of the code (line 542: if label_mode in ('multi_class', 'multi_label'):) seem to allow for it.
(If you like, I could open a separate issue to track this.)
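If multi_class is meant to be supported, one way to keep the two places from drifting apart is to define the allowed modes once and check against that. This is only a sketch: VALID_LABEL_MODES and check_label_mode are hypothetical names, and whether multi_class belongs in the set is your call.

```python
# Hypothetical names; the actual imflow code may be organized differently.
VALID_LABEL_MODES = frozenset(
    {'int', 'categorical', 'multi_label', 'multi_class', 'binary', None}
)


def check_label_mode(label_mode):
    """Raise ValueError unless label_mode is one of the supported modes."""
    if label_mode not in VALID_LABEL_MODES:
        # repr() keeps None sortable alongside the string modes.
        allowed = ", ".join(sorted(repr(m) for m in VALID_LABEL_MODES))
        raise ValueError(
            f"label_mode must be one of {allowed}; got {label_mode!r}"
        )
    return label_mode
```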
