Taxonomizer's confidence works differently than LanguageDetector's #2

@tomayac

Based on the code sample, the sum of all confidences can exceed 1:

[
    { id: "602", confidence: 0.98 }, // Represents "Consumer Electronics"
    { id: "597", confidence: 0.95 }, // Represents "Technology & Computing"
    { id: "45",  confidence: 0.82 }  // Represents "Automotive" 
]
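
To make the observation concrete, here is a minimal sketch that adds up the confidences from the sample above. The variable names `sampleResults` and `sampleTotal` are only illustrative and are not part of any proposed API.

```javascript
// Results copied from the code sample; the variable names are illustrative.
const sampleResults = [
  { id: "602", confidence: 0.98 }, // "Consumer Electronics"
  { id: "597", confidence: 0.95 }, // "Technology & Computing"
  { id: "45", confidence: 0.82 },  // "Automotive"
];

// Sum every confidence value in the list.
const sampleTotal = sampleResults.reduce((sum, r) => sum + r.confidence, 0);

console.log(sampleTotal.toFixed(2)); // 2.75, well above 1
```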

This differs from LanguageDetector, whose specification says:

The values of rawResult, plus unknown, must sum to 1. Each such value, or unknown, may be 0.

If the implementation believes input to be written in multiple languages, then it should attempt to apportion the values of rawResult and unknown such that they are proportionate to the amount of input written in each detected language.

I suggest we align Taxonomizer's and LanguageDetector's behavior, and hence make Taxonomizer's confidences sum to 1.
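
As a rough illustration of what "sum to 1" could look like, the sketch below rescales the sample confidences by their total. `normalizeConfidences` is a hypothetical helper, not part of any Taxonomizer API, and dividing by the total is only one possible way to apportion the values; it also assumes there is no separate `unknown` share.

```javascript
// Hypothetical helper: rescales confidences so they sum to 1.
// This is an assumption for illustration, not specified behavior.
function normalizeConfidences(results) {
  const total = results.reduce((sum, r) => sum + r.confidence, 0);
  // Avoid dividing by zero when every confidence is 0.
  if (total === 0) return results.map((r) => ({ ...r }));
  return results.map((r) => ({ ...r, confidence: r.confidence / total }));
}

const normalized = normalizeConfidences([
  { id: "602", confidence: 0.98 },
  { id: "597", confidence: 0.95 },
  { id: "45", confidence: 0.82 },
]);

const normalizedTotal = normalized.reduce((sum, r) => sum + r.confidence, 0);
console.log(normalizedTotal.toFixed(2)); // 1.00
```

Note that after such a rescaling, a value no longer reads as "how likely is this category on its own" but as a share relative to the other returned categories, which is the LanguageDetector-style interpretation.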
