Based on the code sample, the sum of all confidences can exceed 1:
[
{ id: "602", confidence: 0.98 }, // Represents "Consumer Electronics"
{ id: "597", confidence: 0.95 }, // Represents "Technology & Computing"
{ id: "45", confidence: 0.82 } // Represents "Automotive"
]
This is different from what LanguageDetector does:
The values of rawResult, plus unknown, must sum to 1. Each such value, or unknown, may be 0.
If the implementation believes input to be written in multiple languages, then it should attempt to apportion the values of rawResult and unknown such that they are proportionate to the amount of input written in each detected language.
I suggest we align Taxonomizer's and LanguageDetector's behavior, and hence make Taxonomizer's confidences sum to 1.
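To make the suggestion concrete, here is a minimal sketch of one way the sample values could be rescaled so that they, plus an unknown share, sum to 1. The helper name `normalizeConfidences` and the `unknownWeight` parameter are hypothetical and not part of any Taxonomizer or LanguageDetector proposal; simple proportional rescaling is only one possible apportioning strategy.

```javascript
// Hypothetical helper, for illustration only: rescales raw confidences so that
// all results plus the "unknown" share sum to 1.
function normalizeConfidences(rawResults, unknownWeight = 0) {
  // Total weight of all categories plus the unknown share.
  const total = rawResults.reduce((sum, r) => sum + r.confidence, unknownWeight);

  // Nothing was detected at all: attribute everything to "unknown".
  if (total === 0) {
    return {
      results: rawResults.map((r) => ({ ...r, confidence: 0 })),
      unknown: 1,
    };
  }

  return {
    results: rawResults.map((r) => ({ ...r, confidence: r.confidence / total })),
    unknown: unknownWeight / total,
  };
}

// Values taken from the code sample quoted above.
const sample = [
  { id: "602", confidence: 0.98 }, // "Consumer Electronics"
  { id: "597", confidence: 0.95 }, // "Technology & Computing"
  { id: "45", confidence: 0.82 },  // "Automotive"
];

const { results, unknown } = normalizeConfidences(sample);

// Sum of the rescaled confidences plus the unknown share.
const normalizedSum = results.reduce((sum, r) => sum + r.confidence, unknown);
console.log(results, unknown, normalizedSum);
```

With this kind of rescaling the relative ordering of categories is preserved, but each value now reads as a share of the input rather than as an independent per-category score.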