Use java.util.Locale in Workbench NL extensions handling #21
Code Review
This pull request replaces the dependency on com.ibm.icu.util.ULocale with standard java.util.Locale for handling locale extensions in the Workbench. It introduces a translation layer to convert legacy ICU-style extension strings into BCP 47 Unicode locale extensions. Feedback indicates potential regressions regarding timezone mapping, as BCP 47 requires short CLDR identifiers which differ from long IANA IDs used in ICU. Additionally, it is recommended to replace underscores with hyphens in extension values to maintain compatibility with BCP 47 subtag requirements.
```java
private static final Map<String, String> ICU_TO_BCP47_KEY = Map.ofEntries(
        Map.entry("calendar", "ca"), //$NON-NLS-1$ //$NON-NLS-2$
        Map.entry("collation", "co"), //$NON-NLS-1$ //$NON-NLS-2$
        Map.entry("currency", "cu"), //$NON-NLS-1$ //$NON-NLS-2$
        Map.entry("numbers", "nu"), //$NON-NLS-1$ //$NON-NLS-2$
        Map.entry("timezone", "tz")); //$NON-NLS-1$ //$NON-NLS-2$
```
Mapping timezone to the BCP 47 key tz may break many existing configurations.
Legacy ICU-style locale extensions often use long IANA time zone IDs (e.g., @timezone=America/New_York). BCP 47 Unicode locale extensions (the -u- section used by java.util.Locale), however, require short CLDR identifiers (e.g., usnyc for America/New_York).
Since Locale.forLanguageTag is used here on a best-effort basis, a value containing a slash, or a subtag longer than 8 characters, is ill-formed. forLanguageTag ignores the first ill-formed subtag and all subtags that follow it, so every keyword listed after such a time zone is lost as well. This is a regression compared to the previous ULocale implementation, which accepted long IDs. If long time zone IDs must remain supported, they have to be translated to their BCP 47 short IDs, for example with a mapping table derived from the CLDR bcp47/timezone.xml data; java.time.ZoneId can validate a long ID but does not expose the corresponding short identifier.
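One way to keep long IDs working is a lookup table applied to the timezone value before the tag is built. This is a minimal sketch, assuming a hand-written subset of the CLDR aliases; the class TimezoneKeys, the method toBcp47TimeZone and the five table entries are illustrative, not part of the PR. The main method also reproduces the failure mode: with a long ID in the tag, the keywords after it are dropped.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: translates an IANA zone ID from a legacy "@timezone=..."
// keyword into a CLDR short ID usable as the BCP 47 "tz" type.
class TimezoneKeys {

    // Illustrative subset of CLDR's bcp47/timezone.xml aliases. Keys are lowercase
    // because the PR's loop lowercases values before this point.
    private static final Map<String, String> IANA_TO_CLDR = Map.of(
            "america/new_york", "usnyc",
            "america/los_angeles", "uslax",
            "europe/london", "gblon",
            "europe/berlin", "deber",
            "asia/tokyo", "jptyo");

    /** Returns the CLDR short ID, or null when the zone is not in the table. */
    static String toBcp47TimeZone(String ianaId) {
        String key = ianaId.trim().toLowerCase(Locale.ROOT);
        // A value that already looks like a short ID (3-8 alphanumerics, no slash)
        // is passed through unchanged.
        if (key.matches("[a-z0-9]{3,8}")) {
            return key;
        }
        return IANA_TO_CLDR.get(key);
    }

    public static void main(String[] args) {
        String tz = toBcp47TimeZone("America/New_York");
        Locale withShortId = Locale.forLanguageTag("en-u-tz-" + tz);
        System.out.println(withShortId.getUnicodeLocaleType("tz")); // usnyc

        // The long ID is an ill-formed subtag: it and everything after it is ignored.
        Locale withLongId = Locale.forLanguageTag("en-u-tz-america/new-york-nu-arab");
        System.out.println(withLongId.getUnicodeLocaleType("nu")); // null
    }
}
```

A generated table (rather than a hand-written one) would be needed in practice, since CLDR defines several hundred aliases.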
```java
for (String pair : body.split(";")) { //$NON-NLS-1$
    int eq = pair.indexOf('=');
    if (eq <= 0) {
        continue;
    }
    String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
    String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT);
    if (rawValue.isEmpty()) {
        continue;
    }
    String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
    if (key == null) {
        continue;
    }
    if (uExtension.length() > 0) {
        uExtension.append('-');
    }
    uExtension.append(key).append('-').append(rawValue);
}
```
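To try the translation in isolation, the loop can be wrapped in a small method. This is a sketch, assuming the legacy string may carry a leading '@' as in the PR description; the class NlExtensions and the method toUnicodeExtension are hypothetical names, and the NON-NLS markers are omitted.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical standalone wrapper around the loop under review.
class NlExtensions {

    private static final Map<String, String> ICU_TO_BCP47_KEY = Map.ofEntries(
            Map.entry("calendar", "ca"),
            Map.entry("collation", "co"),
            Map.entry("currency", "cu"),
            Map.entry("numbers", "nu"),
            Map.entry("timezone", "tz"));

    /** Turns "@calendar=hebrew;numbers=arab" into "ca-hebrew-nu-arab". */
    static String toUnicodeExtension(String icuExtensions) {
        // Assumption: a leading '@' is stripped before parsing the pairs.
        String body = icuExtensions.startsWith("@") ? icuExtensions.substring(1) : icuExtensions;
        StringBuilder uExtension = new StringBuilder();
        for (String pair : body.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' at all, or an empty key
            }
            String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT);
            if (rawValue.isEmpty()) {
                continue;
            }
            // Known ICU keyword names map to BCP 47 keys; two-letter keys pass through.
            String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
            if (key == null) {
                continue;
            }
            if (uExtension.length() > 0) {
                uExtension.append('-');
            }
            uExtension.append(key).append('-').append(rawValue);
        }
        return uExtension.toString();
    }

    public static void main(String[] args) {
        String ext = toUnicodeExtension("@calendar=hebrew;numbers=arab");
        System.out.println(ext); // ca-hebrew-nu-arab
        Locale locale = Locale.forLanguageTag("en-US-u-" + ext);
        System.out.println(locale.getUnicodeLocaleType("ca")); // hebrew
    }
}
```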
The current parsing logic assumes that the rawValue from the ICU-style string can be used directly as BCP 47 subtags.
ICU keyword values may contain underscores (e.g., calendar=islamic_civil), whereas BCP 47 Unicode extensions separate subtags with hyphens and require each type subtag to be 3-8 alphanumeric characters. Locale.forLanguageTag does not report such input as an error; it silently ignores the first ill-formed subtag and everything after it, so preferences containing underscores or other non-BCP 47 characters would stop working without any visible failure.
Consider replacing underscores with hyphens in rawValue to improve compatibility with legacy ICU strings.
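The underscore problem can be reproduced with Locale.forLanguageTag alone. The class UnderscoreDemo and its calendarOf helper are hypothetical; they only wrap the tag construction so both variants can be compared.

```java
import java.util.Locale;

// BCP 47 subtags are separated by '-' only, so "islamic_civil" is a single
// ill-formed subtag, while "islamic-civil" is a valid two-subtag type.
class UnderscoreDemo {

    static String calendarOf(String icuValue, boolean replaceUnderscores) {
        String value = icuValue.toLowerCase(Locale.ROOT);
        if (replaceUnderscores) {
            value = value.replace('_', '-');
        }
        return Locale.forLanguageTag("en-u-ca-" + value).getUnicodeLocaleType("ca");
    }

    public static void main(String[] args) {
        System.out.println(calendarOf("islamic_civil", true)); // islamic-civil
        // Without the replacement the value subtag is dropped, so "ca" does not
        // carry the requested calendar.
        System.out.println(calendarOf("islamic_civil", false));
    }
}
```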
Suggested change:
```java
for (String pair : body.split(";")) { //$NON-NLS-1$
    int eq = pair.indexOf('=');
    if (eq <= 0) {
        continue;
    }
    String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
    String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT).replace('_', '-');
    if (rawValue.isEmpty()) {
        continue;
    }
    String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
    if (key == null) {
        continue;
    }
    if (uExtension.length() > 0) {
        uExtension.append('-');
    }
    uExtension.append(key).append('-').append(rawValue);
}
```
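Even with the underscore replacement, building one "-u-..." string means forLanguageTag still truncates at the first ill-formed value, such as a long time zone ID. A possible hardening, not part of the suggestion above, is to set each keyword through Locale.Builder.setUnicodeLocaleKeyword, which throws IllformedLocaleException for a bad key or value, so only that keyword is skipped. The class NlExtensionsBuilder and its apply method are hypothetical.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Map;

// Hypothetical variant: apply keywords one by one instead of parsing a single tag.
class NlExtensionsBuilder {

    private static final Map<String, String> ICU_TO_BCP47_KEY = Map.of(
            "calendar", "ca", "collation", "co", "currency", "cu",
            "numbers", "nu", "timezone", "tz");

    static Locale apply(Locale base, String icuExtensions) {
        String body = icuExtensions.startsWith("@") ? icuExtensions.substring(1) : icuExtensions;
        Locale.Builder builder = new Locale.Builder().setLocale(base);
        for (String pair : body.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT).replace('_', '-');
            String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
            if (key == null || rawValue.isEmpty()) {
                continue;
            }
            try {
                builder.setUnicodeLocaleKeyword(key, rawValue);
            } catch (IllformedLocaleException e) {
                // Ill-formed key or value (e.g. a long IANA zone ID): skip only this keyword.
            }
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Locale applied = apply(Locale.US, "@calendar=hebrew;timezone=America/New_York;numbers=arab");
        System.out.println(applied.getUnicodeLocaleType("ca")); // hebrew
        System.out.println(applied.getUnicodeLocaleType("tz")); // null
        System.out.println(applied.getUnicodeLocaleType("nu")); // arab
    }
}
```

With this approach the numbers keyword survives even though the time zone before it is rejected; whether rejected keywords should be logged is a separate decision.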
Replaces com.ibm.icu.util.ULocale usage in Workbench with java.util.Locale and drops the matching Import-Package: com.ibm.icu.util from the bundle manifest. The legacy ICU-style locale extension string from Platform.getNLExtensions() and the NL_EXTENSIONS preference (e.g. "@calendar=hebrew;numbers=arab") is translated to a BCP 47 Unicode locale extension ("-u-ca-hebrew-nu-arab") and applied via Locale.forLanguageTag, so existing preference values keep working for the common keyword names (calendar, collation, currency, numbers, timezone) as well as for two-letter BCP 47 keys. Removes the last com.ibm.icu reference from org.eclipse.ui.workbench.
Branch force-pushed from f4c0413 to 9751d8a.
Thanks. Applied:
Replaces com.ibm.icu.util.ULocale usage in Workbench with java.util.Locale and drops the matching Import-Package: com.ibm.icu.util from the bundle manifest. The legacy ICU-style locale extension string from Platform.getNLExtensions() and the NL_EXTENSIONS preference (e.g. @calendar=hebrew;numbers=arab) is translated to a BCP 47 Unicode locale extension (-u-ca-hebrew-nu-arab) and applied via Locale.forLanguageTag, so existing preference values keep working for the common keyword names (calendar, collation, currency, numbers, timezone) as well as for two-letter BCP 47 keys. Unknown keys or ill-formed values are silently dropped, matching the previous best-effort ULocale behavior. This removes the last com.ibm.icu reference from org.eclipse.ui.workbench.

Opened against vogella/master for verification before submitting upstream.