
Use java.util.Locale in Workbench NL extensions handling #21

Closed
vogella wants to merge 1 commit into master from icu-workbench-ulocale

vogella commented on May 8, 2026

Replaces com.ibm.icu.util.ULocale usage in Workbench with java.util.Locale and drops the matching Import-Package: com.ibm.icu.util from the bundle manifest. The legacy ICU-style locale extension string from Platform.getNLExtensions() and the NL_EXTENSIONS preference (e.g. @calendar=hebrew;numbers=arab) is translated to a BCP 47 Unicode locale extension (-u-ca-hebrew-nu-arab) and applied via Locale.forLanguageTag, so existing preference values keep working for the common keyword names (calendar, collation, currency, numbers, timezone) as well as for two-letter BCP 47 keys. Unknown keys or ill-formed values are silently dropped, matching the previous best-effort ULocale behavior. This removes the last com.ibm.icu reference from org.eclipse.ui.workbench.
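A minimal, self-contained sketch of the translation described above, assuming the extension string starts with @ and uses ;-separated key=value pairs. The class and method names (NlExtensionsSketch, toUnicodeExtension, applyNlExtensions) are hypothetical and not taken from the PR; only the key table mirrors the one in the diff. Appending "-u-..." to the base locale's language tag assumes the base locale carries no extensions or private-use subtags of its own.

```java
import java.util.Locale;
import java.util.Map;

public class NlExtensionsSketch {

    // ICU keyword names mapped to their two-letter BCP 47 Unicode extension keys.
    private static final Map<String, String> ICU_TO_BCP47_KEY = Map.of(
            "calendar", "ca",
            "collation", "co",
            "currency", "cu",
            "numbers", "nu",
            "timezone", "tz");

    // Translates "@calendar=hebrew;numbers=arab" into "ca-hebrew-nu-arab".
    // Returns an empty string when nothing usable is found.
    static String toUnicodeExtension(String icuExtensions) {
        if (icuExtensions == null || !icuExtensions.startsWith("@")) {
            return "";
        }
        StringBuilder uExtension = new StringBuilder();
        for (String pair : icuExtensions.substring(1).split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // no key, or no '=' at all
            }
            String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT);
            // Known ICU keyword -> mapped key; any other two-letter key passes through.
            String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
            if (key == null || rawValue.isEmpty()) {
                continue; // unknown keyword or empty value
            }
            if (uExtension.length() > 0) {
                uExtension.append('-');
            }
            uExtension.append(key).append('-').append(rawValue);
        }
        return uExtension.toString();
    }

    // Appends the "-u-" extension to the base locale's tag and re-parses it.
    static Locale applyNlExtensions(Locale base, String icuExtensions) {
        String ext = toUnicodeExtension(icuExtensions);
        if (ext.isEmpty()) {
            return base;
        }
        return Locale.forLanguageTag(base.toLanguageTag() + "-u-" + ext);
    }

    public static void main(String[] args) {
        Locale l = applyNlExtensions(Locale.US, "@calendar=hebrew;numbers=arab");
        System.out.println(l.toLanguageTag());            // en-US-u-ca-hebrew-nu-arab
        System.out.println(l.getUnicodeLocaleType("ca")); // hebrew
    }
}
```

Reading the result back through Locale.getUnicodeLocaleType is a quick way to confirm that a given preference value survived the translation.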

Opened against vogella/master for verification before submitting upstream.


gemini-code-assist (bot) left a comment


Code Review

This pull request replaces the dependency on com.ibm.icu.util.ULocale with standard java.util.Locale for handling locale extensions in the Workbench. It introduces a translation layer to convert legacy ICU-style extension strings into BCP 47 Unicode locale extensions. Feedback indicates potential regressions regarding timezone mapping, as BCP 47 requires short CLDR identifiers which differ from long IANA IDs used in ICU. Additionally, it is recommended to replace underscores with hyphens in extension values to maintain compatibility with BCP 47 subtag requirements.

Comment on lines +1910 to +1915
private static final Map<String, String> ICU_TO_BCP47_KEY = Map.ofEntries(
Map.entry("calendar", "ca"), //$NON-NLS-1$ //$NON-NLS-2$
Map.entry("collation", "co"), //$NON-NLS-1$ //$NON-NLS-2$
Map.entry("currency", "cu"), //$NON-NLS-1$ //$NON-NLS-2$
Map.entry("numbers", "nu"), //$NON-NLS-1$ //$NON-NLS-2$
Map.entry("timezone", "tz")); //$NON-NLS-1$ //$NON-NLS-2$

Priority: medium

The mapping of timezone to the BCP 47 key tz may result in a loss of functionality for many existing configurations.

Legacy ICU-style locale extensions often use long IANA timezone IDs (e.g., @timezone=America/New_York). However, BCP 47 Unicode locale extensions (the -u- section used by java.util.Locale) strictly require short CLDR identifiers (e.g., usnyc for America/New_York).

Since Locale.forLanguageTag is used here as a best-effort approach, values containing slashes or exceeding the 8-character subtag limit are ill-formed, and forLanguageTag ignores the first ill-formed subtag together with every subtag that follows it. This is a regression compared to the previous ULocale implementation, which could resolve long IDs. If support for long timezone IDs is required, a manual IANA-to-CLDR mapping might be necessary to find the corresponding BCP 47 short ID; java.time.ZoneId can validate a long ID but does not expose its BCP 47 short form.
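A small sketch for checking this behavior against the documented Locale.forLanguageTag contract, under which the first ill-formed subtag and all subtags after it are ignored. The tags below are hand-built examples, not values taken from the PR, and the class name TimezoneTagCheck is hypothetical.

```java
import java.util.Locale;

public class TimezoneTagCheck {
    public static void main(String[] args) {
        // Short BCP 47 timezone ID: well-formed, so it is kept as-is.
        Locale shortId = Locale.forLanguageTag("en-US-u-tz-usnyc");
        System.out.println(shortId.getUnicodeLocaleType("tz")); // usnyc

        // Long IANA ID: '/' and '_' are not allowed inside a subtag, so parsing
        // stops at this subtag and the long ID is not retained.
        Locale longId = Locale.forLanguageTag("en-US-u-tz-america/new_york");
        System.out.println(longId.getUnicodeLocaleType("tz"));

        // Everything after the ill-formed subtag is discarded as well, including
        // a keyword (ca-hebrew) that is valid on its own.
        Locale truncated = Locale.forLanguageTag("en-US-u-tz-america/new_york-ca-hebrew");
        System.out.println(truncated.getUnicodeLocaleType("ca")); // null
    }
}
```

The last case suggests that the order of keywords in the preference string can decide which ones survive.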

Comment on lines +1934 to +1952
for (String pair : body.split(";")) { //$NON-NLS-1$
int eq = pair.indexOf('=');
if (eq <= 0) {
continue;
}
String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT);
if (rawValue.isEmpty()) {
continue;
}
String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
if (key == null) {
continue;
}
if (uExtension.length() > 0) {
uExtension.append('-');
}
uExtension.append(key).append('-').append(rawValue);
}

Priority: medium

The current parsing logic assumes that the rawValue from the ICU-style string is directly compatible with BCP 47 subtags.

ICU keywords often used underscores (e.g., calendar=islamic_civil), whereas BCP 47 Unicode extensions use hyphens as separators and require each type subtag to be 3-8 alphanumeric characters. Locale.forLanguageTag does not reject such input outright; it ignores the first ill-formed subtag and everything after it, so users might find their preferences silently failing if they contain underscores or other non-BCP 47 characters.

Consider replacing underscores with hyphens in the rawValue to improve compatibility with legacy ICU strings.

Suggested change (identical to the code above except that the rawValue line adds .replace('_', '-')):
for (String pair : body.split(";")) { //$NON-NLS-1$
int eq = pair.indexOf('=');
if (eq <= 0) {
continue;
}
String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT).replace('_', '-');
if (rawValue.isEmpty()) {
continue;
}
String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
if (key == null) {
continue;
}
if (uExtension.length() > 0) {
uExtension.append('-');
}
uExtension.append(key).append('-').append(rawValue);
}
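Because Locale.forLanguageTag discards the first ill-formed subtag and everything after it, a single malformed value can also hide valid keywords that follow it. A sketch of a stricter variant of the loop above, assuming UTS #35 syntax (a key is an alphanumeric followed by a letter; a type is one or more hyphen-separated subtags of 3 to 8 alphanumerics), validates each pair after the underscore normalization and skips only the offending one. UExtensionBuilderSketch, buildUExtension, KEY and TYPE_VALUE are hypothetical names, and body is assumed to be the extension string with the leading @ already removed, as in the diff.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public class UExtensionBuilderSketch {

    private static final Map<String, String> ICU_TO_BCP47_KEY = Map.of(
            "calendar", "ca",
            "collation", "co",
            "currency", "cu",
            "numbers", "nu",
            "timezone", "tz");

    // UTS #35 key: one alphanumeric followed by one letter.
    private static final Pattern KEY = Pattern.compile("[a-z0-9][a-z]");

    // UTS #35 type: one or more subtags of 3-8 alphanumerics, joined by '-'.
    private static final Pattern TYPE_VALUE = Pattern.compile("[a-z0-9]{3,8}(-[a-z0-9]{3,8})*");

    static String buildUExtension(String body) {
        StringBuilder uExtension = new StringBuilder();
        for (String pair : body.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT).replace('_', '-');
            String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
            // Drop only this pair if it would make forLanguageTag stop parsing.
            if (key == null || !KEY.matcher(key).matches() || !TYPE_VALUE.matcher(rawValue).matches()) {
                continue;
            }
            if (uExtension.length() > 0) {
                uExtension.append('-');
            }
            uExtension.append(key).append('-').append(rawValue);
        }
        return uExtension.toString();
    }

    public static void main(String[] args) {
        String ext = buildUExtension("timezone=America/New_York;calendar=islamic_civil");
        System.out.println(ext); // ca-islamic-civil
        Locale l = Locale.forLanguageTag("en-US-u-" + ext);
        System.out.println(l.getUnicodeLocaleType("ca")); // islamic-civil
    }
}
```

A well-formed check is not the same as a correct translation: ICU's legacy calendar name gregorian is nine characters long, while the BCP 47 spelling is gregory, so a filter like this skips such a value instead of mapping it.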

Replaces com.ibm.icu.util.ULocale usage in Workbench with java.util.Locale
and drops the matching Import-Package: com.ibm.icu.util from the bundle
manifest. The legacy ICU-style locale extension string from
Platform.getNLExtensions() and the NL_EXTENSIONS preference (e.g.
"@calendar=hebrew;numbers=arab") is translated to a BCP 47 Unicode
locale extension ("-u-ca-hebrew-nu-arab") and applied via
Locale.forLanguageTag, so existing preference values keep working for
the common keyword names (calendar, collation, currency, numbers,
timezone) as well as for two-letter BCP 47 keys.

Removes the last com.ibm.icu reference from org.eclipse.ui.workbench.
vogella force-pushed the icu-workbench-ulocale branch from f4c0413 to 9751d8a on May 8, 2026 14:05
vogella commented on May 8, 2026

Thanks. Applied:

  • Underscore→hyphen normalization in values, so legacy compound values like islamic_civil are accepted as islamic-civil.
  • For the timezone comment: the regression on long IANA IDs (America/New_York) is real, but mapping IANA→short CLDR IDs needs the CLDR table that ships with ICU and is not exposed through public JDK API. Implementing a partial mapping ourselves would just shift the failure mode. I've documented the limitation in the helper's javadoc instead: short BCP 47 timezone IDs (@tz=usnyc) still work; long IANA IDs are dropped.

vogella closed this on May 8, 2026