Use java.util.Locale in Workbench NL extensions handling by vogella · Pull Request #21 · vogella/eclipse.platform.ui

vogella · 2026-05-08T14:00:50Z

Replaces com.ibm.icu.util.ULocale usage in Workbench with java.util.Locale and drops the matching Import-Package: com.ibm.icu.util from the bundle manifest. The legacy ICU-style locale extension string from Platform.getNLExtensions() and the NL_EXTENSIONS preference (e.g. @calendar=hebrew;numbers=arab) is translated to a BCP 47 Unicode locale extension (-u-ca-hebrew-nu-arab) and applied via Locale.forLanguageTag, so existing preference values keep working for the common keyword names (calendar, collation, currency, numbers, timezone) as well as for two-letter BCP 47 keys. Unknown keys or ill-formed values are silently dropped, matching the previous best-effort ULocale behavior. This removes the last com.ibm.icu reference from org.eclipse.ui.workbench.

Opened against vogella/master for verification before submitting upstream.
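A minimal sketch of the translation described above, written as a standalone class so it can run outside the Workbench. The class and method names (NlExtensionsSketch, toUnicodeExtension, toLocale) are hypothetical and not taken from the PR. The key table and parsing loop follow the hunks quoted in the review below. The sketch assumes the base locale has no extensions or private-use subtags of its own, so its language tag can simply be extended with a -u- section.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical standalone helper; not the actual Workbench code.
public class NlExtensionsSketch {

    // ICU keyword names -> BCP 47 Unicode extension keys (same table as the PR).
    private static final Map<String, String> ICU_TO_BCP47_KEY = Map.of(
            "calendar", "ca",
            "collation", "co",
            "currency", "cu",
            "numbers", "nu",
            "timezone", "tz");

    /** Builds the -u- body, e.g. "@calendar=hebrew;numbers=arab" -> "ca-hebrew-nu-arab". */
    static String toUnicodeExtension(String icuExtensions) {
        String body = icuExtensions.startsWith("@") ? icuExtensions.substring(1) : icuExtensions;
        StringBuilder uExtension = new StringBuilder();
        for (String pair : body.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' at all, or an empty key
            }
            String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT);
            if (rawValue.isEmpty()) {
                continue;
            }
            // Known ICU names are mapped; two-letter keys are assumed to be BCP 47 keys already.
            String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
            if (key == null) {
                continue; // unknown keyword: silently dropped
            }
            if (uExtension.length() > 0) {
                uExtension.append('-');
            }
            uExtension.append(key).append('-').append(rawValue);
        }
        return uExtension.toString();
    }

    /** Applies the extension to a plain base locale via Locale.forLanguageTag. */
    static Locale toLocale(Locale base, String icuExtensions) {
        String ext = toUnicodeExtension(icuExtensions);
        return ext.isEmpty() ? base : Locale.forLanguageTag(base.toLanguageTag() + "-u-" + ext);
    }

    public static void main(String[] args) {
        Locale l = toLocale(Locale.US, "@calendar=hebrew;numbers=arab");
        System.out.println(l.toLanguageTag());            // en-US-u-ca-hebrew-nu-arab
        System.out.println(l.getUnicodeLocaleType("ca")); // hebrew
    }
}
```

Unknown keywords (foo=bar), empty values and pairs without a key are skipped before anything reaches forLanguageTag, which is the "silently dropped" behavior the description promises for those cases.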

gemini-code-assist

Code Review

This pull request replaces the dependency on com.ibm.icu.util.ULocale with standard java.util.Locale for handling locale extensions in the Workbench. It introduces a translation layer to convert legacy ICU-style extension strings into BCP 47 Unicode locale extensions. Feedback indicates potential regressions regarding timezone mapping, as BCP 47 requires short CLDR identifiers which differ from long IANA IDs used in ICU. Additionally, it is recommended to replace underscores with hyphens in extension values to maintain compatibility with BCP 47 subtag requirements.

gemini-code-assist · 2026-05-08T14:02:16Z

+	private static final Map<String, String> ICU_TO_BCP47_KEY = Map.ofEntries(
+			Map.entry("calendar", "ca"), //$NON-NLS-1$ //$NON-NLS-2$
+			Map.entry("collation", "co"), //$NON-NLS-1$ //$NON-NLS-2$
+			Map.entry("currency", "cu"), //$NON-NLS-1$ //$NON-NLS-2$
+			Map.entry("numbers", "nu"), //$NON-NLS-1$ //$NON-NLS-2$
+			Map.entry("timezone", "tz")); //$NON-NLS-1$ //$NON-NLS-2$


The mapping of timezone to the BCP 47 key tz may result in a loss of functionality for many existing configurations.

Legacy ICU-style locale extensions often use long IANA timezone IDs (e.g., @timezone=America/New_York). However, BCP 47 Unicode locale extensions (the -u- section used by java.util.Locale) strictly require short CLDR identifiers (e.g., usnyc for America/New_York).

Since Locale.forLanguageTag is used here on a best-effort basis, values containing slashes or exceeding the 8-character subtag limit are ill-formed, and forLanguageTag ignores the first ill-formed subtag together with all subtags that follow it. This is a regression compared to the previous ULocale implementation, which could resolve long IDs. If support for long timezone IDs must be kept, a manual mapping or a lookup via java.time.ZoneId might be necessary to find the corresponding BCP 47 short ID.
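A quick way to check the parsing behavior this comment depends on, using hand-written tags rather than output captured from the PR. The JDK documents that Locale.forLanguageTag ignores the first ill-formed subtag and every subtag after it, so a long IANA ID can also take down any keyword generated after it. The class and its typeFor helper are illustrative only.

```java
import java.util.Locale;

// Illustrative check of Locale.forLanguageTag with -u-tz- values; not code from the PR.
public class TzExtensionCheck {

    /** Returns the Unicode extension type for key, or null if the key did not survive parsing. */
    static String typeFor(String languageTag, String key) {
        return Locale.forLanguageTag(languageTag).getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        // Short BCP 47 timezone ID: a well-formed type subtag, so it is kept.
        System.out.println(typeFor("en-US-u-tz-usnyc", "tz")); // usnyc

        // Long IANA ID: "america/new_york" is ill-formed, so parsing stops there and the
        // calendar keyword that follows it is ignored as well.
        System.out.println(typeFor("en-US-u-tz-america/new_york-ca-hebrew", "ca")); // null

        // Same keywords with the timezone last: the calendar survives.
        System.out.println(typeFor("en-US-u-ca-hebrew-tz-america/new_york", "ca")); // hebrew
    }
}
```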

gemini-code-assist · 2026-05-08T14:02:17Z

+		for (String pair : body.split(";")) { //$NON-NLS-1$
+			int eq = pair.indexOf('=');
+			if (eq <= 0) {
+				continue;
+			}
+			String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
+			String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT);
+			if (rawValue.isEmpty()) {
+				continue;
+			}
+			String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
+			if (key == null) {
+				continue;
+			}
+			if (uExtension.length() > 0) {
+				uExtension.append('-');
+			}
+			uExtension.append(key).append('-').append(rawValue);
+		}


The current parsing logic assumes that the rawValue from the ICU-style string is directly compatible with BCP 47 subtags.

ICU keyword values often use underscores (e.g., calendar=islamic_civil), whereas BCP 47 Unicode extensions separate subtags with hyphens and require each type subtag to be 3-8 alphanumeric characters. Locale.forLanguageTag does not reject such input; it ignores the first ill-formed subtag and everything after it, so preferences containing underscores or other non-BCP 47 characters can fail silently.

Consider replacing underscores with hyphens in the rawValue to improve compatibility with legacy ICU strings.

Suggested change

 		for (String pair : body.split(";")) { //$NON-NLS-1$
 			int eq = pair.indexOf('=');
 			if (eq <= 0) {
 				continue;
 			}
 			String rawKey = pair.substring(0, eq).trim().toLowerCase(Locale.ROOT);
-			String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT);
+			String rawValue = pair.substring(eq + 1).trim().toLowerCase(Locale.ROOT).replace('_', '-');
 			if (rawValue.isEmpty()) {
 				continue;
 			}
 			String key = ICU_TO_BCP47_KEY.getOrDefault(rawKey, rawKey.length() == 2 ? rawKey : null);
 			if (key == null) {
 				continue;
 			}
 			if (uExtension.length() > 0) {
 				uExtension.append('-');
 			}
 			uExtension.append(key).append('-').append(rawValue);
 		}
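The effect of the suggested replace('_', '-') can be checked directly against Locale.forLanguageTag. The class name and the normalize helper are illustrative, not part of the PR; normalize mirrors the trim, lower-casing and underscore replacement from the suggested line.

```java
import java.util.Locale;

// Illustrative check of underscore handling; normalize() is a hypothetical helper.
public class UnderscoreCheck {

    /** Mirrors the suggested line: trim, lower-case, then '_' -> '-'. */
    static String normalize(String rawValue) {
        return rawValue.trim().toLowerCase(Locale.ROOT).replace('_', '-');
    }

    public static void main(String[] args) {
        // Without normalization "islamic_civil" is one ill-formed subtag, so the calendar value is lost.
        Locale raw = Locale.forLanguageTag("en-u-ca-islamic_civil");
        // After normalization it becomes two well-formed type subtags.
        Locale fixed = Locale.forLanguageTag("en-u-ca-" + normalize("islamic_civil"));

        System.out.println("islamic-civil".equals(raw.getUnicodeLocaleType("ca"))); // false
        System.out.println(fixed.getUnicodeLocaleType("ca"));                        // islamic-civil
    }
}
```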


vogella · 2026-05-08T14:05:20Z

Thanks. Applied:

Underscore→hyphen normalization in values, so legacy compound values like islamic_civil are accepted as islamic-civil.
For the timezone comment: the regression on long IANA IDs (America/New_York) is real, but mapping IANA→short CLDR IDs needs the CLDR table that lives in ICU and is not exposed by the JDK. Implementing a partial mapping ourselves would just shift the failure mode. I've documented the limitation in the helper's javadoc instead: short BCP 47 timezone IDs (@tz=usnyc) still work; long IANA IDs are dropped.
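If the helper passes the assembled string straight to Locale.forLanguageTag, as the reviewed hunk does, a dropped long IANA ID also drops every keyword generated after it, because forLanguageTag ignores everything from the first ill-formed subtag onward. One possible mitigation, sketched here under assumptions and not taken from the PR, is to validate each key and value before appending it, so only the offending pair is skipped. The class and method names are hypothetical; the patterns follow the UTS #35 syntax for Unicode extension keys (alphanum alpha) and types (subtags of 3 to 8 alphanumerics separated by hyphens), and values are assumed to be lower-cased already.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical per-pair validation; not code from the PR.
public class ValidatingExtensionSketch {

    // UTS #35 "key": alphanum alpha
    private static final Pattern KEY = Pattern.compile("[a-z0-9][a-z]");
    // UTS #35 "type": alphanum{3,8} ("-" alphanum{3,8})*
    private static final Pattern TYPE = Pattern.compile("[a-z0-9]{3,8}(-[a-z0-9]{3,8})*");

    /** Appends key-value to the -u- body only if both parts are well-formed. */
    static void appendIfValid(StringBuilder uExtension, String key, String value) {
        if (!KEY.matcher(key).matches() || !TYPE.matcher(value).matches()) {
            return; // e.g. "america/new-york": skipped without affecting other keywords
        }
        if (uExtension.length() > 0) {
            uExtension.append('-');
        }
        uExtension.append(key).append('-').append(value);
    }

    public static void main(String[] args) {
        StringBuilder u = new StringBuilder();
        appendIfValid(u, "tz", "america/new_york".replace('_', '-')); // rejected
        appendIfValid(u, "ca", "hebrew");                              // kept
        Locale l = Locale.forLanguageTag("en-US-u-" + u);
        System.out.println(l.toLanguageTag()); // en-US-u-ca-hebrew
    }
}
```

Validating per pair keeps the result independent of the order in which the keywords appear in the preference value.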

gemini-code-assist Bot reviewed May 8, 2026


vogella force-pushed the icu-workbench-ulocale branch from f4c0413 to 9751d8a May 8, 2026 14:05

vogella closed this May 8, 2026

vogella wants to merge 1 commit into master from icu-workbench-ulocale