Add French (fr-fr) language support #16

Open
toto-polo wants to merge 14 commits into Stypox:master from toto-polo:master

@toto-polo

Add French (fr-fr) language support

This PR adds complete French language support following the same structure
as the existing Italian (it-it) implementation.

Tools used

Claude Code Pro for implementation
Google Gemini in Android Studio to verify code compilation

Files added

Resources (config/fr-fr/):

  • tokenizer.json — Tokenizer with all French number words (0–99 including
    the irregular 70–79 and 80–99 forms), ordinals, months, weekdays, duration
    words, and date/time tokens
  • date_time.json — Date/time formatting with French-specific decade_format
    rules covering soixante-dix (70), quatre-vingts (80), etc.
  • date_time_test.json — Year/date/date-time formatting tests
  • *.word files — Translations for duration unit words (seconde, minute,
    heure, jour and plurals)

Source (lang/fr/):

  • FrenchFormatter.kt — Number pronouncer implementing French-specific
    subHundred logic (soixante-dix, quatre-vingts) and ordinal suffixes (-ième,
    with special cases for cinq→cinquième, neuf→neuvième, etc.)
  • FrenchNumberExtractor.kt — Number parser with custom numberLessThan1000Fr
    handling space-separated forms like "soixante dix-sept" (70–79) and
    "quatre vingt" (80+)
  • FrenchDateTimeExtractor.kt — Date/time parser including "moins le quart"
    (quarter to) support via special_minute_before
  • FrenchParser.kt — Top-level parser wiring the above components
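
The space-separated handling described for FrenchNumberExtractor.kt can be illustrated with a minimal standalone sketch. It is not the PR's numberLessThan1000Fr: the names WORD_VALUES and parseSubHundred are hypothetical, the function only covers 0–99, and it is deliberately lenient (it sums word values without validating word order).

```kotlin
// Hypothetical lookup table; the real extractor reads its words from tokenizer.json.
val WORD_VALUES = mapOf(
    "zéro" to 0, "un" to 1, "deux" to 2, "trois" to 3, "quatre" to 4,
    "cinq" to 5, "six" to 6, "sept" to 7, "huit" to 8, "neuf" to 9,
    "dix" to 10, "onze" to 11, "douze" to 12, "treize" to 13,
    "quatorze" to 14, "quinze" to 15, "seize" to 16,
    "vingt" to 20, "trente" to 30, "quarante" to 40,
    "cinquante" to 50, "soixante" to 60
)

// Parses a French number below 100 written with spaces and/or hyphens.
// Returns null when a word is not recognised.
fun parseSubHundred(text: String): Int? {
    // Treat hyphens like spaces and ignore the linking word "et".
    val words = text.lowercase().split(' ', '-')
        .filter { it.isNotEmpty() && it != "et" }
    if (words.isEmpty()) return null

    var total = 0
    var i = 0
    while (i < words.size) {
        val word = words[i]
        val next = words.getOrNull(i + 1)
        if (word == "quatre" && (next == "vingt" || next == "vingts")) {
            // "quatre vingt(s)" is multiplicative: 4 × 20.
            total += 80
            i += 2
        } else {
            // Every other word is additive: soixante + dix + sept = 77.
            total += WORD_VALUES[word] ?: return null
            i += 1
        }
    }
    return total
}
```

With this sketch, parseSubHundred("soixante dix-sept") returns 77 and parseSubHundred("quatre vingt") returns 80. A production extractor would also need to reject ungrammatical sequences, which this sketch does not attempt.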

Modified:

  • ParserFormatterBuilder.kt — Registered "fr" locale

Key implementation notes

  • French 70–79: "soixante" (60) + a teen (10–19). Both hyphenated tokens
    (e.g. soixante-dix-sept) and space-separated forms are handled.
  • French 80–99: "quatre" (4) × "vingt" (20) + a remainder of 0–19. The tokenizer
    provides pre-built hyphenated forms; the extractor handles the
    space-separated case.
  • Ordinals: strip the trailing "s" (quatre-vingts → quatre-vingtième), then
    append -ième with spelling adjustments: cinq gains a "u" (cinquième), the
    "f" of neuf becomes "v" (neuvième), and a final "e" is dropped
    (quatre → quatrième).
  • Duration formatting uses "un" regardless of grammatical gender (same
    approach as Italian).
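
The notes above can be sketched as a small standalone formatter. This is an illustration under assumptions, not the code in FrenchFormatter.kt: the names UNITS, TENS, subHundred and ordinal are hypothetical here, the range is limited to 0–99, and the traditional spellings "vingt et un" and "soixante et onze" are assumed.

```kotlin
// Hypothetical tables; the PR's formatter may organise its data differently.
val UNITS = listOf(
    "zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
    "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze", "seize",
    "dix-sept", "dix-huit", "dix-neuf"
)
val TENS = mapOf(
    20 to "vingt", 30 to "trente", 40 to "quarante",
    50 to "cinquante", 60 to "soixante"
)

// Cardinal spelling for 0..99.
fun subHundred(n: Int): String {
    require(n in 0..99) { "expected 0..99, got $n" }
    return when {
        n < 20 -> UNITS[n]
        n < 70 -> {
            val tens = TENS.getValue(n - n % 10)
            when (n % 10) {
                0 -> tens
                1 -> "$tens et un"              // vingt et un, trente et un, ...
                else -> "$tens-${UNITS[n % 10]}"
            }
        }
        n == 71 -> "soixante et onze"
        n < 80 -> "soixante-${UNITS[n - 60]}"    // 60 + (10..19)
        n == 80 -> "quatre-vingts"               // plural "s" only on exactly 80
        else -> "quatre-vingt-${UNITS[n - 80]}"  // 4 × 20 + (1..19)
    }
}

// Ordinal spelling for 1..99, following the adjustments listed in the notes.
fun ordinal(n: Int): String {
    require(n in 1..99) { "expected 1..99, got $n" }
    if (n == 1) return "premier"
    val cardinal = subHundred(n)
    val stem = when {
        cardinal.endsWith("cinq") -> cardinal + "u"               // cinq → cinqu-
        cardinal.endsWith("neuf") -> cardinal.dropLast(1) + "v"   // neuf → neuv-
        cardinal.endsWith("e") -> cardinal.dropLast(1)            // quatre → quatr-
        cardinal.endsWith("vingts") -> cardinal.dropLast(1)       // drop plural "s"
        else -> cardinal
    }
    return stem + "ième"
}
```

For example, subHundred(77) yields "soixante-dix-sept", subHundred(91) yields "quatre-vingt-onze", and ordinal(80) yields "quatre-vingtième". The plural "s" is stripped only from "vingts", so "trois" still becomes "troisième".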
