
char.md is broadly out of sync with the compiler: nonexistent types, nonexistent operators, undocumented behavior; full breakdown by category #19

@davidfrogley

Summary

Web/src/docs/types/char.md describes a char-handling story that disagrees with the compiler in roughly a dozen specific places. The defects fall into three categories with different fix paths, plus a fourth bucket (Category 4) that is blocked on a design decision from Category 3:

  • Pure doc bugs — compiler is internally consistent and intentional; doc is the outlier. Fix in this repo, no input needed elsewhere.
  • Roadmap-dependent — doc describes coherent features that don't exist anywhere in the codebase. A maintainer needs to decide implement vs. delete before docs can move.
  • Tracked as compiler bugs — doc is correct; compiler is the outlier. Filed separately under rux-lang/Rux; this issue links to them so a doc-fixer doesn't accidentally rewrite text that's about to become true.

Category 1 — pure doc bugs (fixable here)

1a. Nonexistent char types: char64, char128, char256, char512

TypeRef::Kind (Rux/Include/Rux/Type.h:17-50) defines exactly Char8, Char16, Char32, with Char = Char32 as alias. No wider char types exist in the compiler. Type-name resolution at Hir.cpp:687-692 and Sema.cpp:1008-1011 accepts only char, char8, char16, char32.
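As a rough illustration of why the wider names cannot resolve, here is a minimal C++ sketch of name-to-kind lookup. `CharKind` and `ResolveCharTypeName` are hypothetical names invented for this example; they are not the actual code in Hir.cpp or Sema.cpp, only a model of the behavior described above.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical mirror of the three char kinds named above.
enum class CharKind { Char8, Char16, Char32 };

// Resolve a source-level type name to a char kind.
// `char` is treated as an alias for Char32; every other name is rejected.
std::optional<CharKind> ResolveCharTypeName(std::string_view name) {
    if (name == "char8")  return CharKind::Char8;
    if (name == "char16") return CharKind::Char16;
    if (name == "char32" || name == "char") return CharKind::Char32;
    return std::nullopt; // char64, char128, ... have no kind to map to
}
```

Under this model, `char64` fails at name resolution, before any later check could run.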

Lines affected:

  • L8 — overview alludes to "extended private-use and future Unicode planes."
  • L19-27 — types table; rows for char64, char128, char256, char512.
  • L65-82 — entire ### char64, char128, char256, char512 section.
  • L77-79 — example let raw = 0x0001f600 as char64;.
  • L94 — "use the qualified type names (char8, char16, etc.)" — etc. implies wider types exist.
  • L213-215 — surrogates note about "wider types (char64 and above)."
  • L267-268 — recommendations bullet about avoiding char64 through char512.
  • L283-284 — FFI note about "wider types have no standard C equivalent."

Fix: trim to char8 / char16 / char32 / char throughout. If extended-width chars are on the roadmap, gate behind a "Planned" callout instead.

1b. String interpolation "{c}" (L187)

The line:

let t: String = "{c}";           // "€"  (interpolation)

is wrong on three independent points:

  1. Bare "…" literals are typed Slice<char8> (see 2c), not String (Hir.cpp:386-394, Sema.cpp:763-768). Assignment errors with cannot assign 'Slice' to 'String'.
  2. There is no string interpolation in the language. The lexer doesn't scan for { inside strings, the parser has no interpolation segment branch, and no InterpolatedString AST node exists. "{c}" lexes as the literal three-character string {, c, }.
  3. Even if interpolation existed, the surrounding example at L186 calls String.From(c) where c: char32, but Std/Src/String.rux:33-44 only provides From(*const char8, uint) and From(char8[]) — no char overload.
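To make point 2 concrete, the following C++ sketch shows a string-literal scanner with no interpolation support. `ScanStringLiteral` is a hypothetical helper written for this issue, not the real Lexer.cpp; it only models the behavior described above, and it leaves out escape sequences for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Scan one double-quoted literal starting at src[0] == '"' and return its
// contents. Braces get no special treatment, so there is nowhere for an
// interpolation segment to begin.
std::string ScanStringLiteral(std::string_view src) {
    assert(!src.empty() && src.front() == '"');
    std::string contents;
    for (std::size_t i = 1; i < src.size() && src[i] != '"'; ++i) {
        contents.push_back(src[i]); // '{', 'c', '}' are ordinary characters
    }
    return contents;
}
```

Feeding it `"{c}"` yields the three characters `{`, `c`, `}` unchanged, which matches what the doc example would produce if the assignment type-checked at all.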

A grep across Web/src/docs/ shows L187 is the only place rendered docs use "{ident}"-style syntax, so cleanup is narrow.

Fix: delete L187 entirely. Possibly also rewrite L181-186 to point users at whatever the supported "char to one-char String" path actually is today (see Category 2d for the underlying gap).

1c. Typo at L156

"the program trow exception"

Should read "throws an exception." Incidental; fold it into whichever PR addresses L153-162 (a span that itself falls under Category 2; see below).

Category 2 — roadmap-dependent (needs maintainer decision)

These describe coherent features that simply don't exist. Each needs a call: implement, or delete from the docs.

2a. The as? operator (L161, L177, L252-258, L269)

Rux/Source/Parser.cpp:1364-1385 parses cast expressions and accepts only as and is. There is no Question-token branch after the AsKeyword match. x as? char16 lexes to x, as, ?, char16, so the parser meets ? where it expects a type name and the expression fails to parse.
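The failure can be reproduced in miniature with the C++ sketch below. `Tokenize`, `IsTypeName`, and `ParsesAsCast` are hypothetical stand-ins invented for this example, not the functions in Lexer.cpp or Parser.cpp; the cast grammar here is reduced to "operand, `as` or `is`, type name" and nothing else.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split source into identifier/keyword tokens and single-character punctuation.
std::vector<std::string> Tokenize(std::string_view src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) { ++i; continue; }
        if (std::isalnum(ch) || ch == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.emplace_back(src.substr(start, i - start));
        } else {
            tokens.emplace_back(1, src[i]); // '?' becomes its own token
            ++i;
        }
    }
    return tokens;
}

// A type name, for this sketch, is any token that starts with a letter.
bool IsTypeName(const std::string& tok) {
    return !tok.empty() && std::isalpha(static_cast<unsigned char>(tok[0]));
}

// Accept exactly: <operand> (as | is) <type-name>. A '?' token has no slot.
bool ParsesAsCast(const std::vector<std::string>& t) {
    return t.size() == 3 && (t[1] == "as" || t[1] == "is") && IsTypeName(t[2]);
}
```

With this grammar `x as char16` is accepted and `x as? char16` is rejected, because the `?` is lexed as a separate token that the cast rule never consumes.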

2b. Nullable type syntax T? (L161)

? is lexed as TokenKind::Question (Lexer.cpp:552) and used in exactly one place: the ternary at Parser.cpp:1237. No type parser admits a postfix ?, and no Optional / Nullable variant exists in TypeRef::Kind (Type.h:17-50).

If 2a and 2b were both implemented, the existing null literal still wouldn't fit T? for value types — null is currently the C-style null pointer literal, assignable only to pointer types via the special-case rule at Sema.cpp:99-101.

2c. let s: String = "..." pattern (L186, plus likely elsewhere)

Bare string literals are Slice<char8>. Assigning one to a String binding errors. Either IsAssignableTo needs a Slice<char8> → String coercion rule, or the docs need to stop using this pattern. (A grep would show how widespread the misuse is — worth a sweep before the fix lands.)
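For discussion purposes, the coercion option could look roughly like the C++ sketch below. The `Kind` enum is a heavily reduced stand-in, and the `sliceToStringCoercion` flag is not a real compiler option; it exists only to contrast today's behavior with the proposed rule. The real IsAssignableTo signature is not shown in this issue, so the one here is an assumption.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for the compiler's type kinds.
enum class Kind { SliceChar8, String, Char32 };

// Assignability with identity plus one opt-in coercion.
constexpr bool IsAssignableTo(Kind from, Kind to, bool sliceToStringCoercion) {
    if (from == to) return true;
    // Proposed rule: a Slice<char8> value may initialize a String binding.
    if (sliceToStringCoercion && from == Kind::SliceChar8 && to == Kind::String) {
        return true;
    }
    return false;
}

// Today (flag off) the doc pattern is rejected; with the rule it is accepted.
static_assert(!IsAssignableTo(Kind::SliceChar8, Kind::String, false));
static_assert(IsAssignableTo(Kind::SliceChar8, Kind::String, true));
```

The rule is deliberately one-directional: String to Slice<char8> stays rejected in both modes.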

2d. String::From(charN) overload (L186)

Std/Src/String.rux:33-44 provides only the two existing From overloads. A From(char32) (or From(charN) for each width) overload would need to be added before the doc example can stand. This is a stdlib gap, not a compiler gap — could be filed against rux-lang/Std if the maintainer wants to implement.
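The core of such an overload would be encoding one code point into char8 units. The C++ sketch below shows that step under the assumption that String stores UTF-8, which this issue does not confirm. `EncodeUtf8` is a hypothetical helper name, not part of Std.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode scalar value as UTF-8 (1 to 4 bytes).
// Precondition: cp <= 0x10FFFF and cp is not a surrogate (0xD800..0xDFFF).
std::string EncodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For the `€` in the doc example (U+20AC) this produces the three bytes E2 82 AC, which the existing From(char8[]) overload could then consume.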

Per-feature call needed: for each of 2a-2d, either

  • implement and leave doc text in place,
  • gate behind a "Planned" callout linking a tracking issue, or
  • delete the doc text entirely.

I'd default to deleting the doc text for 2a/2b (large language features, no infrastructure in place) and to implementing 2c/2d (small stdlib/compiler tweaks that would close real ergonomic gaps). But that's a maintainer call.

Category 3 — compiler bugs (filed separately, do not rewrite the doc)

The doc is correct on these points; the compiler doesn't match. Filed under rux-lang/Rux:

3a. Char widening is not implicit (#)

char.md:143-151 correctly describes implicit widening for char8 → char16 → char32. Sema.cpp:81-103 is missing the rule. Filed at rux-lang/Rux#<widening-issue> (placeholder). Don't edit L143-151; they're the spec.
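For reference, the missing rule amounts to a width comparison. The C++ sketch below is one possible shape for it; `CharKind` and `IsImplicitCharWidening` are hypothetical names, and whether the real fix lives in IsAssignableTo or elsewhere is left to the compiler issue.

```cpp
#include <cassert>

// Hypothetical char kinds, with the bit width as the enumerator value.
enum class CharKind { Char8 = 8, Char16 = 16, Char32 = 32 };

// Implicit conversion is allowed only when the target is strictly wider.
constexpr bool IsImplicitCharWidening(CharKind from, CharKind to) {
    return static_cast<int>(from) < static_cast<int>(to);
}

static_assert(IsImplicitCharWidening(CharKind::Char8, CharKind::Char32));
static_assert(!IsImplicitCharWidening(CharKind::Char32, CharKind::Char8));
```

Narrowing stays explicit under this rule, so it would still go through an `as` cast.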

3b. Cast validation skips range / surrogate / runtime panic (#)

char.md:32-39, :153-162, :175-177, :249-258 all describe validation behavior on as casts (constant out-of-range error, runtime panic, surrogate rejection). The compiler does none of it. Filed at rux-lang/Rux#<cast-validation-issue> (placeholder). Don't rewrite these sections; the maintainer has a design call to make on that issue about whether docs or compiler should win.
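To show the kind of check in question, here is a C++ sketch of a range and surrogate test that could run on a constant operand at compile time or on a dynamic operand at runtime. `CastCheck` and `CheckCharCast` are hypothetical names. The per-type maximum is passed in by the caller because this issue does not state the limits; char.md remains the authority on them, and on whether surrogates are rejected for every width.

```cpp
#include <cassert>
#include <cstdint>

enum class CastCheck { Ok, OutOfRange, Surrogate };

// Validate a value being cast to a char type whose largest legal value is
// maxForTarget (an assumption supplied by the caller, not taken from char.md).
constexpr CastCheck CheckCharCast(std::uint32_t value, std::uint32_t maxForTarget) {
    if (value > maxForTarget) return CastCheck::OutOfRange;
    // UTF-16 surrogate code points are not Unicode scalar values.
    if (value >= 0xD800 && value <= 0xDFFF) return CastCheck::Surrogate;
    return CastCheck::Ok;
}
```

A compiler could turn a non-Ok result into a diagnostic for constants and into a panic for runtime values; which of those the language adopts is the open design call on the linked issue.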

Category 4 — bucket awaiting Category 3b's resolution

4a. let a: char8 = 'A'; doesn't compile (L36, L48-50, L130-138)

Unprefixed char literals are always char32 (Hir.cpp:396-401, Sema.cpp:771-776). The "minimum-width inference" claim at L130-138 doesn't match implementation; the L36 example fails with type mismatch.

This isn't filed as a separate compiler issue because it's directly entangled with 3b's design call:

  • If the language adopts context-driven coercion or minimum-width inference, L36 / L130-138 are correct as-is and the compiler needs the fix.
  • If the language insists on prefixed literals (c8'A', c16'字'), L36 / L48-50 / L130-138 all need rewriting to use prefixed forms.
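If the first option wins, minimum-width inference could be as small as the C++ sketch below. `MinimumCharWidth` and `kMaxChar8` are hypothetical names, and the char8 threshold of 0x7F is an assumption made for illustration; the doc may intend 0xFF instead.

```cpp
#include <cassert>
#include <cstdint>

// Assumed upper bound for a literal that fits char8 (ASCII only).
constexpr std::uint32_t kMaxChar8 = 0x7F;

// Return the narrowest char width, in bits, that can hold the code point.
constexpr int MinimumCharWidth(std::uint32_t cp) {
    if (cp <= kMaxChar8) return 8;
    if (cp <= 0xFFFF) return 16;
    return 32;
}

static_assert(MinimumCharWidth(0x41) == 8);    // 'A'
static_assert(MinimumCharWidth(0x5B57) == 16); // '字'
```

Under this scheme `let a: char8 = 'A';` would type-check, while a literal outside the char8 range assigned to a char8 binding would still be an error.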
