diff --git a/docs/adr/0003-shared-self-concurrent-dispatch.md b/docs/adr/0003-shared-self-concurrent-dispatch.md index 67cff7a..e5b4d43 100644 --- a/docs/adr/0003-shared-self-concurrent-dispatch.md +++ b/docs/adr/0003-shared-self-concurrent-dispatch.md @@ -56,3 +56,63 @@ read-loop for its full duration. This is correct per the LSP spec — clients are not permitted to send other requests until `initialize` returns — and it surfaces the latency on the handler author rather than hiding it behind a spawn boundary. + +## Addendum (2026-06-15): document-sync mutations are inline too + +The first addendum claimed every feature method, "hover, completion, +didOpen, …", gains nothing from ordered execution. That is wrong for the +document-sync notifications. The built-in state mutation for +`textDocument/didOpen`, `didChange`, `didClose`, and `didSave` runs +**inline** in the read-loop, in receipt order, so the mutation lands in +the [[Documents]] store before the next message is dispatched. The +inline category is therefore lifecycle **plus** document-sync mutation, +not lifecycle alone. Requests still spawn (ADR 0012), and user-side +reactions to a change (publishing diagnostics, reparsing) spawn too — +only the built-in rope mutation is inline. + +This is the ordering constraint the LSP spec leaves implicit. The spec +(3.18, *Request, Notification and Response Ordering*) lets a server +reorder messages "as long as this reordering doesn't affect the +correctness of the responses." A reordered `didChange` crosses that +line: incremental edits applied out of order corrupt the rope, and a +later request then reads a broken document. + +We rejected **spawning the mutation** (the model lspf shipped through +0.1.0-alpha.2, and tower-lsp's default). `tokio::spawn` hands the +mutation to the scheduler with no ordering guarantee relative to the next +`didChange` or a following request, so two edits can land out of order. +tower-lsp's maintainer (issue #284) and async-lsp both classify this as a +spec violation — the server and client states "slowly drift apart." + +We rejected a **per-document serial queue** (`HashMap`, FIFO +per URI, parallel across URIs). It is more machinery than either +reference implementation uses, and it optimizes the wrong cost: a rope +mutation is microseconds, so running it inline does not meaningfully +stall the read-loop, while the expensive work (reparse, diagnostics) is +user-side and spawns in *every* design. Its one distinctive benefit — a +slow mutation on document A not blocking document B — buys almost nothing +because the mutation is never the slow part, and to preserve +read-after-write it would force same-document requests to serialize +behind the queue, discarding most of the request concurrency the spawn +model exists for. + +Both canonical implementations reach the same guarantee by different +mechanisms. Microsoft's `vscode-jsonrpc` is concurrent-by-default at the +connection layer (`maxParallelism` defaults to `-1`; it dispatches the +next queued message without awaiting the current handler), yet document +sync stays correct because the built-in `TextDocuments` listener is +**synchronous** — on a single-threaded event loop the mutation completes +within the synchronous prefix of dispatch, before the next message is +touched. async-lsp states the rule directly: "notifications must be +processed in order (synchronously) … requests can be processed +concurrently." Running the mutation inline in our read-loop is that same +guarantee expressed for a multi-threaded tokio runtime, where `spawn` +would otherwise throw it away. + +The cost we accept: the read-loop is blocked for the duration of a +built-in document-sync mutation. This is bounded and cheap by +construction (the built-in does no user work), and it mirrors the slow- +`initialize` cost above. A user who overrides a document-sync handler +with an expensive body reintroduces the stall; the built-ins keep their +reaction work (diagnostics) on a spawned task, and overrides are +documented to do the same. diff --git a/docs/adr/0004-capability-auto-derivation.md b/docs/adr/0004-capability-auto-derivation.md index f90b1b4..314cade 100644 --- a/docs/adr/0004-capability-auto-derivation.md +++ b/docs/adr/0004-capability-auto-derivation.md @@ -21,3 +21,8 @@ implementing `hover`, or implement `hover` while leaving `HOVER = false`. Neither crashes; the client either calls a default no-op or never calls at all. We treat this as a documentation and example-quality problem, not a soundness one. + +One capability escapes this const model: `positionEncoding` is +*negotiated* from the client's `general.positionEncodings`, not declared +by a const, so the capability-assembly step must see `InitializeParams`. +See ADR 0016. diff --git a/docs/adr/0016-position-encoding.md b/docs/adr/0016-position-encoding.md new file mode 100644 index 0000000..b465d64 --- /dev/null +++ b/docs/adr/0016-position-encoding.md @@ -0,0 +1,58 @@ +# Position encoding: negotiate UTF-8, default UTF-16 + +LSP `Position.character` is not a character index and not a byte offset — +its meaning is the *negotiated* `PositionEncodingKind` (LSP 3.18, +*Position*). The only mandatory encoding is UTF-16; a client may also +offer UTF-8 and/or UTF-32 through the `general.positionEncodings` client +capability, and the server echoes its pick back in +`InitializeResult.capabilities.positionEncoding`. If the server omits it, +the encoding is UTF-16. + +lspf negotiates **UTF-8 when the client offers it, falling back to +UTF-16** otherwise. The negotiated kind is stored as runtime state on the +[[Documents]] store and governs every later `position ↔ offset` +conversion. This is rust-analyzer's strategy. + +We chose UTF-8-preferred for performance (ADR 0005). The [[Document]]'s +`ropey::Rope` and every Rust `&str` index natively in **bytes**; with +UTF-8 negotiated, `Position.character` *is* the byte offset within the +line, so the hot path — the conversion done on every hover, completion, +diagnostic range, and incremental `didChange` — is a direct rope byte +index with no transcoding. This matters most over the WebSocket +transport, where ADR 0005 already counts redundant work as latency. + +We rejected **UTF-16 only** (advertise nothing, default to UTF-16). It is +the smaller surface — one conversion path, correct for every client — but +`ropey` indexes in chars and bytes, never UTF-16 code units, so *every* +position would pay a `char ↔ utf16` transcode even when talking to a +client (rust-analyzer's own ecosystem, most modern editors) that would +happily speak UTF-8. We keep the UTF-16 path as the mandatory fallback, +not the default. + +We rejected **treating `Position.character` as a `ropey` char index** +(the naive reading). It is wrong under *both* encodings: a line +containing any non-ASCII text (an emoji, a CJK character, a combining +mark) has a different char count than its UTF-16 code-unit count and its +UTF-8 byte count. This is the most common silent correctness bug in LSP +servers, and naming it here is half the reason this ADR exists. + +This is the one capability that escapes ADR 0004's static-const +auto-derivation. Every other capability is a compile-time `const` on the +trait; `positionEncoding` cannot be, because the server's choice depends +on the *client's* offered list, which only arrives in `InitializeParams`. +Two consequences follow: + +1. The `InitializeResult` builder must **see the client capabilities** — + the capability-assembly step takes `InitializeParams`, not just + `&self`, so it can intersect the client's `general.positionEncodings` + with lspf's preference order (`[utf-8, utf-16]`). +2. The negotiated kind is **runtime state**, not a const. The + [[Documents]] store holds it and every conversion method reads it; a + handler never has to thread the encoding through itself. + +The cost we accept: two conversion code paths (UTF-8 native, UTF-16 +transcoded) plus the negotiation logic, and a `positionEncoding` that +sits outside the otherwise-uniform const-derived capability model. We +treat the asymmetry as honest — position encoding genuinely *is* +negotiated rather than declared, and pretending otherwise would either +break correctness or forfeit the UTF-8 fast path.