Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .erpaval/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,10 @@ development sessions. Solutions are reusable; specs are per-feature.

- [lbug COPY FROM (subquery) bulk-load pattern](solutions/conventions/lbug-copy-from-subquery-bulk-load.md) — type-safe bulk inserts via COPY subquery; sentinel row, STRING[] never null + sentinel STRING[] never empty (LIST(ANY) trap), maxDBSize cap (8 TiB default exhausts VA), readOnly cannot run CREATE_FTS_INDEX, src/dst not from/to, eid not id alias.

- [Hand-rolled binary readers must bounds-check `pos += len`](solutions/conventions/binary-reader-bounds-check-pos-plus-len.md) — `Uint8Array.subarray` clamps silently and turns truncated input into valid-but-empty output. Every length-delimited read site needs an explicit `if (pos + len > end) throw` before the advance. Generalizes to MessagePack / CBOR / any custom wire format.

- [Doctor-style probes drift after rip-and-replace](solutions/best-practices/doctor-probe-drift-after-rip-and-replace.md) — dev `node_modules` is hot from prior installs, so a probe for a removed package keeps returning `ok` against the workspace and `fail` against the published CLI for months. At rip time, sweep `doctor.ts`, `doctor.test.ts`, `mise.toml`, CI matrix branches, and `--skip-X` flags for the constellation; tighten test assertions from `notEqual(status, "fail")` to `equal(status, "ok")`.

## Specs

- [001-scip-replaces-lsp](specs/001-scip-replaces-lsp/spec.md) — rip-and-replace LSP with SCIP for TS/Py/Go/Rust/Java. Task map: [tasks.md](specs/001-scip-replaces-lsp/tasks.md).
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
---
name: doctor-probe-drift-after-rip-and-replace
description: After a rip-and-replace that removes a dependency from the published install graph, `doctor`-style health probes for that dependency keep returning "ok" against the dev workspace and "fail" against the shipped CLI for months. Dev `node_modules` masks the drift; clean-checkout CI doesn't catch it because the probe's test asserts the workspace shape.
metadata:
type: best-practice
category: best-practices
tags: [rip-and-replace, doctor, install-graph, drift-detection, ci]
discovered: 2026-05-28
session: session-88b46e
related:
- squash-merge-masks-pre-existing-debt
- dogfood-prepush-hook-caught-cli-spec-mismatch
---

# Doctor-style probes drift after rip-and-replace

## The pattern

A rip-and-replace lands (e.g. ADR 0015 native→WASM tree-sitter cutover in 0.4.0). The published CLI tarball stops shipping `tree-sitter` and `tree-sitter-typescript` as deps. **But the `doctor` command's `treeSitterNativeCheck` keeps probing for them**, and its test asserts the probe doesn't return `status: "fail"`.

Locally, dev `node_modules` is hot from prior installs — the probe finds the packages, returns `ok`, the test passes, CI is green. **A clean `pnpm install --frozen-lockfile` would have turned the test red, but no scheduled job runs that.** The bug ships, the published CLI returns `fail` on every user's machine with a misleading hint to "re-run pnpm install to rebuild native bindings (requires clang/g++)".

This survived for 4+ months between the 0.4.0 cutover and the 2026-05-28 audit that caught it.

## Why this is hard to catch

Three layers of false-negatives stacked:

1. **Probe under-tested** — the test asserted `notEqual(status, "fail")`, which permits `ok`, `warn`, and other shapes. It didn't assert the probe was registered for the right reason.
2. **Test runs against the dev workspace, not the published shape** — `pnpm` strict-isolation hides workspace deps from packages that don't directly declare them, except when running tests, which use the resolved-from-disk graph. So the probe finds packages a user wouldn't.
3. **No drift sweep on rip-and-replace** — ADRs deprecate code, but no checklist enforces sweeping every consumer (in this case `doctor.ts` and `doctor.test.ts`) at the same time.

## How to apply

1. **At rip time**, grep the codebase for the removed package name AND for the `resolveFromRoot` / `createRequire` patterns that probed it. Every hit must either be deleted or updated to probe whatever replaced it.
2. **Add a clean-checkout doctor smoke** to the `verify-global-install` matrix. This is what the published CLI does on a user box. If it diverges from dev `pnpm test`, that's the canary.
3. **Tighten doctor probe assertions** — `assertEqual(status, "ok")` is the right shape, not `assertNotEqual(status, "fail")`. The latter passes on `warn`, on `undefined`, and on a probe that was never registered.
4. **Maintain a "doctor invariants" audit** — when the install graph changes (deps added/removed, workspace topology changes), open a `doctor.ts` review. Same triage you'd run for a public API change.

## Detection signal at audit time

What surfaced this in the 2026-05-28 audit was reading **the test docstring** rather than the test body. The block at `doctor.test.ts:163-170` explicitly says "the `repoRoot` walk-four-dirs-up heuristic yields a path that doesn't contain the packages, but `createRequire(import.meta.url)` does" — that comment is a half-confession that the test depends on the CLI's own resolution context. Once you read it, the latent failure is obvious. Add a checklist item: "any test whose docstring describes a fragile resolution context is a drift candidate."

## Generalization beyond doctor

The same pattern applies to:

- **`postinstall` scripts** that probe for or rebuild native bindings.
- **`mise.toml` tool comments** ("required to build X" — when X is gone, the comment lies).
- **CI matrix branches** keyed on legacy capability flags (e.g. `OCH_NATIVE_PARSER` env in `.github/workflows/ci.yml` — also dead post-rip).
- **Help strings and `--skip-X` flags** that name removed capabilities.

Every rip-and-replace produces a small constellation of these. Treat the rip as incomplete until the constellation is gone.

## Linked

- [[squash-merge-masks-pre-existing-debt]] — same family (CI green doesn't mean code clean).
- [[dogfood-prepush-hook-caught-cli-spec-mismatch]] — example of a self-targeting check that catches drift.
- PR #138 — the four-site fix for the 0.4.0 native→WASM tree-sitter rip.
- ADR 0015 — the rip itself.
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---
name: binary-reader-bounds-check-pos-plus-len
description: Hand-rolled binary readers (protobuf, MessagePack, CBOR, custom wire formats) must throw, not clamp, when a length-prefixed read would overrun the buffer. `Uint8Array.subarray` clamps silently and turns truncated input into valid-but-empty output.
metadata:
type: convention
category: conventions
tags: [binary-format, parser, protobuf, scip, defensive-programming]
discovered: 2026-05-28
session: session-88b46e
related:
- scip-protobuf-hand-rolled-reader
---

# Hand-rolled binary readers must bounds-check `pos += len`

When a hand-rolled reader for a length-prefixed wire format (protobuf, MessagePack, CBOR, custom packed records) does:

```ts
readString(): string {
const len = this.readVarint();
const start = this.pos;
this.pos += len;
return new TextDecoder().decode(this.buf.subarray(start, start + len));
}
```

…and the buffer is truncated mid-message, `subarray` **silently clamps** to `byteLength`. `pos` overshoots `end`. The next `readVarint` either:
- throws "unexpected end of buffer in varint" (if a parent caller tries to read another field), or
- silently exits because `while (this.pos < this.end)` is already false at a message boundary.

The second case is the dangerous one: a **truncated input decodes as a valid-but-empty container** and every downstream consumer treats "no records" as "nothing to do" and proceeds without error.

## The rule

Every site that advances `this.pos += len` for a variable-length read MUST bounds-check. In the SCIP proto-reader this was four sites:

```ts
// readString, readSubMessage, skip(WIRE_LENGTH_DELIMITED), readRepeatedInt32 packed branch
if (start + len > this.end) {
throw new Error("scip-ingest: unexpected end of buffer in <op>");
}
this.pos += len;
```

Don't try to "recover" from truncation by clamping or returning empty — the caller has no signal that the stream was incomplete. Throw, and let the caller decide whether the input is salvageable.

## Why

`Uint8Array.subarray(start, end)` follows the spec's "ToInteger" then "min(byteLength, end)" semantics. There is no way to opt out of the clamp short of explicit pre-checks. `Buffer.subarray` is the same. `DataView.getX` throws on overrun, but the typed-array views don't.

Detection from outside is hard: round-trip tests use clean fixtures, structural tests check decoded shapes, but neither produces a deliberately truncated buffer. The bug class only surfaces when a real user gets a partial download / interrupted write / corrupt cache.

## How to apply

1. When writing a new hand-rolled binary reader: every length-delimited read site needs an explicit `if (pos + len > end) throw` before the advance. Don't trust the underlying `subarray` to surface the error.
2. When reviewing one: search for `pos += len`, `pos += <varint>`, `pos += <length>` patterns. Each site needs a bounds check on the same line.
3. When testing one: add at least one positive control (healthy decode succeeds) AND one truncation case per reader method (declared length runs past buffer end → throws). The healthy-path test alone passes even with the bug present.
4. When generalizing: this applies to MessagePack `readBin`/`readStr`, CBOR major-type-2/3 readers, custom wire formats — any decoder that copies "count next N bytes" into an output without first checking `available >= N`.

## SCIP-specific occurrences

The pattern fired four times in `packages/scip-ingest/src/proto-reader.ts`:
- `readString` — most common; every symbol name, every doc string.
- `readSubMessage` — the entry point for nested protobuf messages; affects every `Document`, `Occurrence`, `SymbolInformation`.
- `skip(WIRE_LENGTH_DELIMITED)` — every unknown field forwarded by `forEachField`.
- `readRepeatedInt32` packed branch — line/character offset arrays.

The dual-track lesson `scip-protobuf-hand-rolled-reader.md` captured the *why* of the hand-rolled reader. This one captures the *how* of writing it correctly.

## Linked

- [[scip-protobuf-hand-rolled-reader]] — the original "why hand-roll" decision.
- PR #138 — the four-site fix.
1 change: 0 additions & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
<!-- Intentionally synchronized with CLAUDE.md. Edit both files together.
v1 docs sweep: AGENTS.md drops session-local spec coordinates that
CLAUDE.md still carries. The substantive guidance is identical. -->

## OpenCodeHub MCP Tools

This repository has been indexed by OpenCodeHub. When you are working in this
Expand Down
Loading