You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Tracking issue for the remaining canonicalize.ts hardenings identified in the
2026-04-25 cross-project audit. Safe fixes already shipped in 9be870b (BOM
strip, backtick/ZWSP as whitespace, clear/unset/reset-counters/reset-counters-all
added to GENERAL_COMMANDS). This issue tracks the larger items that need design
discussion before landing.
Background
Audit was driven from tikoci/lsp-routeros-ts while evaluating whether to vendor canonicalize.ts for prose-text command extraction (chat input, MCP tool input,
markdown snippets). Full audit doc with methodology, findings, and proposed
hardenings:
50-input probe across 6 categories (prose, comment/quote, bracket/brace,
scripting constructs, identifier-shaped values, markdown). Zero crashes —
the parser is robust as a parser. 12 wrong-but-quiet correctness gaps when
fed anything other than clean CLI input.
Test surface lives at src/canonicalize.fuzz.test.ts — anchor tests document
current behaviour, test.todo markers list the unshipped hardenings below.
Hardening roadmap
Already shipped (commit 9be870b)
✅ H7 — BOM strip + zero-width space as whitespace
✅ Backticks as whitespace (in both outer + word loops)
✅ Universal verb expansion — clear, unset, reset-counters, reset-counters-all added to GENERAL_COMMANDS after DB cross-check
Still on the books
⬜ H1: mode: 'strict' | 'lenient' parameter. Today a leading word
like `Run /ip/address/print` becomes `{path: "/Run/ip/address", verb: "print"}`
because mid-line slash never restarts path context. Lenient mode should
drop leading prose and start a new command on stray slashes. Strict mode
preserves today's behaviour. Anchor test:
`describe('finding Add SafeSkill security badge (73/100 — Passes with Notes) #1 — mid-line slash does not restart path (anchor)')`.
⬜ H2: Tok.Var for \$identifier. Variables are tolerated in args
position (most common case works) but treated as path segments when they
appear after a path-shaped run with no recognised verb. Adding a dedicated
token type makes them never-a-segment.
⬜ H3: paren (…) expression scope. `:if ($x = 1) do={ /log/info "yes" }`
produces ZERO commands today — parens aren't tokenized; the entire
expression is consumed as garbage tokens that swallow the do-block.
Inconsistently, `:while ($i < 10) do={ /log/info $i }` extracts the
inner command. Recommend tokenizing `(` and `)` and skipping their
contents (or recursing for any nested `[…]` subshells).
⬜ H4: pluggable isVerb resolver.The most impactful one — and
it specifically benefits rosetta. The expanded universal verb set
closed the easy half of finding Export routeros-docs-links.json as a CI artifact for LSP consumers #4, but `info`/`warning`/`error`/`debug`
can't go in the universal set because they're path-context-dependent
(`info` is a verb at `/log`, a dir at `/interface/wireless`).
Proposed API:
```ts
export interface CanonicalizeOptions {
cwd?: string;
mode?: 'strict' | 'lenient';
/** Optional path-context-aware verb classifier. Called when the parser
encounters a token that could be either a path segment or a verb.
Return true to treat it as a verb. Falls back to GENERAL_COMMANDS /
EXTRA_VERBS when not provided. MUST be synchronous. */
isVerb?: (token: string, parentPath: string) => boolean;
}
```
Wiring:
rosetta (TUI / MCP / DB-backed callers): pass a callback that does
`SELECT 1 FROM commands WHERE name=? AND parent_path=? AND type='cmd'`
— sub-millisecond per call, version-aware, path-aware.
lsp-routeros-ts: pass a callback backed by a static `verbs.json`
(see Export routeros-docs-links.json as a CI artifact for LSP consumers #4 for the parallel docs-links artifact pattern), augmented at
runtime by classifications observed in `/console/inspect highlight`
responses.
standalone / no-deps callers: omit the option, get today's universal
set behaviour.
This keeps the module pure and dependency-free while letting each consumer
pick its precision/cost tradeoff. Most of `EXTRA_VERBS` could be
retired once rosetta's TUI/MCP wires the DB resolver — the data already
has every cmd verb at every menu.
⬜ H5: { after key= is a literal block value.
`/system/script/add name=foo source={ /ip/address/print }` extracts the
inner `/ip/address/print` and drops the outer add command because
`{...}` is treated as a scope to recurse into rather than a quoted
block-value. May need the H4 resolver to know that `add` itself doesn't
introduce a block.
⬜ H6: extractMentions(input) for navigation-only references.
`extractPaths('/ip/firewall/filter')` returns `[]` because no verb is
attached. For "what does this text reference?" use cases (rosetta's
classifier, LSP hover, MCP context-feeders) we want a separate function
that surfaces every path the text mentions, with or without a verb.
⬜ H8: confidence flag on each CanonicalCommand.
`'high' | 'medium' | 'low'` — well-formed CLI / relative-with-cwd /
prose-extracted respectively. Lets consumers filter
(LSP could ignore `'low'` for hover, accept all for "what's this
doing?" queries).
Why this matters across tikoci
Robust extraction of RouterOS commands from arbitrary strings is
high-value across the org:
lsp-routeros-ts — hover, document-link provider, prose-aware features
tikbook — RouterOS in markdown fences and notebook cells
rosetta — classifier (`src/classify.ts`) overlaps with this; the TUI
and MCP both feed user input through canonicalize before DB lookup
Future Copilot / agent tooling — any feature that takes free-form
text and needs to find the RouterOS commands inside
Today most of these treat the parser as black-box. With the H4 resolver
landed, each surface gets the same parser with a backend appropriate to
its constraints.
How to pick this up
Read this issue + the audit doc linked at the top.
Open `src/canonicalize.fuzz.test.ts` — the `test.todo` markers map 1:1
to H1–H8 here. Promoting one to `test(...)` forces the implementation.
Suggested order: H4 first (largest payoff, unblocks rosetta itself),
then H1 (lenient mode, unblocks LSP), then H5, then the smaller ones.
Each landing should:
flip the corresponding `test.todo` to `test(...)` with concrete
assertions
update the corresponding "anchor" test in the same file (the one
documenting today's wrong behaviour) to assert the new correct
behaviour
add a CHANGELOG bullet under `[Unreleased]` → `Fixed` or `Added`
cross-check verb additions / API changes against the `commands`
table to avoid path collisions
Tracking issue for the remaining
canonicalize.tshardenings identified in the2026-04-25 cross-project audit. Safe fixes already shipped in 9be870b (BOM
strip, backtick/ZWSP as whitespace,
clear/unset/reset-counters/reset-counters-alladded to
GENERAL_COMMANDS). This issue tracks the larger items that need designdiscussion before landing.
Background
Audit was driven from
tikoci/lsp-routeros-tswhile evaluating whether to vendorcanonicalize.tsfor prose-text command extraction (chat input, MCP tool input,markdown snippets). Full audit doc with methodology, findings, and proposed
hardenings:
https://github.com/tikoci/lsp-routeros-ts/blob/main/docs/canonicalize-audit.md
50-input probe across 6 categories (prose, comment/quote, bracket/brace,
scripting constructs, identifier-shaped values, markdown). Zero crashes —
the parser is robust as a parser. 12 wrong-but-quiet correctness gaps when
fed anything other than clean CLI input.
Test surface lives at
src/canonicalize.fuzz.test.ts— anchor tests documentcurrent behaviour,
test.todomarkers list the unshipped hardenings below.Hardening roadmap
Already shipped (commit 9be870b)
clear,unset,reset-counters,reset-counters-alladded toGENERAL_COMMANDSafter DB cross-checkStill on the books
⬜ H1:
mode: 'strict' | 'lenient'parameter. Today a leading wordlike `Run /ip/address/print` becomes `{path: "/Run/ip/address", verb: "print"}`
because mid-line slash never restarts path context. Lenient mode should
drop leading prose and start a new command on stray slashes. Strict mode
preserves today's behaviour. Anchor test:
`describe('finding Add SafeSkill security badge (73/100 — Passes with Notes) #1 — mid-line slash does not restart path (anchor)')`.
⬜ H2:
Tok.Varfor\$identifier. Variables are tolerated in argsposition (most common case works) but treated as path segments when they
appear after a path-shaped run with no recognised verb. Adding a dedicated
token type makes them never-a-segment.
⬜ H3: paren
(…)expression scope. `:if ($x = 1) do={ /log/info "yes" }`produces ZERO commands today — parens aren't tokenized; the entire
expression is consumed as garbage tokens that swallow the do-block.
Inconsistently, `:while ($i < 10) do={ /log/info $i }` extracts the
inner command. Recommend tokenizing `(` and `)` and skipping their
contents (or recursing for any nested `[…]` subshells).
⬜ H4: pluggable
isVerbresolver. The most impactful one — andit specifically benefits rosetta. The expanded universal verb set
closed the easy half of finding Export routeros-docs-links.json as a CI artifact for LSP consumers #4, but `info`/`warning`/`error`/`debug`
can't go in the universal set because they're path-context-dependent
(`info` is a verb at `/log`, a dir at `/interface/wireless`).
Proposed API:
```ts
export interface CanonicalizeOptions {
cwd?: string;
mode?: 'strict' | 'lenient';
/** Optional path-context-aware verb classifier. Called when the parser
isVerb?: (token: string, parentPath: string) => boolean;
}
```
Wiring:
`SELECT 1 FROM commands WHERE name=? AND parent_path=? AND type='cmd'`
— sub-millisecond per call, version-aware, path-aware.
(see Export routeros-docs-links.json as a CI artifact for LSP consumers #4 for the parallel docs-links artifact pattern), augmented at
runtime by classifications observed in `/console/inspect highlight`
responses.
set behaviour.
This keeps the module pure and dependency-free while letting each consumer
pick its precision/cost tradeoff. Most of `EXTRA_VERBS` could be
retired once rosetta's TUI/MCP wires the DB resolver — the data already
has every cmd verb at every menu.
Pairs with: a CI-published `verbs.json` artifact (analogous to the
`routeros-docs-links.json` ask in Export routeros-docs-links.json as a CI artifact for LSP consumers #4) for callers that don't want to
bundle the full DB.
⬜ H5:
{afterkey=is a literal block value.`/system/script/add name=foo source={ /ip/address/print }` extracts the
inner `/ip/address/print` and drops the outer add command because
`{...}` is treated as a scope to recurse into rather than a quoted
block-value. May need the H4 resolver to know that `add` itself doesn't
introduce a block.
⬜ H6:
extractMentions(input)for navigation-only references.`extractPaths('/ip/firewall/filter')` returns `[]` because no verb is
attached. For "what does this text reference?" use cases (rosetta's
classifier, LSP hover, MCP context-feeders) we want a separate function
that surfaces every path the text mentions, with or without a verb.
⬜ H8: confidence flag on each
CanonicalCommand.`'high' | 'medium' | 'low'` — well-formed CLI / relative-with-cwd /
prose-extracted respectively. Lets consumers filter
(LSP could ignore `'low'` for hover, accept all for "what's this
doing?" queries).
Why this matters across tikoci
Robust extraction of RouterOS commands from arbitrary strings is
high-value across the org:
and MCP both feed user input through canonicalize before DB lookup
text and needs to find the RouterOS commands inside
Today most of these treat the parser as black-box. With the H4 resolver
landed, each surface gets the same parser with a backend appropriate to
its constraints.
How to pick this up
to H1–H8 here. Promoting one to `test(...)` forces the implementation.
then H1 (lenient mode, unblocks LSP), then H5, then the smaller ones.
assertions
documenting today's wrong behaviour) to assert the new correct
behaviour
table to avoid path collisions
Cross-references