List-function abbreviations missing from grammar-surface (DISC-001 / BL-014) #3

@rafael5

Type: downstream feedback / data-completeness
Surfaced by: tree-sitter-m DISC-001 during VS Code extension test-routine authoring
Build-log entry: BL-014
Schema impact: none (additive to all_forms; schema_version stays "1")

Summary

integrated/grammar-surface.json lists every entry in the YDB list-manipulation function family with abbreviation="" and an all_forms array that holds only the canonical full form (for example, all_forms=["$LISTBUILD"]). The standard short abbreviations that both the YDB and IRIS implementations accept are missing from the data.

Affected canonicals (all 12)

Canonical          Standard abbreviation
$LIST              $LI (unverified; the Acceptance test below expects $LI to parse)
$LISTBUILD         $LB
$LISTDATA          $LD
$LISTFIND          $LF
$LISTFROMSTRING    $LFS
$LISTGET           $LG (or $LIST overload); verify against MLAB001.html
$LISTLENGTH        $LL
$LISTNEXT          $LN
$LISTSAME          $LS
$LISTTOSTRING      $LTS
$LISTUPDATE        $LU
$LISTVALID         $LV

(Verify the exact short forms against the YDB documentation: the canonical YDB reference is MLAB001.html, and IRIS's ObjectScript reference also publishes them.)

Evidence

import json

# Load the integrated grammar surface and print every $LIST* entry.
with open("integrated/grammar-surface.json") as f:
    d = json.load(f)
for it in d["intrinsic_functions"]:
    if it["canonical"].startswith("$LIST"):
        print(it["canonical"], it["abbreviation"], it["all_forms"])

prints

$LIST          ['$LIST']
$LISTBUILD     ['$LISTBUILD']
$LISTDATA      ['$LISTDATA']
... (every one canonical-only)

Likely root cause

The YDB intrinsic-function extractor (m_standard/extractors/ydb_functions.py or similar) appears to read each function's section header but not the abbreviation table that follows it inside the function's documentation block. The same header-only bug may also affect the $Z* function families ($ZBITAND/$ZB, $ZCONVERT/$ZC, etc.), so a spot-check is worthwhile.
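
A minimal sketch of how that spot-check could be automated, assuming entries keep the canonical / abbreviation / all_forms keys used in the Evidence snippet. The helper name canonical_only and the sample records are hypothetical and invented for illustration; they are not taken from the real file.

```python
def canonical_only(entries, prefix):
    """Return canonicals under `prefix` that carry no abbreviation
    and whose all_forms holds nothing beyond the canonical itself."""
    return [
        e["canonical"]
        for e in entries
        if e["canonical"].startswith(prefix)
        and not e["abbreviation"]
        and set(e["all_forms"]) <= {e["canonical"]}
    ]


# Hypothetical sample data, shaped like entries of "intrinsic_functions".
sample = [
    {"canonical": "$LISTBUILD", "abbreviation": "", "all_forms": ["$LISTBUILD"]},
    {"canonical": "$ZBITAND", "abbreviation": "", "all_forms": ["$ZBITAND"]},
    {"canonical": "$ZCONVERT", "abbreviation": "$ZC", "all_forms": ["$ZCONVERT", "$ZC"]},
]

print(canonical_only(sample, "$Z"))
print(canonical_only(sample, "$LIST"))
```

Against the real file, entries would come from d["intrinsic_functions"] as in the Evidence snippet. A canonical-only hit is only a candidate: some functions may genuinely have no abbreviation, so each hit still needs checking against the documentation.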

Impact downstream

tree-sitter-m parses by walking the union of all_forms from this file. With abbreviations missing, $LB(1,2,3) produces an ERROR node, while $LISTBUILD(1,2,3) parses cleanly. Real YDB-style code uses abbreviations frequently (the YDB docs themselves do), so any tool that parses real YDB code via tree-sitter-m miscolours or mis-analyses those calls. VistA itself uses canonicals, so the VistA smoke gate doesn't surface this.
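
To make the mechanism concrete, here is a small model of "the union of all_forms". It is a sketch only, not tree-sitter-m's actual generator code; accepted_names and both sample lists are hypothetical, and the case-folding is an assumption about how names are matched.

```python
def accepted_names(entries):
    """Build the set of function names a grammar derived from all_forms would accept."""
    names = set()
    for entry in entries:
        # Assumption: names are matched case-insensitively, so fold to upper case.
        names.update(form.upper() for form in entry["all_forms"])
    return names


# Hypothetical entry as it appears today (canonical only) ...
before = [{"canonical": "$LISTBUILD", "abbreviation": "", "all_forms": ["$LISTBUILD"]}]
# ... and as it might appear after re-extraction.
after = [{"canonical": "$LISTBUILD", "abbreviation": "$LB", "all_forms": ["$LISTBUILD", "$LB"]}]

print("$LB" in accepted_names(before))  # False: $LB is unknown to the grammar
print("$LB" in accepted_names(after))   # True: $LB is recognised
```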

Proposed fix plan

  1. Audit m_standard/extractors/ydb_functions.py: confirm whether the abbreviation table is being read at all.
  2. Re-extract; confirm integrated/grammar-surface.json lists $LB, $LL, etc. in all_forms.
  3. Spot-check the $Z* function families for the same bug.
  4. Cut a release; tree-sitter-m's regen pulls the abbreviations through automatically per tree-sitter-m AD-04.
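
Step 2 could be backed by a small regression check along these lines. This is a sketch under stated assumptions: the EXPECTED pairs are copied from the table in this issue and are still unverified, and missing_abbreviations and the sample entries are hypothetical names and data.

```python
# Unverified pairs taken from the table in this issue; extend once confirmed.
EXPECTED = {"$LISTBUILD": "$LB", "$LISTLENGTH": "$LL"}


def missing_abbreviations(entries, expected):
    """Return (canonical, abbreviation) pairs not yet present in all_forms."""
    by_canonical = {e["canonical"]: e for e in entries}
    missing = []
    for canonical, abbrev in expected.items():
        entry = by_canonical.get(canonical)
        if entry is None or abbrev not in entry["all_forms"]:
            missing.append((canonical, abbrev))
    return missing


# Hypothetical re-extracted data: one entry fixed, one still canonical-only.
entries = [
    {"canonical": "$LISTBUILD", "abbreviation": "$LB", "all_forms": ["$LISTBUILD", "$LB"]},
    {"canonical": "$LISTLENGTH", "abbreviation": "", "all_forms": ["$LISTLENGTH"]},
]

print(missing_abbreviations(entries, EXPECTED))
```

An empty result would mean the re-extraction picked up every expected short form. Run against the real file, entries would again be d["intrinsic_functions"].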

Acceptance

After the fix, this should pass:

node -e '
const { Parser, Language } = require("web-tree-sitter");
(async () => {
  await Parser.init();
  const lang = await Language.load("./tree-sitter-m.wasm");
  const p = new Parser(); p.setLanguage(lang);
  for (const fn of ["\$LB(1,2,3)", "\$LL(\"x\")", "\$LI(\$LB(1,2,3),2)"]) {
    const t = p.parse("T\n W " + fn + "\n Q\n");
    console.log(t.rootNode.hasError ? "FAIL " + fn : "ok   " + fn);
  }
})()'
# expected:
# ok   $LB(1,2,3)
# ok   $LL("x")
# ok   $LI($LB(1,2,3),2)
