Add a newline-sensitive lexer mode independent of indent #14

Merged: johnsoncodehk merged 1 commit into master from newline-mode on Jun 6, 2026

@johnsoncodehk (Owner)

Closes #10.

Problem

Some DSLs are newline-aware but not indent-aware: statements are line-delimited, but nesting is via delimiters/expressions rather than indentation (e.g. dotenv-style env specs). The only way to get significant newlines was to opt into `indent`, which also brings in INDENT/DEDENT stack management and YAML block-scalar semantics.

Approach

`newline` is the line-boundary and flow-suspension layer that `indent` already builds on (`indent` = `newline` + indent stack + YAML semantics), so rather than adding a parallel mode, this change factors out the shared base. A single `lineSensitive` gate drives the shared lexer machinery, and the line-start routine forks: an indent grammar measures columns and runs the stack, while a newline-only grammar emits one NEWLINE token per significant boundary. A grammar that declares both `indent` and `newline` is rejected.
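To make the newline-only branch of that fork concrete, here is a minimal sketch. It is not the code in src/gen-lexer.ts: the `Token` shape, the `lexNewlineOnly` name, the bracket set used for flow suspension, and the choice to treat leading and trailing line breaks as insignificant are all assumptions made for illustration.

```typescript
// Hypothetical token shape for this sketch (not the project's real types).
type Token = { kind: "NEWLINE" | "WORD" | "PUNCT"; text: string };

// Newline-only lexing: one NEWLINE per significant boundary, no indent stack.
// Assumption: "flow" means being inside (), [] or {}.
function lexNewlineOnly(src: string): Token[] {
  const tokens: Token[] = [];
  let depth = 0; // open-bracket depth; > 0 suspends line sensitivity
  let i = 0;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === "\n" || ch === "\r") {
      // Collapse the line break, any blank lines, and leading whitespace
      // of the next line into a single boundary.
      while (i < src.length && /\s/.test(src.charAt(i))) i++;
      // Assumption: a boundary is significant only between two tokens
      // and outside any bracket.
      const significant = depth === 0 && tokens.length > 0 && i < src.length;
      if (significant) tokens.push({ kind: "NEWLINE", text: "\n" });
      continue;
    }
    if (/\s/.test(ch)) { i++; continue; } // other whitespace is skipped
    if ("([{".includes(ch)) depth++;
    else if (")]}".includes(ch)) depth = Math.max(0, depth - 1);
    if ("()[]{}=,".includes(ch)) {
      tokens.push({ kind: "PUNCT", text: ch });
      i++;
      continue;
    }
    // Anything else is read as a word up to the next whitespace or punctuation.
    let j = i;
    while (j < src.length && !/[\s()\[\]{}=,]/.test(src.charAt(j))) j++;
    tokens.push({ kind: "WORD", text: src.slice(i, j) });
    i = j;
  }
  return tokens;
}

const rendered = lexNewlineOnly("A=1\n\nB=(x,\n y)\nC=2\n")
  .map((t) => (t.kind === "NEWLINE" ? "NEWLINE" : t.text))
  .join(" ");
console.log(rendered); // A = 1 NEWLINE B = ( x , y ) NEWLINE C = 2
```

The blank line after `A=1` collapses into one NEWLINE, and the line break inside the parentheses produces no token, which is the behavior the description calls flow suspension. An indent grammar would take the other branch at the same point and measure the column instead.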

The parser needs no change: INDENT/DEDENT/NEWLINE are ordinary tokens that a grammar references as `many(Newline, Stmt)`, with no balanced-pair or stack assumptions. In tree-sitter, the NEWLINE token is a stateless external token (modeled on the raw_text scanner). `extras` stays `/\s/` because tree-sitter runs the external scanner before skipping extras, so a significant newline is taken at statement boundaries, while a flow-internal newline falls back to extras and is skipped as whitespace.

Changes

  • src/types.ts: NewlineConfig + CstGrammar.newline
  • src/api.ts: passthrough + indent/newline mutual exclusion
  • src/gen-lexer.ts: the layered emission
  • src/emit-parser.ts: bake the newline config
  • src/gen-treesitter.ts: NEWLINE stateless external scanner
  • test/newline-mode.ts: inline env-spec grammar across all four backends + a real tree-sitter generate/parse
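
The mutual-exclusion check and the single `lineSensitive` gate could look roughly like the sketch below. The `GrammarModes` interface, the function names, and the error message are hypothetical; the real `NewlineConfig` and `CstGrammar` fields are not shown in this description.

```typescript
// Hypothetical stand-in for the relevant part of a grammar definition.
// The real config shapes live in src/types.ts and are not reproduced here.
interface GrammarModes {
  indent?: Record<string, unknown>;
  newline?: Record<string, unknown>;
}

type LineMode = "indent" | "newline" | "none";

// Reject grammars that declare both modes, otherwise report which one is active.
function resolveLineMode(grammar: GrammarModes): LineMode {
  if (grammar.indent !== undefined && grammar.newline !== undefined) {
    throw new Error("A grammar may declare `indent` or `newline`, not both");
  }
  if (grammar.indent !== undefined) return "indent";
  if (grammar.newline !== undefined) return "newline";
  return "none";
}

// One gate for the shared lexer machinery: either mode makes lines significant.
function isLineSensitive(grammar: GrammarModes): boolean {
  return resolveLineMode(grammar) !== "none";
}
```

Keeping the check in one place means the lexer generator only has to ask whether the grammar is line-sensitive and, if so, which branch of the line-start routine to emit.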

Verification

  • All seven existing grammars regenerate byte-identically (npm run gen)
  • TS parser conformance unchanged vs baseline (5386/5659)
  • gate:treesitter 96.0% (>= floor, still beats official)
  • agnostic 9/9, sanity 15/15, newline-mode 29/29 (incl. a real tree-sitter parse showing flow-internal newlines are suppressed)

Some DSLs are newline-aware but not indent-aware: statements are
line-delimited, but nesting is via delimiters/expressions rather than
indentation (e.g. dotenv-style env specs). Until now the only way to get
significant newlines was to opt into `indent`, which also drags in
INDENT/DEDENT stack management and YAML block-scalar semantics.

This adds a `newline` mode as the line-boundary + flow-suspension layer
that `indent` already builds on (indent = newline + indent stack + YAML
semantics). A single `lineSensitive` gate shares the lexer machinery; the
line-start routine forks so a newline-only grammar emits one NEWLINE token
per significant boundary with no indent stack. The two modes are mutually
exclusive.

- types.ts: NewlineConfig + CstGrammar.newline
- api.ts: passthrough + indent/newline mutual-exclusion check
- gen-lexer.ts: the layered emission
- emit-parser.ts: bake the newline config for the standalone parser
- gen-treesitter.ts: NEWLINE as a stateless external scanner token
- test/newline-mode.ts: an inline env-spec grammar exercising all four
  backends plus a real tree-sitter generate + parse

All existing grammars regenerate byte-identically; TS parser conformance,
the agnostic gate, and the tree-sitter accuracy gate are unchanged.

@johnsoncodehk merged commit 2e664dd into master on Jun 6, 2026
2 checks passed
@johnsoncodehk deleted the newline-mode branch on June 6, 2026 20:31