Add a newline-sensitive lexer mode independent of indent #14
Merged
Some DSLs are newline-aware but not indent-aware: statements are line-delimited, but nesting is via delimiters/expressions rather than indentation (e.g. dotenv-style env specs). Until now the only way to get significant newlines was to opt into `indent`, which also drags in INDENT/DEDENT stack management and YAML block-scalar semantics.

This adds a `newline` mode as the line-boundary + flow-suspension layer that `indent` already builds on (indent = newline + indent stack + YAML semantics). A single `lineSensitive` gate shares the lexer machinery; the line-start routine forks so a newline-only grammar emits one NEWLINE token per significant boundary with no indent stack. The two modes are mutually exclusive.

- types.ts: NewlineConfig + CstGrammar.newline
- api.ts: passthrough + indent/newline mutual-exclusion check
- gen-lexer.ts: the layered emission
- emit-parser.ts: bake the newline config for the standalone parser
- gen-treesitter.ts: NEWLINE as a stateless external scanner token
- test/newline-mode.ts: an inline env-spec grammar exercising all four backends plus a real tree-sitter generate + parse

All existing grammars regenerate byte-identically; TS parser conformance, the agnostic gate, and the tree-sitter accuracy gate are unchanged.
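For readers skimming the commit, the mutual-exclusion rule could look roughly like the sketch below. The interface shapes, field names, and the `assertExclusiveLineModes` helper are all hypothetical; the actual `NewlineConfig` and the check in api.ts are not reproduced in this text.

```typescript
// Hypothetical config shapes, for illustration only.
interface NewlineConfig {
  token: string; // name of the NEWLINE token the grammar references
}
interface IndentConfig {
  indentToken: string;
  dedentToken: string;
  newlineToken: string;
}
interface GrammarModes {
  indent?: IndentConfig;
  newline?: NewlineConfig;
}

type LineMode = "indent" | "newline" | "none";

// Rejects a grammar that declares both modes; otherwise reports which
// line-sensitive mode (if any) the lexer generator should emit.
function assertExclusiveLineModes(g: GrammarModes): LineMode {
  if (g.indent !== undefined && g.newline !== undefined) {
    throw new Error("grammar declares both `indent` and `newline`; pick one");
  }
  if (g.indent !== undefined) return "indent";
  if (g.newline !== undefined) return "newline";
  return "none";
}
```

Under this sketch, anything other than `"none"` would correspond to the single `lineSensitive` gate being on.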
Closes #10.
Problem
Some DSLs are newline-aware but not indent-aware: statements are line-delimited, but nesting is via delimiters/expressions rather than indentation (e.g. dotenv-style env specs). The only way to get significant newlines was to opt into `indent`, which also brings INDENT/DEDENT stack management and YAML block-scalar semantics.

Approach
`newline` is the line-boundary + flow-suspension layer that `indent` already builds on (indent = newline + indent stack + YAML semantics), so instead of a parallel mode this factors out the shared base. A single `lineSensitive` gate drives the shared lexer machinery; the line-start routine forks: an indent grammar measures columns and runs the stack, a newline-only grammar emits one NEWLINE token per significant boundary. Declaring both is rejected.

The parser needs no change: INDENT/DEDENT/NEWLINE are ordinary tokens a grammar references as `many(Newline, Stmt)`, with no balanced-pair or stack assumptions. In tree-sitter the NEWLINE token is a stateless external (modeled on the raw_text scanner); `extras` stays `/\s/` because tree-sitter runs the external scanner before skipping extras, so a significant newline is taken at statement boundaries while a flow-internal newline falls back to extras as whitespace.

Changes
- `src/types.ts`: `NewlineConfig` + `CstGrammar.newline`
- `src/api.ts`: passthrough + indent/newline mutual exclusion
- `src/gen-lexer.ts`: the layered emission
- `src/emit-parser.ts`: bake the newline config
- `src/gen-treesitter.ts`: NEWLINE stateless external scanner
- `test/newline-mode.ts`: inline env-spec grammar across all four backends + a real tree-sitter generate/parse

Verification
npm run gen)
gate:treesitter 96.0% (>= floor, still beats official)
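To make the newline-only branch of the line-start fork described under Approach concrete, here is a minimal hand-written sketch. It is not the code gen-lexer.ts emits. The token kinds, the `lexNewlineOnly` name, and the rule used for "significant" (a newline at bracket depth 0 that follows at least one token on the line) are assumptions chosen for illustration.

```typescript
type TokKind = "WORD" | "OPEN" | "CLOSE" | "NEWLINE";
interface Tok {
  kind: TokKind;
  text: string;
}

// Newline-only lexing: no column measurement and no indent stack.
// Newlines inside brackets are treated as plain whitespace (flow suspension).
function lexNewlineOnly(src: string): Tok[] {
  const out: Tok[] = [];
  let depth = 0; // bracket nesting; NEWLINE is suppressed while depth > 0
  let lineHasToken = false; // has the current logical line produced a token?
  let i = 0;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === "\n") {
      // One NEWLINE per significant boundary: blank lines and
      // flow-internal newlines produce nothing.
      if (depth === 0 && lineHasToken) {
        out.push({ kind: "NEWLINE", text: "\n" });
        lineHasToken = false;
      }
      i++;
    } else if (ch === " " || ch === "\t" || ch === "\r") {
      i++; // horizontal whitespace is never significant
    } else if (ch === "(" || ch === "[" || ch === "{") {
      out.push({ kind: "OPEN", text: ch });
      depth++;
      lineHasToken = true;
      i++;
    } else if (ch === ")" || ch === "]" || ch === "}") {
      out.push({ kind: "CLOSE", text: ch });
      depth = Math.max(0, depth - 1); // tolerate stray closers in this sketch
      lineHasToken = true;
      i++;
    } else {
      // A WORD is a maximal run of non-whitespace, non-bracket characters.
      let j = i;
      while (j < src.length && !/[\s()\[\]{}]/.test(src.charAt(j))) j++;
      out.push({ kind: "WORD", text: src.slice(i, j) });
      lineHasToken = true;
      i = j;
    }
  }
  return out;
}

const kinds = lexNewlineOnly("A=1\n\nB=(x\n y)\n").map((t) => t.kind);
console.log(kinds.join(" "));
// prints "WORD NEWLINE WORD OPEN WORD WORD CLOSE NEWLINE"
```

The blank line and the newline inside the parentheses produce no token, so the input yields exactly two NEWLINE tokens. An indent grammar would replace the `"\n"` branch with column measurement and INDENT/DEDENT stack handling.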