PR into master from dev/olga/Add-typst-fotmat-for-math-v1 #405
Draft
OlgaRedozubova wants to merge 29 commits into master from dev/olga/Add-typst-fotmat-for-math-v1
Implement SerializedTypstVisitor that converts MathJax's internal MathML
tree into native Typst math syntax, enabling direct LaTeX → Typst and
MathML → Typst conversion.
Key features:
- Full token handling: mi, mo, mn, mtext, mspace with font variants,
operator detection, and context-aware spacing
- Script/limit constructs: msub, msup, msubsup, munder, mover,
munderover, mmultiscripts with movablelimits-aware placement
- Structural elements: mfrac, msqrt, mroot, mtable (matrix/cases/
equation arrays with alignment), mfenced, menclose, mphantom
- Delimiter handling: paired/unpaired bracket detection via pre-serialization
tree walk, lr() wrapping, abs/norm/floor/ceil shorthand with separator
fallback to lr()
- Equation numbering: auto-numbered and \tag{} equations, numcases/
subnumcases grid layout with per-row counters and labels
- Symbol mapping: 500+ Unicode → Typst symbol mappings including arrows,
relations, accents, Greek, operators, geometry, and suits
- Escape handling: unified scanner for comma/semicolon/colon escaping
in function calls, string literal skipping, bracket depth tracking
- Dual output: block (typst) and inline (typst_inline) variants
- Context menu integration for copying Typst math output
Architecture:
- Modular handler files: token-handlers, script-handlers,
structural-handlers, table-handlers
- Shared utilities: common.ts, consts.ts, types.ts, escape-utils.ts,
bracket-utils.ts, typst-symbol-map.ts
- Strict TypeScript typing throughout, no any casts
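
To make the visitor idea concrete, here is a minimal sketch of a recursive serializer over a MathML-like tree. The `MathNode` shape, the `SYMBOLS` table, and `toTypst` are hypothetical stand-ins for illustration only; the actual SerializedTypstVisitor works on MathJax's internal nodes and handles far more cases (spacing, delimiters, tables, numbering).

```typescript
// Hypothetical, simplified node shape (NOT MathJax's real node API).
type MathNode =
  | { kind: "mi" | "mn" | "mo"; text: string }
  | { kind: "mrow" | "mfrac" | "msqrt" | "msup" | "msub"; children: MathNode[] };

// Tiny illustrative subset of a Unicode -> Typst symbol table.
const SYMBOLS: { [ch: string]: string } = {
  "α": "alpha",
  "→": "arrow.r",
  "∞": "infinity",
};

function toTypst(node: MathNode): string {
  // Token nodes: map known Unicode symbols, pass everything else through.
  if ("text" in node) {
    const mapped = SYMBOLS[node.text];
    return mapped !== undefined ? mapped : node.text;
  }
  // Container nodes: dispatch on the element kind.
  const c = node.children;
  switch (node.kind) {
    case "mfrac":
      return "frac(" + toTypst(c[0]) + ", " + toTypst(c[1]) + ")";
    case "msqrt":
      return "sqrt(" + c.map(toTypst).join(" ") + ")";
    case "msup":
      return toTypst(c[0]) + "^(" + toTypst(c[1]) + ")";
    case "msub":
      return toTypst(c[0]) + "_(" + toTypst(c[1]) + ")";
    default:
      // mrow: serialize children left to right, space separated.
      return c.map(toTypst).join(" ");
  }
}
```

With this sketch, a tree for \sqrt{\frac{a}{b}} + x^2 serializes to sqrt(frac(a, b)) + x^(2). Always emitting ^(...) and _(...) keeps the sketch simple; it is not a claim about how the PR formats single-token scripts.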
Extract serializeThousandSepChain into common.ts, replacing duplicated chain logic in index.ts and structural-handlers.ts. Add tree mutation order comment, forLatex/include_typst comments, and remove test logging.
…content spacing
- isDerivativePattern now checks for actual prime chars (′ ″ ‴) instead of any mo
- Move SCRIPT_NODE_KINDS and PRIME_CHARS to consts.ts, remove duplicate SCRIPT_PARENT_KINDS
- Revert post-content loop to needsTokenSeparator (fixes tau_(i,j)(t) regression)
- Add comment about prevNode in \left...\right delimiter handling
- Add test for f^{(n)}(a) → f^((n))(a) (TeXAtom derivative, no space)
- \left.\right\} now produces lr(mat(delim: #none, ...) \}) instead of losing the brace
- Mismatched pairs like \left[\right) use lr() wrapping instead of mat(delim:)
- Matched pairs (same char or standard open→close) still use compact mat(delim: ...)
- Add tests for \left.\right\} in align* and mismatched \left[\right) on array
Two MathJax patterns for \not:
- Overlay: TeXAtom(REL) > mpadded[width=0] > mtext(⧸), detected in mrow/inferredMrow loops; the next sibling is wrapped in cancel()
- Combining char: U+0338 appended to mi/mo text, stripped and wrapped in cancel() in token handlers
- Fix cancel() loss on early returns in mo handler (multiword/namedOp)
- Add tests for \not 7,60 and \not k + \not q
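
The combining-char case can be sketched as follows. This is only an illustration under assumptions: `serializeTokenText` and its `toSymbol` callback are hypothetical names, and the real token handlers do considerably more (font variants, operator detection, spacing).

```typescript
// U+0338 COMBINING LONG SOLIDUS OVERLAY, which MathJax appends for \not.
const COMBINING_SOLIDUS = "\u0338";

// Hypothetical helper: strip the overlay from token text and wrap the
// serialized remainder in cancel(). `toSymbol` stands in for whatever
// symbol mapping the caller uses.
function serializeTokenText(
  text: string,
  toSymbol: (s: string) => string
): string {
  if (text.indexOf(COMBINING_SOLIDUS) === -1) {
    return toSymbol(text);
  }
  const stripped = text.split(COMBINING_SOLIDUS).join("");
  return "cancel(" + toSymbol(stripped) + ")";
}
```

For example, the text "k" followed by U+0338 comes out as cancel(k), while a plain "q" passes through unchanged.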
…t-handling API
- Add two-pass escapeUnpairedBrackets (reuses scanBracketTokens + findUnpairedIndices)
- Integrate into escapeContentSeparators for all function-call arguments
- Integrate replaceUnpairedBrackets into escapeCasesSeparators for consistent API
- Remove manual replaceUnpairedBrackets calls from table-handlers
- Add unit tests for escape-utils
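
A minimal sketch of the two-pass idea (scan bracket tokens, then find the unpaired ones with a stack) might look like this. The function names echo the commit message, but the bodies are assumptions written for illustration; in particular this version ignores string literals and function-call parentheses, which the real scanner is described as handling.

```typescript
interface BracketToken {
  ch: string;
  index: number;
}

const OPENERS = "([{";
const CLOSERS = ")]}";
const CLOSER_FOR: { [open: string]: string } = { "(": ")", "[": "]", "{": "}" };

// Pass 1: collect bracket characters, skipping anything already escaped.
function scanBracketTokens(s: string): BracketToken[] {
  const tokens: BracketToken[] = [];
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    if (ch === "\\") {
      i++; // skip the escaped character
      continue;
    }
    if (OPENERS.indexOf(ch) !== -1 || CLOSERS.indexOf(ch) !== -1) {
      tokens.push({ ch, index: i });
    }
  }
  return tokens;
}

// Pass 2: stack-based matching; returns string indices of unpaired brackets.
function findUnpairedIndices(tokens: BracketToken[]): number[] {
  const stack: BracketToken[] = [];
  const unpaired: number[] = [];
  for (let t = 0; t < tokens.length; t++) {
    const tok = tokens[t];
    if (OPENERS.indexOf(tok.ch) !== -1) {
      stack.push(tok);
      continue;
    }
    const top = stack.length > 0 ? stack[stack.length - 1] : undefined;
    if (top !== undefined && CLOSER_FOR[top.ch] === tok.ch) {
      stack.pop(); // matched pair
    } else {
      unpaired.push(tok.index); // closer with no matching opener
    }
  }
  for (let k = 0; k < stack.length; k++) {
    unpaired.push(stack[k].index); // openers that were never closed
  }
  return unpaired.sort((a, b) => a - b);
}

// Prefix every unpaired bracket with a backslash.
function escapeUnpairedBrackets(s: string): string {
  const unpaired = findUnpairedIndices(scanBracketTokens(s));
  let out = "";
  for (let i = 0; i < s.length; i++) {
    out += (unpaired.indexOf(i) !== -1 ? "\\" : "") + s.charAt(i);
  }
  return out;
}
```

Under these assumptions, f(x) + (y becomes f(x) + \(y, and balanced input is returned unchanged.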
… lr()
The mrow handler incorrectly delegated to mtable when ANY child was a table, even when other content (arrows, operators) sat alongside it. Now hasTableChild is true only when mtable is the sole content child. The mtable handler also checks parent delimiters only for the sole-content case, preventing double lr() wrapping.
Extract getContentChildren into common.ts and containsTable as a standalone helper to eliminate duplication between the two handlers.
…call parsing
In Typst math mode, identifier( is parsed as a function call (see typst/typst#7274). Insert a space before ( when the preceding token is a multi-char name not in TYPST_BUILTIN_OPS (e.g. emptyset, sigma, Gamma, psi). Single-char identifiers (f, g) and built-in operators (sin, cos, ln, arg) keep no space.
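
The spacing rule above can be sketched as a small predicate. `BUILTIN_OPS` here is an illustrative four-entry subset standing in for TYPST_BUILTIN_OPS, and both function names are hypothetical.

```typescript
// Illustrative subset only; the real TYPST_BUILTIN_OPS list is larger.
const BUILTIN_OPS = new Set<string>(["sin", "cos", "ln", "arg"]);

// A space is needed when the previous token is a multi-letter name that
// Typst would otherwise try to call as a function.
function needsSpaceBeforeParen(prevToken: string): boolean {
  const isMultiLetterName = /^[A-Za-z]{2,}$/.test(prevToken);
  return isMultiLetterName && !BUILTIN_OPS.has(prevToken);
}

function joinBeforeParen(prevToken: string, parenContent: string): string {
  const sep = needsSpaceBeforeParen(prevToken) ? " " : "";
  return prevToken + sep + "(" + parenContent + ")";
}
```

So sigma followed by (x) is emitted as sigma (x), while f(x) and sin(x) stay compact.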
…rsing
Add escapeColon to escapeContentSeparators so word: inside any Typst function call becomes word : (space prevents named-arg syntax). Apply escapeContentSeparators to abs(), norm(), floor(), ceil() content which previously had no escaping.
…content
- Extract resolveDelimiterMo helper to access texClass on delimiter nodes
- Reject ‖ pairing when opener has CLOSE texClass (surrounding pair context)
- Reject ‖ pairing when content contains PUNCT (comma between standalone ‖)
- Reject ‖ pairing spanning entire row when content has REL operator (=)
- Apply escapeContentSeparators to bare delimiter func-call content (norm,
floor, ceil) to prevent commas/semicolons/colons breaking Typst parsing
- Add explicit isFuncCall flag to BARE_DELIM_PAIRS instead of endsWith('(')
- Add 4 test cases: standalone ‖, complex ‖ with comma/number/variable
…ith ; separators
Typst ignores \\ linebreaks inside mat() cells. When aligned/gathered environments are nested inside a matrix or cases cell, convert them to mat(delim: #none, ...) using ; row separators instead.
- Add isInsideMatrixCell() recursive parent-chain walker
- Wrap nested mat() in display() for block output to reset scriptlevel
- Propagate typst_inline (without display()) through cell/row pipeline
- Determine alignment from column usage: gathered→center, rl-pairs→right/left
- Extract buildMatExpr helper to deduplicate block/inline mat construction
…gh lr()
- Wrap cases() and plain matrices in display() when inside a mat() cell to prevent Typst scriptlevel reduction (block only, not inline)
- Route eqnArrays with rowlines/columnlines through mat() format to preserve augment: #(hline/vline); add stroke: (dash: "dashed") when all separator lines are dashed
- Propagate typst_inline through structural-handlers lr() path by building parallel contentInline and extracting buildLrExpr() helper
- Extract computeAugment() and buildEqnArrayAsMat() helpers to deduplicate augment computation and eqnArray-as-mat construction
- Detect eqnArray-with-lines parents in isInsideMatrixCell()
- Cache isInsideMatrixCell() result to avoid redundant parent walks
- Use separate needsSpaceBetweenNodes() calls for block/inline content
…lose
Brackets inside these nodes are now paired independently from brackets outside, preventing false pairing when content is split across Typst function-call arguments (e.g. \sqrt( arg ) where ( and ) end up in different scopes). Each child of a scope-boundary node is processed as a separate pairing scope. SCOPE_BOUNDARIES set is module-level.
…mrows
- Detect \left.\aligned\right\} as cases(reverse: #true, ...) for
eqnArray-like tables (displaystyle rows); regular arrays keep matrix form
- Add hasTableFirst in structural-handlers: \left\{ table extra \right.
lets the table inherit { as cases(), extra content follows outside
- Add isFirstWithInvisibleClose in table-handlers so the table picks up
the open delimiter from the parent mrow when close is invisible
- Track contentInline in the hasTableChild/hasTableFirst mrow branch so
typst_inline propagates correctly when children return differing inline
- Add tests for reverse cases and cases() + stretch() patterns
- Digits before ( (as in .4( ) are no longer treated as function calls;
only ASCII letters qualify (isFuncCallParen)
- When a supposed function-call ( has no matching ), backtrack so the
for-loop re-scans the range and picks up any [, ], {, } inside
- Use non-whitespace check for spacing around symbol names (paren.l,
bracket.r, etc.) instead of \w — fixes missing space after quoted
strings ("л"paren.l) and other non-\w tokens
- Extend RE_WORD_CHAR, RE_WORD_DOT_END, RE_WORD_START with \p{L} for
Unicode letter support
- Move RE_ASCII_LETTER, RE_TRAILING_WS, RE_LEADING_WS to consts.ts
- Add tests: unpaired brackets across matrix rows with digits, letters,
real functions, and inner brackets inside failed function-call scans
escapeLrSemicolons now also escapes colons after identifiers (g: → g :), matching the behavior already present in escapeCasesSeparators and escapeContentSeparators. Without this, lr(g: K_0 ]) would be parsed by Typst as a named argument. Add tests for colon escaping in lr(), abs(), and general lr() paths.
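
As a rough illustration of the colon rule, the sketch below inserts a space between an identifier and a directly following colon. The function name and the regex are assumptions for this example; unlike the scanner described in this PR, it does not skip string literals or track bracket depth.

```typescript
// Hypothetical helper: "g:" -> "g :" so Typst does not read it as a
// named argument. Only identifiers immediately followed by ":" are touched.
function escapeIdentifierColons(content: string): string {
  return content.replace(/([A-Za-z][A-Za-z0-9_]*):/g, "$1 :");
}

// Example of wrapping escaped content in lr() with given delimiters.
function buildLr(open: string, content: string, close: string): string {
  return "lr(" + open + escapeIdentifierColons(content) + close + ")";
}
```

With this sketch, content g: K_0 with an empty opener and ] as closer yields lr(g : K_0 ]), and a colon that already has a space before it is left alone.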
MathJax splits \mathrm{टेक} into individual mi nodes per character,
breaking Devanagari/Arabic combining sequences. serializeCombiningMiChain
merges consecutive non-Latin mi nodes with the same mathvariant into a
single font-wrapped quoted string. Known math symbols (∂, ψ, ∅) are
excluded via typstSymbolMap lookup. Uses Unicode script properties
(\p{Script=Latin}) for robust Latin vs non-Latin classification.
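
The merging step can be sketched roughly as below. Everything here is an assumption made for illustration: the `MiNode` shape, the three-entry `KNOWN_SYMBOLS` table standing in for typstSymbolMap, and the choice of upright("...") as the font wrapper. It only considers mi nodes.

```typescript
interface MiNode {
  text: string;
  mathvariant: string;
}

// Built at runtime so the Unicode property escape does not depend on the
// TypeScript compile target.
const RE_LATIN = new RegExp("\\p{Script=Latin}", "u");

// Stand-in for the typstSymbolMap lookup (illustrative entries only).
const KNOWN_SYMBOLS: { [ch: string]: string } = {
  "∂": "partial",
  "ψ": "psi",
  "∅": "emptyset",
};

function isKnownSymbol(text: string): boolean {
  return Object.prototype.hasOwnProperty.call(KNOWN_SYMBOLS, text);
}

// Non-Latin text that is not a known math symbol can join a merged run.
function isMergeable(n: MiNode): boolean {
  return !RE_LATIN.test(n.text) && !isKnownSymbol(n.text);
}

function serializeMiRun(nodes: MiNode[]): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < nodes.length) {
    const first = nodes[i];
    if (!isMergeable(first)) {
      out.push(isKnownSymbol(first.text) ? KNOWN_SYMBOLS[first.text] : first.text);
      i++;
      continue;
    }
    // Extend the run while the script class and mathvariant both match.
    let merged = first.text;
    let j = i + 1;
    while (
      j < nodes.length &&
      isMergeable(nodes[j]) &&
      nodes[j].mathvariant === first.mathvariant
    ) {
      merged += nodes[j].text;
      j++;
    }
    out.push('upright("' + merged + '")');
    i = j;
  }
  return out;
}
```

Keeping the three Devanagari code points of टेक in one quoted string lets the renderer shape them as a single cluster, which is the point of the merge.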
… bases
Remove overline/underline from RE_SPECIAL_FN_CALL — they do not imply
below/above placement like overbrace/underbrace do. Add overbracket/
underbracket which were missing. Now \underset{...}{\underline{x}}
correctly produces limits(underline(x))_(...).
- escapeLrBrackets: escapes bare bracket chars matching the lr() delimiter type so Typst doesn't auto-scale inner brackets (e.g. \left[ [...] \right] → lr([ \[...\] ])). Only same-type brackets are escaped.
- isSyntaxParen: renamed from isFuncCallParen, now also skips _() and ^() script grouping parens in scanBracketTokens.
- Fix RE_SPECIAL_FN_CALL: remove overline/underline (they don't imply below/above placement), add overbracket/underbracket.
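
A minimal sketch of the same-type escaping might look like this. The signature is an assumption (the PR does not show it), and the sketch skips only characters that are already backslash-escaped.

```typescript
// Hypothetical signature: escape inner brackets that match the lr()
// delimiter characters, leaving other bracket types untouched.
function escapeLrBrackets(content: string, open: string, close: string): string {
  let out = "";
  for (let i = 0; i < content.length; i++) {
    const ch = content.charAt(i);
    if (ch === "\\" && i + 1 < content.length) {
      // Copy an existing escape sequence through unchanged.
      out += ch + content.charAt(i + 1);
      i++;
      continue;
    }
    out += (ch === open || ch === close ? "\\" : "") + ch;
  }
  return out;
}

function wrapLr(open: string, content: string, close: string): string {
  return "lr(" + open + " " + escapeLrBrackets(content, open, close) + " " + close + ")";
}
```

For \left[ [a, b] \right] this produces lr([ \[a, b\] ]), while parentheses inside square-bracket delimiters are left as they are.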
Wrap #box(stroke:...) and #circle(inset:...) with #align(center, ...)
for block display so they center like LaTeX \boxed and \enclose{circle}.
Inline variant remains unwrapped. Add integral.surf (\oiint), slash.o
(\oslash), lt.approx (\lessapprox), gt.approx (\gtrapprox) to symbol map.
Rewrite escapeUnbalancedParens to use scanBracketTokens + findUnpairedIndices instead of single-pass scanExpression — handles both unbalanced ( and ) (previously only )). Add mover/munder to SCOPE_BOUNDARIES so brackets inside accents don't pair with brackets outside. Remove dead escapeUnbalancedCloseParen option from scanExpression. Fixes \overline(x), \underline(x), \hat(x) producing unescaped parens.
Replace overline(")"content) / underline(")"content) with
overline(lr(\) content)) / underline(lr(\) content)) so the )
delimiter auto-scales via lr() instead of rendering at fixed size.
- \xcancel → cancel(cross: #true, ...) when both diagonal strikes present
- Script children (sub/sup) of msub/msup/msubsup are now separate scopes
in markUnpairedBrackets, while base stays in parent scope — fixes
\cancelto{5(y}x) where ( in script paired with ) outside
- safeFormatScript wrapper in script-handlers applies escapeUnbalancedParens
to ^(…)/_(…) content; removes escape-utils import from common.ts
\underset and \overset create munder/mover without accentunder/accent
attributes — they must use the general limits() path, not accent handlers.
Previously \underset{\rightarrow}{r} produced attach(r, b: arrow.r)
instead of limits(r)_(arrow.r).
MathJax builds \longrightleftharpoons and \longleftrightarrows from
mover with harpoon/arrow pieces. Detect these patterns via
CONSTRUCTED_LONG_ARROWS map and emit single Typst symbols
(harpoons.rtlb, arrows.lr).
Flatten mover(munder(...), over) via unwrapToScriptNode so
\stackrel{k_1}{\underset{k_2}{...}} produces limits(base)_(k_2)^(k_1)
instead of nested limits(limits(base)_(k_2))^(k_1).
… notation matching
- menclose with border-side notation (left/right/top/bottom combos from
\begin{array}{|l|}\hline) now generates #box(stroke: (...)) with per-side
strokes instead of overline()/underline()
- Cap vline augment indices at actual column count to prevent out-of-bounds
when column spec has more columns than data cells
- Refactor menclose notation checks from String.includes() to Set-based
word-boundary safe matching via parseNotation()/hasNotation()
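
To show why Set-based matching is safer than String.includes() here, a minimal sketch follows. The function names come from the commit message, but their bodies are assumptions: the menclose notation attribute is treated as a whitespace-separated word list.

```typescript
// Split the notation attribute into whole words.
function parseNotation(notation: string): Set<string> {
  const words = notation.trim().split(/\s+/);
  const set = new Set<string>();
  for (let i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      set.add(words[i]);
    }
  }
  return set;
}

// Whole-word membership test.
function hasNotation(parsed: Set<string>, name: string): boolean {
  return parsed.has(name);
}
```

The substring check "roundedbox".includes("box") is true, so a plain includes() would treat roundedbox as box; hasNotation(parseNotation("roundedbox"), "box") is false, while hasNotation(parseNotation("left right"), "right") is true.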
…airs
serializeRange used needsTokenSeparator which lacks the script+bracket spacing check. Switched to needsSpaceBetweenNodes so that e.g. \|L_N^n(\Delta S)\|_\infty produces norm(L_N^n (Delta S)) with a space before ( to prevent Typst from parsing n( as a function call.
Add native Typst math output format (LaTeX/MathML → Typst)