
feat(md): track sourcepos in raw HTML links #615

Draft
caugner wants to merge 15 commits into main from fix-flaws-in-html-links


caugner commented Apr 8, 2026

Description

Update the Markdown-to-HTML conversion to track sourcepos for raw HTML links, so that opening `<a>` tags written as raw HTML receive a `data-sourcepos` attribute.

Motivation

Allow flaws in HTML links, which currently account for about 4,500 issues, to be fixed automatically.

Additional details

Related issues and pull requests

Part of mdn/fred#1462.


github-actions bot commented Apr 8, 2026

791fb97 was deployed to: https://rari-pr615.review.mdn.allizom.net/

caugner added 4 commits April 15, 2026 13:50
Covers:
- Uppercase `<A>` in the block path.
- Bare `<a` at end of input (tests the `unwrap_or` fallback).
- Malformed unclosed tag (tests the `find_opening_tag_end` fallback).
…notation gap

Documents two issues found in code review:
- `inject_sourcepos_in_html_block` uses byte offsets for columns, diverging
  from Comrak's character-based columns when multi-byte UTF-8 characters
  precede `<a>`.
- `inject_sourcepos_in_opening_a` only annotates the first `<a>` in a string,
  leaving subsequent tags unannotated.

`find_opening_tag_end` and `inject_sourcepos_in_html_block` were computing
1-based columns as byte offsets (`lt - line_start + 1`), which diverges
from Comrak's character-based column convention when multi-byte UTF-8
characters appear before an `<a>` tag on the same line.

Switch both column computations to `str::chars().count()` and update
`find_opening_tag_end`'s signature from `&[u8]` to `&str` accordingly.
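A minimal sketch of the difference, using hypothetical helpers `byte_column` and `char_column` (not the PR's functions): with a two-byte `é` before the tag, the byte-based column comes out one higher than the character-based one.

```rust
/// Hypothetical helper: 1-based column of byte offset `lt`, counted in bytes
/// from `line_start` (the behaviour the commit replaces).
fn byte_column(line_start: usize, lt: usize) -> usize {
    lt - line_start + 1
}

/// Hypothetical helper: the same 1-based column counted in characters.
/// Slicing requires both offsets to lie on char boundaries; an offset found by
/// searching for the ASCII byte `<` always does.
fn char_column(text: &str, line_start: usize, lt: usize) -> usize {
    text[line_start..lt].chars().count() + 1
}

fn main() {
    // "é" is two bytes in UTF-8, so `<` sits at byte offset 3.
    let line = "é <a href=\"/x\">x</a>";
    let lt = line.find('<').expect("line contains a tag");
    println!("byte column: {}", byte_column(0, lt)); // 4
    println!("char column: {}", char_column(line, 0, lt)); // 3
}
```

The same applies to any line, not just the first: `line_start` is the byte offset where that line begins.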

Previously `inject_sourcepos_in_opening_a` injected `data-sourcepos` only
into the first `<a>` tag it found, silently leaving any subsequent tags in
the same string unannotated. Loop over all matches so that every opening
`<a>` receives the attribute.

In normal flow, each `HtmlInline` node contains exactly one tag (a Comrak
invariant), so this is a defensive correctness improvement rather than
a fix for a live bug.
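A minimal sketch of the "annotate every match" loop, assuming the attribute is inserted directly after `<a` and that an opening tag is `<a` or `<A` followed by whitespace, `>` or `/`. The function name `inject_in_all_opening_a` is hypothetical; this is not the PR's `inject_sourcepos_in_opening_a`.

```rust
/// Hypothetical sketch (not the PR's code): insert `attr` directly after every
/// opening `<a` tag in `html`, leaving closing tags and e.g. `<abbr>` alone.
fn inject_in_all_opening_a(html: &str, attr: &str) -> String {
    let bytes = html.as_bytes();
    let mut result = String::with_capacity(html.len() + attr.len());
    let mut copied = 0; // prefix of `html` already copied into `result`
    let mut pos = 0; // where the next search for `<` starts
    while let Some(offset) = bytes[pos..].iter().position(|&b| b == b'<') {
        let lt = pos + offset;
        pos = lt + 1;
        // Opening <a>: `<a` or `<A`, then whitespace, `>` or `/`.
        let is_opening_a = matches!(bytes.get(lt + 1), Some(&(b'a' | b'A')))
            && matches!(bytes.get(lt + 2), Some(&b) if b.is_ascii_whitespace() || b == b'>' || b == b'/');
        if is_opening_a {
            let insert_at = lt + 2; // just after `<a`; both bytes are ASCII
            result.push_str(&html[copied..insert_at]);
            result.push_str(attr);
            copied = insert_at;
        }
    }
    result.push_str(&html[copied..]);
    result
}

fn main() {
    let html = r#"<a href="/x">x</a> and <A href="/y">y</A> <abbr>z</abbr>"#;
    let out = inject_in_all_opening_a(html, r#" data-sourcepos="1:1-1:1""#);
    println!("{out}");
}
```

Copying the untouched stretches with `push_str` and tracking the `copied` offset keeps the work to a single left-to-right pass over the input, however many tags it contains.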
caugner added 7 commits April 15, 2026 14:30
`.iter().enumerate().skip(n)` iterates and discards the first `n`
elements (O(n)), so scanning multiple tags in a large HTML block was
O(n²). Slicing first (`bytes[pos..lt]` / `bytes[tag_start..]`) makes
each iteration start immediately at the right offset.
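The two hypothetical helpers below find the next `<` at or after `pos` and return the same index; the second is the slice-first form. One caveat worth confirming with a benchmark: in current Rust, `Skip` forwards to `nth`, which `Enumerate` and slice iterators implement without walking the skipped elements, so for slice iterators the gain from slicing may lie more in clarity than in asymptotics.

```rust
/// Hypothetical "before": skip to `pos`, then search, keeping absolute indices.
fn next_lt_skip(bytes: &[u8], pos: usize) -> Option<usize> {
    bytes
        .iter()
        .enumerate()
        .skip(pos)
        .find(|&(_, &b)| b == b'<')
        .map(|(i, _)| i)
}

/// Hypothetical "after": slice first, search, then translate the result back
/// to an index into the full buffer.
fn next_lt_slice(bytes: &[u8], pos: usize) -> Option<usize> {
    bytes[pos..].iter().position(|&b| b == b'<').map(|i| pos + i)
}

fn main() {
    let html = b"<p><a href=\"/x\">x</a></p>";
    let mut pos = 0;
    // Walk every `<`; both helpers agree at each step.
    while let Some(lt) = next_lt_slice(html, pos) {
        assert_eq!(Some(lt), next_lt_skip(html, pos));
        println!("'<' at byte {lt}");
        pos = lt + 1;
    }
}
```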
Passing a `bool` into a function only to branch on it immediately is
unidiomatic. Move the `sourcepos` check to the `iter_nodes` closure so
`annotate_raw_html_links` has a single, unconditional responsibility.
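A sketch of the resulting shape, with hypothetical `Node`, `RenderOptions` and `for_each_node` standing in for the Comrak node type, the renderer options and the `iter_nodes` helper. The annotator body is reduced to a string replacement purely for illustration.

```rust
/// Hypothetical stand-in for a parsed node holding raw HTML.
struct Node {
    html: String,
}

/// Hypothetical stand-in for the renderer options.
struct RenderOptions {
    sourcepos: bool,
}

/// Hypothetical stand-in for the `iter_nodes` traversal helper.
fn for_each_node<F: FnMut(&mut Node)>(nodes: &mut [Node], mut f: F) {
    for node in nodes.iter_mut() {
        f(node);
    }
}

/// One unconditional job: annotate raw HTML links in this node.
/// (Body simplified to a string replacement for illustration.)
fn annotate_raw_html_links(node: &mut Node) {
    node.html = node.html.replace("<a ", "<a data-annotated ");
}

/// The option check lives in the caller's closure, not in the annotator.
fn render(nodes: &mut [Node], options: &RenderOptions) {
    for_each_node(nodes, |node| {
        if options.sourcepos {
            annotate_raw_html_links(node);
        }
    });
}

fn main() {
    let mut nodes = vec![Node { html: "<a href=\"/x\">x</a>".to_string() }];
    render(&mut nodes, &RenderOptions { sourcepos: true });
    println!("{}", nodes[0].html);
}
```

Keeping the flag out of the annotator means its signature says exactly what it does, and callers that never need sourcepos never reach it.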
The `else { return }` branch in `Action::Block` handling can never fire:
`Action::Block` is only produced after matching `NodeValue::HtmlBlock`,
so the second borrow will always find the same variant. Using
`unreachable!()` documents the invariant and will surface any future
regression loudly instead of silently doing nothing.
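A self-contained sketch of the two-pass pattern, with hypothetical `NodeValue` and `Action` enums standing in for Comrak's `NodeValue` and the PR's `Action`. The first pass classifies the node; the second re-borrows it mutably and uses `let ... else` with `unreachable!()` for the variants the first pass has already ruled out.

```rust
// Hypothetical stand-ins for Comrak's `NodeValue` and the PR's `Action`.
#[derive(Debug)]
enum NodeValue {
    HtmlBlock(String),
    HtmlInline(String),
    Paragraph,
}

enum Action {
    Block,
    Inline,
}

/// First pass: decide what to do based on the node's variant.
fn classify(value: &NodeValue) -> Option<Action> {
    match value {
        NodeValue::HtmlBlock(_) => Some(Action::Block),
        NodeValue::HtmlInline(_) => Some(Action::Inline),
        _ => None,
    }
}

/// Second pass: re-borrow mutably and act. Each `let ... else` can only fail
/// if `classify` and this function disagree, which would be a bug.
fn apply(value: &mut NodeValue) {
    let Some(action) = classify(value) else { return };
    match action {
        Action::Block => {
            let NodeValue::HtmlBlock(html) = value else {
                unreachable!("Action::Block is only produced for HtmlBlock");
            };
            html.push_str("<!-- block -->");
        }
        Action::Inline => {
            let NodeValue::HtmlInline(html) = value else {
                unreachable!("Action::Inline is only produced for HtmlInline");
            };
            html.push_str("<!-- inline -->");
        }
    }
}

fn main() {
    let mut nodes = vec![
        NodeValue::HtmlBlock("<div>".to_string()),
        NodeValue::HtmlInline("<a>".to_string()),
        NodeValue::Paragraph,
    ];
    for node in &mut nodes {
        apply(node);
    }
    println!("{nodes:?}");
}
```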
`format!(" data-sourcepos=\"{sp}\"")` allocated a temporary `String`
only to immediately push it into `result`. Write the fixed prefix and
suffix with `push_str`/`push` directly, keeping a single `format!` for
the numeric coordinates which still need heap formatting.
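A sketch of the approach described, assuming a hypothetical `Span` struct for the coordinates. The second function shows an alternative that is not what the commit does: `write!` through `std::fmt::Write` formats the numbers straight into `result`, so the remaining temporary `String` from `format!` disappears as well.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for a source span (1-based lines and columns).
struct Span {
    start_line: usize,
    start_col: usize,
    end_line: usize,
    end_col: usize,
}

/// The commit's approach: fixed text via `push_str`/`push`,
/// one `format!` for the numeric coordinates.
fn push_sourcepos_attr(result: &mut String, sp: &Span) {
    result.push_str(" data-sourcepos=\"");
    result.push_str(&format!(
        "{}:{}-{}:{}",
        sp.start_line, sp.start_col, sp.end_line, sp.end_col
    ));
    result.push('"');
}

/// Alternative (not in the PR): format the coordinates directly into `result`.
fn push_sourcepos_attr_write(result: &mut String, sp: &Span) {
    result.push_str(" data-sourcepos=\"");
    // `fmt::Write` for `String` never returns an error, so the `Result` is ignored.
    let _ = write!(
        result,
        "{}:{}-{}:{}",
        sp.start_line, sp.start_col, sp.end_line, sp.end_col
    );
    result.push('"');
}

fn main() {
    let sp = Span { start_line: 3, start_col: 1, end_line: 3, end_col: 14 };
    let mut a = String::from("<a");
    push_sourcepos_attr(&mut a, &sp);
    let mut b = String::from("<a");
    push_sourcepos_attr_write(&mut b, &sp);
    assert_eq!(a, b);
    println!("{a}");
}
```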
…_a_annotates_all`

The previous comment said the test "will fail until the function
annotates all `<a>` tags", but the `while` loop in
`inject_sourcepos_in_opening_a` already annotates every tag. Replace
the incorrect description with an accurate account of the function's
contract and the typical single-tag Comrak usage.
The `while pos < bytes.len()` bound was redundant: the `?` on
`.position()` already propagates `None` when no `<` remains. Replacing
the `if lt + 1 >= bytes.len() { break }` guard with a `?` on
`bytes.get(lt + 1)` removes the dead `None` at the end and makes both
exit conditions uniform.
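A hypothetical `next_a_tag` scanner illustrating the uniform exits: both "no `<` left" and "`<` is the last byte" return `None` through `?`, with no separate loop bound or `break`. It is deliberately simplified and does not reject tags such as `<abbr>`.

```rust
/// Hypothetical sketch: byte offset of the next `<` at or after `pos` that is
/// followed by `a` or `A`. Simplified: `<abbr>` would also match.
fn next_a_tag(bytes: &[u8], mut pos: usize) -> Option<usize> {
    loop {
        // No `<` left (or `pos` past the end): `?` returns `None`.
        let lt = pos + bytes.get(pos..)?.iter().position(|&b| b == b'<')?;
        // `<` is the last byte: `?` returns `None` as well.
        let next = *bytes.get(lt + 1)?;
        if next == b'a' || next == b'A' {
            return Some(lt);
        }
        pos = lt + 1;
    }
}

fn main() {
    let html = b"<p>see <a href=\"/x\">x</a> and <A href=\"/y\">y</A></p>";
    let mut pos = 0;
    while let Some(lt) = next_a_tag(html, pos) {
        println!("<a at byte {lt}");
        pos = lt + 1;
    }
}
```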
The compiler infers `usize` from context; the explicit suffix is noise.