Commit 1f575fd
feat(ie-html): implement core HTML tokenizer state machine #54
WHATWG HTML tokenizer with ~40 core states:
- Token types: Doctype (with public/system IDs, force_quirks),
  StartTag (with attributes, self_closing), EndTag, Character,
  Comment, Eof
- States: Data, TagOpen, EndTagOpen, TagName, all attribute states
  (before/name/after/value with double/single/unquoted quoting),
  SelfClosingStartTag, BogusComment, MarkupDeclarationOpen,
  all comment states, all doctype states, CDataSection
- Iterator-based: impl Iterator<Item = Token>
- Tag names lowercased per spec
- Pending token queue for multi-token emissions
- set_state() for tree builder feedback
- Parse errors logged via tracing, never abort
- Placeholder stubs for script/raw text/entity states (Step 1b)
- 17 unit tests
1 parent f303e7b commit 1f575fd
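The diff body is not reproduced here, so the following is only a minimal sketch of the design the commit message describes: an iterator-based tokenizer with a pending token queue, lowercased tag names, `set_state()` for tree builder feedback, and parse errors that are recorded but never abort. It is not the code from this commit. All names are assumptions made for illustration. Attribute, comment, and doctype states are omitted, an `errors` vector stands in for `tracing`, and the `PlainText` state is included only to demonstrate `set_state()`.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    StartTag { name: String, self_closing: bool },
    EndTag { name: String },
    Character(char),
    Eof,
}

// A small subset of the states; PlainText is an assumption added for the set_state() demo.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State { Data, TagOpen, EndTagOpen, TagName, SelfClosingStartTag, PlainText }

pub struct Tokenizer {
    input: Vec<char>,
    pos: usize,
    state: State,
    pending: VecDeque<Token>, // holds tokens when one step emits more than one
    name: String,
    is_end_tag: bool,
    done: bool,
    pub errors: Vec<&'static str>, // hypothetical stand-in for tracing output
}

impl Tokenizer {
    pub fn new(input: &str) -> Self {
        Tokenizer {
            input: input.chars().collect(),
            pos: 0,
            state: State::Data,
            pending: VecDeque::new(),
            name: String::new(),
            is_end_tag: false,
            done: false,
            errors: Vec::new(),
        }
    }

    /// Lets a tree builder switch the tokenizer state between tokens.
    pub fn set_state(&mut self, state: State) { self.state = state; }

    fn emit_tag(&mut self, self_closing: bool) {
        let name = std::mem::take(&mut self.name);
        self.pending.push_back(if self.is_end_tag {
            Token::EndTag { name }
        } else {
            Token::StartTag { name, self_closing }
        });
        self.state = State::Data;
    }

    // Consume one input character (or end of input) and queue any resulting tokens.
    fn step(&mut self) {
        let Some(&c) = self.input.get(self.pos) else {
            match self.state {
                State::TagOpen => {
                    self.errors.push("eof-before-tag-name");
                    self.pending.push_back(Token::Character('<'));
                }
                State::EndTagOpen => {
                    self.errors.push("eof-before-tag-name");
                    self.pending.push_back(Token::Character('<'));
                    self.pending.push_back(Token::Character('/'));
                }
                State::TagName | State::SelfClosingStartTag => self.errors.push("eof-in-tag"),
                _ => {}
            }
            self.pending.push_back(Token::Eof);
            self.done = true;
            return;
        };
        self.pos += 1;
        match (self.state, c) {
            (State::Data, '<') => self.state = State::TagOpen,
            (State::Data, _) | (State::PlainText, _) => self.pending.push_back(Token::Character(c)),
            (State::TagOpen, '/') => self.state = State::EndTagOpen,
            (State::TagOpen, _) | (State::EndTagOpen, _) if c.is_ascii_alphabetic() => {
                self.is_end_tag = self.state == State::EndTagOpen;
                self.pos -= 1; // reconsume the character in TagName
                self.state = State::TagName;
            }
            (State::TagOpen, _) => {
                self.errors.push("invalid-first-character-of-tag-name");
                self.pending.push_back(Token::Character('<'));
                self.pos -= 1; // reconsume the character in Data
                self.state = State::Data;
            }
            (State::EndTagOpen, _) => {
                // Simplification: return to Data instead of entering BogusComment.
                self.errors.push("unsupported-in-sketch");
                self.state = State::Data;
            }
            (State::TagName, '>') => self.emit_tag(false),
            (State::TagName, '/') => self.state = State::SelfClosingStartTag,
            (State::TagName, _) => self.name.push(c.to_ascii_lowercase()),
            (State::SelfClosingStartTag, '>') => self.emit_tag(true),
            (State::SelfClosingStartTag, _) => {
                self.errors.push("unexpected-solidus-in-tag");
                self.pos -= 1; // simplification: reconsume in TagName
                self.state = State::TagName;
            }
        }
    }
}

impl Iterator for Tokenizer {
    type Item = Token;
    fn next(&mut self) -> Option<Token> {
        loop {
            if let Some(tok) = self.pending.pop_front() { return Some(tok); }
            if self.done { return None; }
            self.step();
        }
    }
}
```

Because tokens are pulled one at a time, a caller can take the start tag for `<plaintext>` with `next()`, call `set_state(State::PlainText)`, and have the remaining input tokenized under the new state. The queue matters for cases such as end of input right after `<`, where this sketch emits a `Character('<')` token followed by `Eof`.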
3 files changed
Lines changed: 1262 additions & 14 deletions