Parse markdown found within HTML blocks #128
This changeset updates the parsing behavior to pass the content between the opening and closing tags of a `LOWDOWN_BLOCKHTML` segment along for additional Markdown parsing. The strict tag-matching behavior has been removed, as it lacked the context needed to operate correctly when tags may be nested (such as nested divs).

Consider a content segment such as:

```
<div class="container">
<div class="container column-1">
And some text
</div>
<div class="container column-2">
With more text
</div>
</div>
```

With these changes, an outer `LOWDOWN_BLOCKHTML` node will be the parent of all the content within the "container" div. The closing tag for each block is included as its final child, as a `LOWDOWN_RAW_HTML` node.
```c
	return skip;
}

/*
```
The definition for `html_find_block` was moved up to allow visibility in `html_find_end`.
```c
 * Returns the length on match, 0 otherwise.
 */
static size_t
html_find_end_strict(const char *tag, size_t tag_len,
```
Removed `html_find_end_strict`: as written, it lacked the context needed to recognize nested HTML blocks of the same tag type (such as nested divs).
```c
result = hbuf_putc(ob, '\n');

if (!hbuf_putb(ob, content))
```
Print the inner content. Without this change, any child nodes of `LOWDOWN_BLOCKHTML` are skipped.
kristapsdz left a comment:
Thank you for this! I'll look at it over the next week or so. As this parsing is standard behaviour for pandoc, it should probably be enabled by default. Moving forward:
- First, make sure that this flies with all regression tests (`make regress`).
  - This will probably require adding an option to disable the behaviour, as it conflicts with the original Markdown format, which some tests depend upon. I can help with this.
  - If regression tests for standard invocation fail by depending on interior bits not being parsed, these tests should be fixed or removed.
- Second, add regression tests specifically for this behaviour.
- Third, run the behaviour exhaustively through AFL to find any corner bugs. I can do this with access to bigger machines.
Thank you for your feedback. The

I have not given up on this PR. I apologize for the lengthy delay, but I still hope to get the behavior working without breaking existing functionality.

@ron-at-swgy @kristapsdz Is this still of interest? Would you like to complete it?

I’ve been working with a fork. I wasn’t able to get the tests updated, etc. Feel free to take over the work.

Ok #182