Commit c2ecf29
Preserve images within figure elements
This CL fixes an issue where images nested within `<div>` elements
inside a `<figure>` tag were being incorrectly removed by Readability's
cleaning process. Specifically, a structure like
`<figure><div><div><img></div></div></figure>` would first be
transformed to `<figure><div><p><img></p></div></figure>`, and then
the outer `<div>` (and its contents) would be erroneously identified
as extraneous and removed.
The fix introduces a targeted exception within `_cleanConditionally()`.
It prevents the removal of elements that meet all of the following
criteria:
* The element is a `<div>`.
* The `<div>` is a descendant of a `<figure>` element.
* The `<div>` contains a single `<img>` element (potentially nested).
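The criteria above can be sketched as a small predicate. This is an illustrative sketch only, not the code added by this commit: it uses a minimal stand-in for DOM nodes so it runs without a browser, and the names `el`, `hasAncestorTag`, `countDescendants`, and `isProtectedFigureDiv` are hypothetical. It follows the markup example above, where the `<div>` sits inside the `<figure>`.

```javascript
// Minimal stand-in for a DOM element (assumption: the real code works on DOM nodes).
function el(tagName, children = []) {
  const node = { tagName: tagName.toUpperCase(), parent: null, children };
  for (const child of children) child.parent = node;
  return node;
}

// True if any ancestor of `node` has the given tag name.
function hasAncestorTag(node, tagName) {
  const wanted = tagName.toUpperCase();
  for (let p = node.parent; p; p = p.parent) {
    if (p.tagName === wanted) return true;
  }
  return false;
}

// Counts descendants with the given tag name at any nesting depth.
function countDescendants(node, tagName) {
  const wanted = tagName.toUpperCase();
  let count = 0;
  for (const child of node.children) {
    if (child.tagName === wanted) count++;
    count += countDescendants(child, wanted);
  }
  return count;
}

// Hypothetical predicate: a <div> inside a <figure> that wraps exactly one
// <img> (possibly nested) should be exempt from conditional cleaning.
function isProtectedFigureDiv(node) {
  return (
    node.tagName === "DIV" &&
    hasAncestorTag(node, "figure") &&
    countDescendants(node, "img") === 1
  );
}

// The intermediate structure from the description:
// <figure><div><p><img></p></div></figure>
const wrapperDiv = el("div", [el("p", [el("img")])]);
const figure = el("figure", [wrapperDiv]);
```

With this structure, `isProtectedFigureDiv(wrapperDiv)` holds, while a `<div>` outside any `<figure>`, or one wrapping several images, is still subject to the usual cleaning heuristics.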
Also add test case allrecipes-1 taken from:
https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-118320101
Parent d7949dc, commit c2ecf29
4 files changed
Lines changed: 3655 additions & 0 deletions