Commit c2ecf29

Preserve images within figure elements
This CL fixes an issue where images nested within `<div>` elements inside a `<figure>` tag were incorrectly removed by Readability's cleaning process. Specifically, a structure like `<figure><div><div><img></div></div></figure>` would first be transformed to `<figure><div><p><img></p></div></figure>`, and then the outer `<div>` (and its contents) would be erroneously identified as extraneous and removed.

The fix introduces a targeted exception within _cleanConditionally(). It prevents the removal of elements that meet all of the following criteria:

* The element is a `<div>`.
* The `<div>` is a descendant of a `<figure>` element.
* The `<div>` contains a single `<img>` element (potentially nested).

Also add test case allrecipes-1 taken from: https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010
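The three criteria can be sketched outside the library as follows. This is a simplified illustration, not Readability's actual implementation: `el`, `textOf`, and `shouldKeepFigureDiv` are hypothetical names, plain objects stand in for DOM elements, and `hasAncestorTag` and `isSingleImage` only approximate what `_hasAncestorTag()` and `_isSingleImage()` are assumed to do.

```javascript
// Hypothetical stand-in for a DOM element (not part of Readability).
function el(tagName, children = [], text = "") {
  const node = { tagName: tagName.toUpperCase(), children, text, parentNode: null };
  for (const child of children) child.parentNode = node;
  return node;
}

// Concatenated text of a node and all of its descendants.
function textOf(node) {
  return node.text + node.children.map(textOf).join("");
}

// True if any ancestor of `node` has the given tag name.
function hasAncestorTag(node, tagName) {
  const wanted = tagName.toUpperCase();
  for (let p = node.parentNode; p; p = p.parentNode) {
    if (p.tagName === wanted) return true;
  }
  return false;
}

// True if `node` is an <img>, or a chain of single-child wrappers
// with no text of their own that ends in an <img>.
function isSingleImage(node) {
  while (node) {
    if (node.tagName === "IMG") return true;
    if (node.children.length !== 1 || textOf(node).trim() !== "") return false;
    node = node.children[0];
  }
  return false;
}

// The exception described in the commit message: keep such a <div>.
function shouldKeepFigureDiv(node) {
  return node.tagName === "DIV" && hasAncestorTag(node, "figure") && isSingleImage(node);
}

// <figure><div><p><img></p></div></figure>
const img = el("img");
const p = el("p", [img]);
const div = el("div", [p]);
const figure = el("figure", [div]);

console.log(shouldKeepFigureDiv(div)); // true
```

A `<div>` wrapping an image outside any `<figure>`, or one that also carries text, does not match and would still go through the usual conditional-cleaning checks.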
1 parent d7949dc commit c2ecf29

4 files changed

Lines changed: 3655 additions & 0 deletions

Readability.js

Lines changed: 5 additions & 0 deletions
@@ -2506,6 +2506,11 @@ Readability.prototype = {
         return false;
       }
 
+      // Handle <img> buried inside nested <div> layers in <figure>.
+      if (tag === "div" && this._hasAncestorTag(node, "figure") && this._isSingleImage(node)) {
+        return false;
+      }
+
       var weight = this._getClassWeight(node);
 
       this.log("Cleaning Conditionally", node);
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{
+  "title": "Hot Honey Brussels Sprouts",
+  "byline": "Nicole Russell",
+  "dir": null,
+  "lang": "en",
+  "excerpt": "These hot honey Brussels sprouts are a simple side dish with all the elements you'll ever need in a side. They're sweet, spicy, crispy, and melt in your mouth.",
+  "siteName": "Allrecipes",
+  "publishedTime": null,
+  "readerable": true
+}
