Page detection issue #58

@bbbco

Describe the bug
The extra logic around the isAPage boolean inside the getFilePath() function is unnecessary and causes a problem. The getPageFilePath() function that is called inside it already does a great job of figuring out whether the URL should become an html page or just be left alone with its original extension.

Otherwise, you can run into issues like I did while scraping a Squarespace website: a reference to the images CDN gets pulled down as a file without the .html extension. When other images are then downloaded, the scraper attempts to save them inside a directory with the same name as that images CDN file. These images cannot be saved because a file with this name already exists where a directory is needed.
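
The collision can be reproduced outside the scraper. The following is a minimal Python sketch of the filesystem behaviour only, not the project's code: save_asset(), collides() and the host name cdn.example.com are hypothetical, and the assumption that the page would otherwise be stored under a name ending in .html is mine.

```python
import os
import tempfile


def save_asset(root: str, rel_path: str, data: bytes) -> None:
    """Write data to root/rel_path, creating parent directories as needed."""
    target = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(data)


def collides(page_name: str) -> bool:
    """Save a page under page_name, then an image under cdn.example.com/.

    Returns True if the image cannot be saved because a regular file
    occupies the path where the image's directory has to be created.
    """
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical CDN reference, saved first as a page.
        save_asset(root, page_name, b"<html></html>")
        try:
            # Hypothetical image that lives under the same host name.
            save_asset(root, "cdn.example.com/photo.jpg", b"\xff\xd8")
        except (FileExistsError, NotADirectoryError):
            return True
        return False


if __name__ == "__main__":
    # Page stored without an extension: its name blocks the directory.
    print(collides("cdn.example.com"))
    # Page stored with an .html suffix (assumed naming): no clash.
    print(collides("cdn.example.com.html"))
```

In this sketch the only difference between the two calls is the name the page is saved under, which is the decision the issue suggests leaving to getPageFilePath().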
