This repository was archived by the owner on Sep 27, 2022. It is now read-only.

Ensure clear connection between HTML nodes and plaintext #43

@appledora

Description

In GitLab by @geohci on Sep 15, 2022, 16:05

Many use cases for plaintext extracted from HTML require knowing where each word came from. For example, knowing which part of a sentence was a link or was italicized in the HTML can be crucial when training models for link prediction. The plaintext methods should therefore expose which type of node contributed each character or word, while still making it easy to join the pieces into a single plain string.
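
One possible shape for this, as a minimal sketch only: the snippet below uses Python's standard-library `html.parser` rather than this project's own parsing code, and the names `Segment`, `ProvenanceParser`, `html_to_segments`, and `to_plaintext` are hypothetical, chosen for illustration. It assumes that "type of node" means the innermost enclosing HTML tag of each text node.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Segment(NamedTuple):
    """Hypothetical record: one text node plus the tag that encloses it."""
    text: str
    node_type: str  # innermost enclosing tag, e.g. "a", "i", "p"


class ProvenanceParser(HTMLParser):
    """Collect text nodes together with their innermost enclosing tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.stack: List[str] = []          # currently open tags
        self.segments: List[Segment] = []   # output, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data:
            node_type = self.stack[-1] if self.stack else "#text"
            self.segments.append(Segment(data, node_type))


def html_to_segments(html: str) -> List[Segment]:
    """Parse an HTML fragment into (text, node_type) segments."""
    parser = ProvenanceParser()
    parser.feed(html)
    parser.close()
    return parser.segments


def to_plaintext(segments: List[Segment]) -> str:
    """Join segments back into a pure string, dropping the provenance."""
    return "".join(seg.text for seg in segments)


if __name__ == "__main__":
    html = '<p>The <a href="/wiki/Moon">Moon</a> is <i>bright</i>.</p>'
    segments = html_to_segments(html)
    for seg in segments:
        print(repr(seg.text), seg.node_type)
    print(to_plaintext(segments))
```

Keeping the segments as a list of small records means a caller can filter or label by `node_type` (for instance, to mark link spans for link prediction) and still recover the plain string with a single join.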
