[EPUB] Noisy Markdown output (XML/CSS) and missing chapters due to path resolution issues

### Description
Currently, MarkItDown's `_epub_converter.py` has two significant limitations when converting EPUB files:
- **Noise in output**: XHTML files often include an XML declaration and `<style>` blocks. If the `BeautifulSoup` search for `<body>` fails (which is common with namespaced XHTML), the entire raw file content, including that XML and CSS, ends up in the Markdown output.
- **Missing content**: The naive path-joining logic `f"{base_path}/{manifest[item_id]}"` does not resolve the relative paths (e.g., `../Text/...`) that many commercial EPUB manifests use, so the chapters they reference are missing from the output.
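
A minimal standard-library sketch of the path-resolution fix, for illustration only. It assumes the EPUB is read with `zipfile` and resolves each manifest `href` against the directory of the OPF file, which is what the EPUB packaging rules require. `resolve_href` and `spine_documents` are hypothetical helper names, not part of MarkItDown's API.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import unquote

# Hypothetical helpers (not MarkItDown's API) illustrating manifest path resolution.
# Namespaces used by META-INF/container.xml and the OPF package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def resolve_href(opf_path: str, href: str) -> str:
    """Resolve a manifest href (relative to the OPF file) to a ZIP member name."""
    # hrefs are URLs, so 'ch%201.xhtml' refers to the member 'ch 1.xhtml'.
    joined = posixpath.join(posixpath.dirname(opf_path), unquote(href))
    # normpath collapses 'Content/../Text' and './', which ZIP lookups do not do.
    return posixpath.normpath(joined)


def spine_documents(zf: zipfile.ZipFile) -> list[str]:
    """Return the ZIP member names of the spine documents, in reading order."""
    container = ET.fromstring(zf.read("META-INF/container.xml"))
    opf_path = container.find(".//c:rootfile", NS).get("full-path")
    opf = ET.fromstring(zf.read(opf_path))

    # Map manifest ids to fully resolved member names.
    manifest = {
        item.get("id"): resolve_href(opf_path, item.get("href"))
        for item in opf.findall("opf:manifest/opf:item", NS)
    }
    names = set(zf.namelist())
    docs = []
    for itemref in opf.findall("opf:spine/opf:itemref", NS):
        path = manifest.get(itemref.get("idref"))
        if path in names:  # skip dangling idrefs instead of raising
            docs.append(path)
    return docs
```

Normalizing with `posixpath` rather than `os.path` keeps the result in ZIP member-name form, which always uses forward slashes regardless of the host OS.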

### Potential Solution
A more robust manifest parser that resolves relative paths, combined with a more aggressive XHTML cleaner, would fix both issues. I have implemented a workaround in a private project and would be happy to contribute a PR if the maintainers are interested.
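
One possible shape for the "more aggressive XHTML cleaner", sketched with the standard library's `html.parser.HTMLParser` so it does not depend on a `BeautifulSoup` lookup of `<body>` succeeding. It drops the XML declaration, DOCTYPE, comments, and the contents of `<head>`, `<style>` and `<script>`, strips namespace prefixes such as `html:` from tag names, and unwraps `<html>`/`<body>`. It assumes none of the dropped elements carry chapter text. `clean_xhtml` and `XhtmlCleaner` are hypothetical names; the returned HTML would be handed to whatever HTML-to-Markdown step the converter already uses.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical cleaner (not MarkItDown's API).
DROP = {"head", "style", "script"}  # removed together with their content
UNWRAP = {"html", "body"}           # tags removed, content kept
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def local_name(tag: str) -> str:
    """Strip a namespace prefix, e.g. 'html:body' -> 'body'."""
    return tag.rsplit(":", 1)[-1]


def render_attrs(attrs) -> str:
    """Re-serialize (name, value) pairs; value is None for bare attributes."""
    return "".join(f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in attrs)


class XhtmlCleaner(HTMLParser):
    """Re-serializes an XHTML document, keeping only its body content.

    The XML declaration (handle_pi), DOCTYPE (handle_decl) and comments are
    ignored because HTMLParser's default handlers for them do nothing.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip = None  # name of the DROP element currently being skipped
        self.depth = 0    # nesting depth of that element

    def handle_starttag(self, tag, attrs):
        name = local_name(tag)
        if self.skip is not None:
            if name == self.skip:
                self.depth += 1
            return
        if name in DROP:
            self.skip, self.depth = name, 1
        elif name not in UNWRAP:
            self.parts.append(f"<{name}{render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing XHTML tags such as <br/> or <a id="x"/>.
        name = local_name(tag)
        if self.skip is not None or name in DROP or name in UNWRAP:
            return
        self.parts.append(f"<{name}{render_attrs(attrs)}>")
        if name not in VOID:
            self.parts.append(f"</{name}>")  # <a/> must become <a></a> in HTML

    def handle_endtag(self, tag):
        name = local_name(tag)
        if self.skip is not None:
            if name == self.skip:
                self.depth -= 1
                if self.depth == 0:
                    self.skip = None
            return
        if name not in DROP and name not in UNWRAP and name not in VOID:
            self.parts.append(f"</{name}>")

    def handle_data(self, data):
        if self.skip is None:
            self.parts.append(escape(data, quote=False))


def clean_xhtml(markup: str) -> str:
    """Return the body content of an EPUB XHTML file as plain HTML."""
    parser = XhtmlCleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()
```

The trade-off of re-serializing is that attribute quoting and whitespace may differ from the source file, which should not matter once the HTML is converted to Markdown.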


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[EPUB] Noisy Markdown output (XML/CSS) and missing chapters due to path resolution issues #1724

Description

Potential Solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[EPUB] Noisy Markdown output (XML/CSS) and missing chapters due to path resolution issues #1724

Description

Description

Potential Solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions