feat(web):Add web page to Markdown conversion tool #1048

Open
wondywang wants to merge 1 commit into jackwener:main from wondywang:html-to-markdown

Conversation

@wondywang

This file implements a CLI tool to convert web pages to Markdown format, utilizing the Readability library for content extraction and Turndown for HTML-to-Markdown conversion. It includes options for outputting to a file or stdout and handles various media URLs.

Description

This PR introduces a new, dedicated CLI plugin for converting web pages into high-fidelity Markdown. It is designed to address the limitations of the existing web read command, which currently lacks robust support for rich content like embedded media and complex tables.

This plugin ensures a more complete conversion by strictly preserving the original URLs of images and videos and accurately rendering HTML tables into Markdown format, making it suitable for archiving or processing content-heavy web pages.
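
To make the table requirement concrete, here is a minimal sketch of what rendering a table as GFM Markdown involves. It is not the code in this PR (which delegates conversion to Turndown + GFM); `renderGfmTable` and `escapeCell` are hypothetical names, and the input is assumed to be already extracted into plain-text cells.

```typescript
// Hypothetical helper, not part of this PR: renders a header row plus body
// rows as a GFM pipe table. Cells are assumed to be plain text already.
type Row = string[];

function escapeCell(text: string): string {
  // An unescaped pipe would end the cell early; a newline would end the row.
  return text.replace(/\|/g, "\\|").replace(/\s*\n\s*/g, " ").trim();
}

function renderGfmTable(header: Row, rows: Row[]): string {
  // Pad short rows so every line has the same number of columns.
  const width = Math.max(header.length, ...rows.map((r) => r.length));
  const pad = (r: Row): Row =>
    Array.from({ length: width }, (_, i) => escapeCell(r[i] ?? ""));
  const line = (cells: Row): string => `| ${cells.join(" | ")} |`;

  return [
    line(pad(header)),
    line(Array.from({ length: width }, () => "---")), // delimiter row
    ...rows.map((r) => line(pad(r))),
  ].join("\n");
}
```

The two details that tend to break naive conversions are unescaped `|` characters inside cells and rows with fewer cells than the header, so the sketch handles both explicitly.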

Motivation

The primary motivation for this feature is to overcome the shortcomings of the current web read functionality. While web read is useful for extracting plain text, it often fails to capture the full context of a webpage, specifically:

  • Multimedia Content: Images and videos are frequently omitted or their links are lost.

  • Tabular Data: HTML tables are not correctly converted, leading to a loss of structured information.

This plugin fills that gap by providing a specialized tool for users who need to preserve the integrity of rich media and structured data when converting HTML to Markdown.

Web MD — Convert any web page to Markdown with enhanced quality.

Uses @mozilla/readability for content extraction and Turndown + GFM
for HTML-to-Markdown conversion. Preserves image/video URLs.
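
As an illustration of what preserving media URLs can mean in practice, the sketch below resolves an `src` attribute against the page URL so the link still works once the Markdown is saved elsewhere. This is an assumption about one reasonable approach, not a description of the PR's code; `resolveMediaUrl` and `imageMarkdown` are hypothetical names. Only the standard WHATWG `URL` constructor is used.

```typescript
// Hypothetical helpers, not part of this PR.
// Resolve a media `src` against the page URL; return null if it cannot be
// parsed, so the caller can decide how to degrade.
function resolveMediaUrl(src: string, pageUrl: string): string | null {
  const trimmed = src.trim();
  if (trimmed === "") return null;
  try {
    // Absolute URLs pass through unchanged; relative and protocol-relative
    // ones are resolved against the page URL.
    return new URL(trimmed, pageUrl).href;
  } catch {
    return null;
  }
}

// Emit Markdown image syntax, falling back to the alt text alone when the
// source URL is unusable.
function imageMarkdown(alt: string, src: string, pageUrl: string): string {
  const url = resolveMediaUrl(src, pageUrl);
  return url === null ? alt : `![${alt}](${url})`;
}
```

For example, `imageMarkdown("chart", "/img/chart.png", "https://example.com/article")` yields an image link pointing at `https://example.com/img/chart.png`.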

Usage:

# 1. Convert a webpage and save to a specific Markdown file
opencli web md --url "https://example.com/article" --output ./docs/article.md

# 2. Convert a webpage and print the result directly to the terminal (stdout)
opencli web md --url "https://example.com/article" --stdout

# 3. Basic conversion (default behavior may depend on core config)
opencli web md --url "https://example.com/article"
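
The three invocations above imply a small decision about where the Markdown goes. A minimal sketch of that decision follows; the option shape, the `chooseTarget` name, the rejection of `--output` combined with `--stdout`, and the explicit fallback parameter are all assumptions for illustration, since the PR text only says the default may depend on core config.

```typescript
// Hypothetical types and helper, not part of this PR.
interface WebMdOptions {
  url: string;
  output?: string;  // value of --output, if given
  stdout?: boolean; // true when --stdout is passed
}

type Target = { kind: "file"; path: string } | { kind: "stdout" };

// Decide where the converted Markdown should be written.
// `fallback` stands in for whatever the core config would choose.
function chooseTarget(opts: WebMdOptions, fallback: Target): Target {
  if (opts.output !== undefined && opts.stdout) {
    // Assumption: treat the combination as a usage error rather than guess.
    throw new Error("--output and --stdout cannot be combined");
  }
  if (opts.output !== undefined) return { kind: "file", path: opts.output };
  if (opts.stdout) return { kind: "stdout" };
  return fallback;
}
```

Keeping this as a pure function makes the flag handling easy to unit test separately from fetching and conversion.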

Related issue: None

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🌐 New site adapter
  • 📝 Documentation
  • ♻️ Refactor
  • 🔧 CI / build / tooling

@wondywang wondywang changed the title Add web page to Markdown conversion tool feat(web):Add web page to Markdown conversion tool Apr 15, 2026