
sp2935 commented Jan 20, 2026

Summary

This PR fixes issue #155 where whitespace-only content inside inline formatting tags (like <strong> </strong>) was being stripped entirely, causing words to concatenate incorrectly.

Before: further<strong> </strong>reference → furtherreference
After: further<strong> </strong>reference → further reference

Problem

The chomp() function strips leading/trailing whitespace from text inside inline tags to prevent incorrect markdown like ** foo**. However, when the tag contains only whitespace, stripping leaves chomp() with empty text. abstract_inline_conversion() returns an empty string whenever the chomped text is empty, so the space is discarded along with the markers.

This causes text like word1<b> </b>word2 to become word1word2 instead of word1 word2.
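The failure can be reproduced outside the library with a small sketch. The chomp() below only approximates markdownify's (it assumes a single-space prefix/suffix is recorded before stripping), and convert_inline() is a hypothetical stand-in for abstract_inline_conversion(); neither is the actual library code. Whatever prefix and suffix chomp() reports, the empty stripped text hits the early return and the space is lost:

```python
def chomp(text):
    """Simplified sketch: split one leading/trailing space off, return (prefix, suffix, text)."""
    prefix = ' ' if text and text[0] == ' ' else ''
    suffix = ' ' if text and text[-1] == ' ' else ''
    return prefix, suffix, text.strip()


def convert_inline(text, markup='**'):
    """Hypothetical stand-in for abstract_inline_conversion() before the fix."""
    prefix, suffix, text = chomp(text)
    if not text:
        # Whitespace-only content: the prefix/suffix spaces are dropped too.
        return ''
    return f'{prefix}{markup}{text}{markup}{suffix}'


print('word1' + convert_inline(' ') + 'word2')     # word1word2
print('word1' + convert_inline(' foo') + 'word2')  # word1 **foo**word2
```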

Solution

  1. Modified chomp() to detect whitespace-only text and return ('', '', ' '), preserving a single space as the text content.

  2. Modified abstract_inline_conversion() to check if the chomped text is whitespace-only (text.isspace()) and if so, return just the whitespace without wrapping it in markdown markers.
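A matching sketch of the two changes, under the same assumptions as before: a simplified chomp(), a hypothetical convert_inline() in place of abstract_inline_conversion(), and ** as the markup. It follows the ('', '', ' ') return value described in step 1 and illustrates the approach; it is not the diff from this PR:

```python
def chomp(text):
    """Simplified sketch: return (prefix, suffix, text); whitespace-only input keeps one space."""
    if text.isspace():
        # Step 1: collapse whitespace-only content to a single space as the text.
        return '', '', ' '
    prefix = ' ' if text and text[0] == ' ' else ''
    suffix = ' ' if text and text[-1] == ' ' else ''
    return prefix, suffix, text.strip()


def convert_inline(text, markup='**'):
    """Hypothetical stand-in for abstract_inline_conversion() after the fix."""
    prefix, suffix, text = chomp(text)
    if not text:
        # Truly empty tags (e.g. <b></b>) still produce nothing.
        return ''
    if text.isspace():
        # Step 2: emit the whitespace itself, without markdown markers.
        return text
    return f'{prefix}{markup}{text}{markup}{suffix}'


print('further' + convert_inline(' ') + 'reference')  # further reference
print(repr(convert_inline(' foo ')))                  # ' **foo** '
```

Checking isspace() before wrapping means a whitespace-only tag never produces empty markers such as ** **, while tags with real content still get their surrounding spaces placed outside the markers.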

Changes

  • markdownify/__init__.py: Updated chomp() and abstract_inline_conversion()
  • tests/test_advanced.py: Updated test_chomp expectations to reflect new behavior
  • tests/test_conversions.py: Added test_whitespace_only_inline_tags() for regression testing

Test Plan

  • All existing tests pass (84 tests)
  • New test test_whitespace_only_inline_tags verifies the fix
  • Verified with real-world DOCX files where this issue occurs

🤖 Generated with Claude Code


When an inline formatting tag (strong, b, em, i, etc.) contains only
whitespace, the content is now preserved as a single space instead of
being stripped entirely. This fixes issue matthewwithanm#155 where text like
`further<strong> </strong>reference` was incorrectly converted to
`furtherreference` instead of `further reference`.

Changes:
- Modified chomp() to return (' ', '', ' ') for whitespace-only text
- Modified abstract_inline_conversion() to skip markup for whitespace-only text
- Updated test_chomp to reflect new expected behavior
- Added test_whitespace_only_inline_tags for regression testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>