
fix: handle UTF-16 LE/BE encoded files with BOM #929

Open
lawrence3699 wants to merge 1 commit into charmbracelet:master from lawrence3699:fix/utf16-file-rendering


@lawrence3699

Fixes #513

Problem

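UTF-16 LE/BE encoded markdown files (common when files are saved by Windows editors or converted from other formats) render as mostly blank content because the raw UTF-16 bytes are passed directly to the markdown renderer, which expects UTF-8. In UTF-16, every ASCII character is stored alongside a 0x00 byte, so the renderer never sees the intended text.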

Fix

Detect UTF-16 byte order marks and convert file content to UTF-8 before rendering. The conversion is applied in all three file-reading paths:

  • CLI mode (executeCLI)
  • TUI file view (ui.Init)
  • Stash file loading (loadLocalMarkdown)

A ToUTF8 utility function handles BOM detection and conversion:

  • UTF-16 LE BOM (FF FE): decoded via golang.org/x/text/encoding/unicode
  • UTF-16 BE BOM (FE FF): decoded via golang.org/x/text/encoding/unicode
  • UTF-8 BOM (EF BB BF): stripped
  • No BOM / plain UTF-8: passed through unchanged
  • Decode failure: falls back to original bytes

golang.org/x/text is already a dependency (v0.32.0), so no new dependencies are introduced.

Before

$ glow utf16le-readme.md
(mostly blank output)

After

$ glow utf16le-readme.md

  # Hello World

  This is a test.

Validation

  • go test ./... — all pass (7 new tests for ToUTF8)
  • go vet ./... — clean
  • Manual test with UTF-16 LE and BE files with BOM

Detect UTF-16 LE/BE byte order marks and convert file content to UTF-8
before rendering. Previously, UTF-16 encoded markdown files appeared
mostly blank because the raw UTF-16 bytes were passed directly to the
markdown renderer.

The conversion uses golang.org/x/text (already a dependency) and falls
back to the original bytes when no BOM is detected or if decoding fails.
UTF-8 BOMs are also stripped.

Fixes charmbracelet#513
Copilot AI review requested due to automatic review settings April 13, 2026 13:55
lawrence3699 requested a review from a team as a code owner April 13, 2026 13:55
lawrence3699 requested review from andreynering and meowgorithm and removed request for a team April 13, 2026 13:55

Copilot AI left a comment


Pull request overview

This PR fixes rendering of UTF-16 LE/BE markdown files (with BOM) by converting file bytes to UTF-8 before passing content to the markdown renderer, addressing issue #513.

Changes:

  • Add utils.ToUTF8 to detect BOMs (UTF-8/UTF-16 LE/BE) and normalize content to UTF-8.
  • Apply UTF-8 normalization across CLI, TUI file view, and stash local file loading paths.
  • Add unit tests covering UTF-8 passthrough/BOM stripping and UTF-16 LE/BE decoding.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Files:

  • utils/utils.go: Introduces ToUTF8 BOM detection + UTF-16→UTF-8 decoding utility.
  • utils/utils_test.go: Adds test coverage for UTF-8/UTF-16 BOM handling and edge cases (nil/empty, multibyte chars).
  • ui/ui.go: Normalizes on-disk file bytes via utils.ToUTF8 before frontmatter stripping/rendering.
  • ui/stash.go: Normalizes locally loaded stash file bytes via utils.ToUTF8 before storing into md.Body.
  • main.go: Normalizes CLI input bytes via utils.ToUTF8 prior to frontmatter stripping/rendering.
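Normalizing before frontmatter stripping, as noted for ui/ui.go and main.go, matters because a leading BOM hides the opening `---`. The check below is a hypothetical, simplified prefix test (`startsWithFrontmatter` is not glow's actual frontmatter code), shown only to illustrate the ordering:

```go
package main

import (
	"bytes"
	"fmt"
)

// startsWithFrontmatter is a hypothetical, simplified check for a
// YAML frontmatter opener at the very start of the input.
func startsWithFrontmatter(b []byte) bool {
	return bytes.HasPrefix(b, []byte("---\n"))
}

func main() {
	withBOM := []byte("\xEF\xBB\xBF---\ntitle: x\n---\n# Doc\n")
	fmt.Println(startsWithFrontmatter(withBOM)) // false: BOM precedes "---"

	stripped := bytes.TrimPrefix(withBOM, []byte("\xEF\xBB\xBF"))
	fmt.Println(startsWithFrontmatter(stripped)) // true
}
```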




Development

Successfully merging this pull request may close these issues.

Problems parsing UTF16 files

2 participants