
fix: Preserve Japanese characters in sanitize_title #19

Merged: Cassin01 merged 1 commit into main from ttt on Jan 20, 2026


@Cassin01 (Owner)

Change the pattern from a whitelist (ASCII-only) approach to a blacklist approach. The previous pattern [^%w%-%_]+ kept only ASCII alphanumerics, hyphens, and underscores, so Japanese characters were replaced with dashes and Japanese-only titles ended up as empty strings.

The new pattern explicitly lists the unsafe filename characters to replace, preserving Japanese and other Unicode characters.
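A minimal Lua sketch of the before/after behavior, for illustration only. The sanitize helper is hypothetical, not the project's sanitize_title: it assumes the title is lowercased (inferred from the "日本語 Test" → "日本語-test" test case), that each matched run becomes a single dash, and that leading/trailing dashes are trimmed (an assumption that would explain the empty result). The blacklist is modeled on the PR's pattern and the exact set in the project may differ; it also assumes Lua's default C locale.

```lua
-- Hypothetical sketch, not the project's sanitize_title: assumes lowercasing,
-- one dash per matched run, and trimming of leading/trailing dashes.
local OLD_PATTERN = "[^%w%-%_]+" -- whitelist: keep ASCII alnum, "-", "_"
-- Blacklist modeled on the PR: whitespace, control characters, and ASCII
-- punctuation except "-" and "_". In the C locale, bytes >= 0x80 (every byte
-- of a multibyte UTF-8 character) never match it.
local NEW_PATTERN = "[%s%c!\"#$%%&'()*+,./:;<=>?@%[\\%]^`{|}~]+"

local function sanitize(title, pattern)
  local s = title:lower():gsub(pattern, "-")
  s = s:gsub("^%-+", ""):gsub("%-+$", "")
  return s
end

print(sanitize("日本語", OLD_PATTERN))      -- prints an empty line
print(sanitize("日本語 Test", OLD_PATTERN)) -- test
print(sanitize("日本語", NEW_PATTERN))      -- 日本語
print(sanitize("日本語 Test", NEW_PATTERN)) -- 日本語-test
```

With the whitelist, the three kanji and the following space form one run of non-matching bytes that collapses into a single dash; with the blacklist, only the ASCII space is replaced.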

Summary

Brief description of the changes in this PR.

Related Issue

Fixes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist

  • My code follows the project's code style (Fennel, not Lua directly)
  • I have run make to regenerate Lua files
  • I have run make test and all tests pass
  • I have added tests for new functionality
  • I have updated the documentation (if needed)

Testing

Describe how you tested your changes:

Screenshots (if applicable)

Add screenshots to help explain your changes.

Copilot AI review requested due to automatic review settings January 20, 2026 14:32
Cassin01 merged commit 1a7c3b3 into main on Jan 20, 2026
3 checks passed
Copilot AI (Contributor) left a comment


Pull request overview

This PR fixes an issue where Japanese characters in memo titles were being replaced with dashes, resulting in empty filenames. The fix changes the sanitize_title function from a whitelist approach (keeping only ASCII alphanumeric characters) to a blacklist approach (removing only specific unsafe filesystem characters).

Changes:

  • Modified the pattern in sanitize_title from [^%w%-%_]+ to [%s%c!\"#$%%&'()*+,./:;<=>?@%[\\%]^`{|}~]+ (Fennel string-literal form) to preserve Unicode characters
  • Added three test cases covering Japanese character preservation, mixed Japanese/ASCII titles, and full-width punctuation

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Changed files:
  • fnl/sm/memo.fnl: Updated the sanitize_title pattern to use a blacklist approach that preserves Unicode
  • lua/sm/memo.lua: Auto-generated Lua file from the Fennel source, with the updated pattern
  • fnl/sm/memo_test.fnl: Added three test cases for Japanese character handling
  • lua/sm/memo_test.lua: Auto-generated Lua test file with the new test cases (UTF-8 as byte sequences)
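The generated Lua test file stores the Japanese strings as UTF-8 byte sequences. A short sketch of why that is equivalent to the literal, and why the old pattern failed at the byte level (assuming Lua's default C locale):

```lua
-- Each kanji below is three bytes in UTF-8 (E6 97 A5, E6 9C AC, E8 AA 9E),
-- so a literal written with decimal byte escapes is the same Lua string.
local literal = "日本語"
local escaped = "\230\151\165\230\156\172\232\170\158"

print(literal == escaped) -- true
print(#literal)           -- 9 (the length operator counts bytes)

-- Lua patterns match byte by byte. In the C locale none of these nine bytes
-- is alphanumeric, so the old whitelist pattern swallows the whole run.
print((literal:gsub("[^%w%-%_]+", "-"))) -- -
```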

Comment thread fnl/sm/memo_test.fnl
Comment on lines +43 to +45
(assert (= (M._sanitize_title "日本語") "日本語") "sanitize: preserve Japanese")
(assert (= (M._sanitize_title "日本語 Test") "日本語-test") "sanitize: Japanese with ASCII")
(assert (= (M._sanitize_title "記号!テスト") "記号!テスト") "sanitize: preserve full-width punctuation")

Copilot AI Jan 20, 2026


Consider adding test cases for edge scenarios to ensure robust handling:

  • Title with only punctuation (e.g., "!!!" or "..."), which would become an empty string after sanitization
  • Title with mixed full-width and half-width characters
  • Title with leading/trailing spaces combined with Unicode characters

This would help verify that the new blacklist approach handles all edge cases correctly, especially the empty-string case, which could produce malformed filenames such as "20260117_143052_.md".
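One way to cover the empty-string case is a fallback when building the filename. This is a hypothetical sketch, not the project's code: make_filename, the "<timestamp>_<slug>.md" layout (inferred from the example name above), and the dash trimming inside sanitize are all assumptions.

```lua
-- Hypothetical helpers for illustration only; sanitize() repeats the assumed
-- steps (lowercase, blacklist replace, trim edge dashes).
local PATTERN = "[%s%c!\"#$%%&'()*+,./:;<=>?@%[\\%]^`{|}~]+"

local function sanitize(title)
  local s = title:lower():gsub(PATTERN, "-")
  s = s:gsub("^%-+", ""):gsub("%-+$", "")
  return s
end

-- Build "<timestamp>_<slug>.md"; when the slug is empty, drop the separator
-- so a punctuation-only title cannot produce a name ending in "_.md".
local function make_filename(title, time)
  local stamp = os.date("!%Y%m%d_%H%M%S", time) -- "!" formats in UTC
  local slug = sanitize(title)
  if slug == "" then
    return stamp .. ".md"
  end
  return stamp .. "_" .. slug .. ".md"
end

-- Edge cases from the review comment
print(make_filename("!!!", 0))       -- 19700101_000000.md
print(make_filename("...", 0))       -- 19700101_000000.md
print(make_filename("  日本語  ", 0)) -- 19700101_000000_日本語.md
print(make_filename("Ａｂｃ abc", 0)) -- 19700101_000000_Ａｂｃ-abc.md
```

Falling back to the bare timestamp is only one option; a placeholder slug such as "untitled" would avoid the malformed name equally well. Note that full-width letters are left as-is because string.lower only folds ASCII in the C locale.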
