Change the pattern from a whitelist (ASCII-only) to a blacklist approach. The previous pattern `[^%w%-%_]+` kept only ASCII alphanumerics, hyphens, and underscores, so Japanese characters were replaced with dashes and Japanese-only titles produced empty strings. The new pattern explicitly lists the unsafe filename characters to replace, preserving Japanese and other Unicode characters.
Pull request overview
This PR fixes an issue where Japanese characters in memo titles were being replaced with dashes, resulting in empty filenames. The fix changes the sanitize_title function from a whitelist approach (keeping only ASCII alphanumeric characters) to a blacklist approach (removing only specific unsafe filesystem characters).
Changes:
- Modified the pattern in `sanitize_title` from `[^%w%-%_]+` to `[%s%c!\"#$%%&'()*+,./:;<=>?@%[\\%]^{|}~]+` to preserve Unicode characters
- Added three test cases covering Japanese character preservation, mixed Japanese/ASCII titles, and full-width punctuation
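
To illustrate the difference between the two patterns, here is a minimal Lua sketch. Only the two pattern strings come from this PR; the `sanitize` helper, its lowercasing, and its trimming of leading and trailing dashes are assumptions made for illustration and may not match the real `sanitize_title`. It also assumes the default "C" locale, where `%w`, `%s`, and `%c` match ASCII bytes only.

```lua
-- Patterns quoted in the PR (written here as Lua string literals).
local OLD_PATTERN = "[^%w%-%_]+"
local NEW_PATTERN = "[%s%c!\"#$%%&'()*+,./:;<=>?@%[\\%]^{|}~]+"

-- Hypothetical stand-in for sanitize_title: lowercase, replace each run of
-- matched bytes with a single dash, then trim leading and trailing dashes.
local function sanitize(title, pattern)
  local s = title:lower():gsub(pattern, "-")
  s = s:gsub("^%-+", ""):gsub("%-+$", "")
  return s
end

-- Whitelist: every UTF-8 byte of the Japanese text is outside %w, so the
-- whole title collapses to one dash, which is then trimmed away.
assert(sanitize("日本語", OLD_PATTERN) == "")
assert(sanitize("日本語 Test", OLD_PATTERN) == "test")

-- Blacklist: only the listed ASCII characters are replaced, so the
-- multi-byte characters pass through untouched.
assert(sanitize("日本語", NEW_PATTERN) == "日本語")
assert(sanitize("日本語 Test", NEW_PATTERN) == "日本語-test")
```

Lua patterns operate on bytes rather than code points. The blacklist works because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so it can never collide with one of the listed ASCII characters.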
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| fnl/sm/memo.fnl | Updated sanitize_title pattern to use blacklist approach for preserving Unicode |
| lua/sm/memo.lua | Auto-generated Lua file from Fennel source with updated pattern |
| fnl/sm/memo_test.fnl | Added three test cases for Japanese character handling |
| lua/sm/memo_test.lua | Auto-generated Lua test file with new test cases (UTF-8 as byte sequences) |
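
As a small aside on the "UTF-8 as byte sequences" note, the following Lua sketch shows that a UTF-8 literal and its decimal-escaped form are the same string. It assumes the source file is saved as UTF-8; the exact escape style the Fennel compiler emits is not shown in this PR, so the decimal escapes here are illustrative.

```lua
-- "日本語" encoded in UTF-8 is nine bytes: E6 97 A5, E6 9C AC, E8 AA 9E.
local literal = "日本語"
local escaped = "\230\151\165\230\156\172\232\170\158"

-- Both spellings produce the same byte string.
assert(literal == escaped)
assert(#literal == 9)

-- Every byte is 0x80 or higher, so none can match an ASCII character class.
for i = 1, #literal do
  assert(literal:byte(i) >= 0x80)
end
```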
(assert (= (M._sanitize_title "日本語") "日本語") "sanitize: preserve Japanese")
(assert (= (M._sanitize_title "日本語 Test") "日本語-test") "sanitize: Japanese with ASCII")
(assert (= (M._sanitize_title "記号！テスト") "記号！テスト") "sanitize: preserve full-width punctuation")
Consider adding test cases for edge scenarios to ensure robust handling:
- Title with only punctuation (e.g., "!!!" or "...") which would result in an empty string after sanitization
- Title with mixed full-width and half-width characters
- Title with leading/trailing spaces combined with Unicode characters
This would help verify that the new blacklist approach handles all edge cases correctly, especially the empty string case which could result in malformed filenames like "20260117_143052_.md".
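
The edge cases listed above could be checked along these lines. This is a sketch under stated assumptions: `sanitize_title` below is a hypothetical re-implementation (lowercase, replace, trim dashes) built around the pattern from this PR, and `memo_filename` is an invented helper that only mirrors the filename shape quoted in the comment.

```lua
local UNSAFE = "[%s%c!\"#$%%&'()*+,./:;<=>?@%[\\%]^{|}~]+"

-- Hypothetical stand-in for the real sanitize_title.
local function sanitize_title(title)
  local s = title:lower():gsub(UNSAFE, "-")
  s = s:gsub("^%-+", ""):gsub("%-+$", "")
  return s
end

-- Hypothetical filename builder: <timestamp>_<sanitized title>.md
local function memo_filename(timestamp, title)
  return timestamp .. "_" .. sanitize_title(title) .. ".md"
end

-- Punctuation-only titles sanitize to an empty string ...
assert(sanitize_title("!!!") == "")
assert(sanitize_title("...") == "")
-- ... which yields the malformed filename mentioned in the review.
assert(memo_filename("20260117_143052", "!!!") == "20260117_143052_.md")

-- Mixed full-width and half-width characters: only the ASCII space is replaced.
assert(sanitize_title("ＡＢＣ abc") == "ＡＢＣ-abc")

-- Leading and trailing spaces around Unicode text are trimmed away.
assert(sanitize_title("  日本語  ") == "日本語")
```

Whether an empty result should fall back to a default title or be rejected is a design decision this sketch does not make.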
Checklist
- `make` to regenerate Lua files
- `make test` and all tests pass