Skip to content

Fix crash on file upload with non-ASCII filename#1352

Open
lawrence3699 wants to merge 1 commit intosissbruecker:masterfrom
lawrence3699:fix/non-ascii-filename-crash
Open

Fix crash on file upload with non-ASCII filename#1352
lawrence3699 wants to merge 1 commit intosissbruecker:masterfrom
lawrence3699:fix/non-ascii-filename-crash

Conversation

@lawrence3699
Copy link
Copy Markdown

Fixes #1348

Problem

Uploading a file with non-ASCII characters in its name (e.g. explodér.html) crashes with UnicodeEncodeError when gzip.open() tries to create the compressed file on systems where the filesystem encoding is ASCII — the default in many Docker containers.

The root cause is in _generate_asset_filename's sanitize_char: it uses char.isalnum() which returns True for accented letters like é ('é'.isalnum() → True), so they pass through into the filesystem path. When Python then tries to open that path on an ASCII-encoded filesystem, it fails:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 73

Fix

Add char.isascii() as a guard so only ASCII alphanumeric characters (plus -, _, .) survive into the filename. Non-ASCII characters are replaced with _, same as spaces and other special characters already are. The original filename is still preserved in display_name for the UI.

Tests

Added a regression test that uploads a file named explodér.html and verifies:

  • the asset file is created successfully (no crash)
  • the sanitized filename replaces é with _
  • the file content round-trips correctly through gzip
  • display_name still shows the original filename

All 24 tests in test_assets_service.py pass.

Restrict sanitize_char in _generate_asset_filename to ASCII-only
characters. str.isalnum() returns True for accented letters like é,
which then cause UnicodeEncodeError in gzip.open() on systems with
ASCII filesystem encoding (common in Docker containers).

Adding str.isascii() as a guard ensures only portable ASCII characters
reach the filesystem while display_name still preserves the original
filename for the UI.

Fixes sissbruecker#1348
Copilot AI review requested due to automatic review settings April 10, 2026 01:12
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a crash during asset uploads when the uploaded filename contains non-ASCII characters by tightening filename sanitization and adding a regression test to validate the behavior.

Changes:

  • Restrict _generate_asset_filename to only allow ASCII alphanumerics (plus -, _, .) in the generated on-disk filename.
  • Add a regression test for uploading explodér.html, asserting the stored filename is sanitized while display_name preserves the original name.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
bookmarks/services/assets.py Updates filename sanitization to prevent non-ASCII characters from being written into filesystem paths.
bookmarks/tests/test_assets_service.py Adds a regression test covering upload behavior with a non-ASCII filename.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 243 to 250
def _generate_asset_filename(
asset: BookmarkAsset, filename: str, extension: str
) -> str:
def sanitize_char(char):
if char.isalnum() or char in ("-", "_", "."):
if char.isascii() and (char.isalnum() or char in ("-", "_", ".")):
return char
else:
return "_"
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_generate_asset_filename now sanitizes non-ASCII characters in the base filename, but the extension argument is still interpolated verbatim into the returned path. For uploaded files, extension comes from os.path.splitext(upload_file.name) and can itself contain non-ASCII characters (e.g. foo.é -> extension é / é.gz), which would still reintroduce the same UnicodeEncodeError on ASCII-encoded filesystems. Consider applying the same ASCII-only sanitization to extension (or normalizing/sanitizing the final composed filename) so all path segments are filesystem-safe.

Copilot uses AI. Check for mistakes.
Comment on lines +410 to +417
def test_upload_asset_with_non_ascii_filename(self):
bookmark = self.setup_bookmark()
file_content = b"test content"
upload_file = SimpleUploadedFile(
"explodér.html", file_content, content_type="text/html"
)

asset = assets.upload_asset(bookmark, upload_file)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other upload_asset tests are decorated with @disable_logging, but this new test is not. Since upload_asset logs an info message on success, this can add noise to test output and is inconsistent with the existing test suite conventions. Add @disable_logging above this test as well.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] File uploader crashes when file name has non-ASCII characters (like é)

2 participants