fix packages/markitdown/src/markitdown/_uri_utils.py

- 
- 
- from urllib.parse import urlparse, unquote_to_bytes
+ from urllib.parse import urlparse, unquote, unquote_to_bytes


-    path = os.path.abspath(url2pathname(parsed.path))
+    decoded_path = unquote(parsed.path)
+    path = os.path.abspath(url2pathname(decoded_path))



Before the change, `file_uri_to_path()` passed `parsed.path` directly into `url2pathname()`.

That usually works for simple ASCII file paths, but it breaks when the file URI contains percent-encoded non-ASCII characters such as Korean filenames. In our case, a URI like:

```text
file:///D:/.../%EC%A0%9C20...hwpx
```

was not being decoded into the original Unicode path before conversion to a Windows filesystem path. As a result, the generated path became invalid and MCP failed to open the file.

The fix was to explicitly call `unquote(parsed.path)` first, so the percent-encoded URI path is restored to its real Unicode form before `url2pathname()` converts it into a local OS path.
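The decode step can be illustrated in isolation with a minimal sketch. It is not the actual `file_uri_to_path()` implementation: the helper name `decode_uri_path` and the sample filename are hypothetical. The later `url2pathname()` conversion is left out because its result depends on the platform.

```python
from urllib.parse import unquote, urlparse


def decode_uri_path(uri: str) -> str:
    """Return the percent-decoded path component of a file: URI.

    Hypothetical helper for illustration only; it covers just the
    decode step, not the conversion to a local OS path.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"Not a file URI: {uri}")
    # unquote() decodes percent-escapes as UTF-8 by default,
    # so %EC%A0%9C becomes the single Korean character it encodes.
    return unquote(parsed.path)


# Hypothetical URI with a percent-encoded Korean character in the filename.
example = "file:///D:/docs/%EC%A0%9C20.hwpx"
print(decode_uri_path(example))
```

The decoded string is what the patched code then hands to `url2pathname()` and `os.path.abspath()`.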

In short:
- before: the URI path was still percent-encoded when it reached `url2pathname()`
- after: the URI path is decoded first, then converted to a local OS path

That makes `file:` URIs with Unicode filenames work correctly in the MCP flow.
fix packages/markitdown/src/markitdown/_uri_utils.py #1738
