feat: Zig-native multipart/form-data parser for file upload parity#126
feat: Zig-native multipart/form-data parser for file upload parity#126
Conversation
- Add zig/src/multipart.zig: RFC 2046 multipart parser + RFC 7578
form-data handling + application/x-www-form-urlencoded parser
- Zero-copy part splitting, boundary extraction, Content-Disposition
parsing for name/filename/content-type
- Wire into server.zig callPythonHandler: detect Content-Type header,
parse multipart or urlencoded bodies, pass form_fields dict and
file_fields list (with raw bytes) to Python kwargs
- File fields passed as [{name, filename, content_type, body}, ...]
so Python can populate UploadFile instances with byte-identical data
Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
…es= support - Add Form()/File()/UploadFile parameter resolution in both sync and async enhanced_handler (step 3.5 between headers and JSON body) - Skip JSON body parsing when form_fields or file_fields are present - Populate UploadFile from Zig-parsed file parts: write raw bytes to SpooledTemporaryFile, set filename/content_type/size - Support Form() with alias, File() with alias, bare UploadFile type annotation - Add TestClient files= parameter with multipart body encoding - Support mixed files + data fields in multipart requests - Feature-flag pre-check for _has_form_params to avoid overhead on routes that don't use form/file parameters Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
Performance Regression Report
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3198036c84
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| parts = [] | ||
| for field_name, file_info in files.items(): | ||
| if isinstance(file_info, tuple): | ||
| filename, file_content = file_info |
There was a problem hiding this comment.
Accept 3-item file tuples in TestClient multipart encoding
files entries are documented/tested as (filename, bytes, content_type), but this code unpacks tuples as exactly two values before checking len(file_info), so any 3-item tuple raises ValueError and the request never gets sent. This breaks the common file-upload path whenever callers provide an explicit content type.
Useful? React with 👍 / 👎.
| if (std.mem.startsWith(u8, req_ct_slice, "multipart/form-data")) { | ||
| if (multipart_mod.extractBoundary(req_ct_slice)) |boundary| { | ||
| if (body.len > 0) { |
There was a problem hiding this comment.
Keep multipart boundary case when extracting from Content-Type
The Content-Type header is fully lowercased and then reused to extract the boundary, but MIME boundary matching against the body is case-sensitive. If the original boundary contains uppercase characters (e.g. ----TurboAPIBoundary...), parseMultipart searches for a lowercased delimiter that does not exist in the body, so valid multipart uploads are not parsed and downstream handlers can fail with 400s.
Useful? React with 👍 / 👎.
| pub fn deinit(self: *const UrlencodedResult, alloc: std.mem.Allocator) void { | ||
| alloc.free(self.fields); | ||
| } |
There was a problem hiding this comment.
Release decoded key/value buffers in UrlencodedResult.deinit
parseUrlencoded allocates decoded name and value slices for each field via percentDecodeAlloc, but UrlencodedResult.deinit only frees the top-level fields array and never frees those per-field allocations. In a long-running server handling application/x-www-form-urlencoded requests, this causes unbounded memory growth.
Useful? React with 👍 / 👎.
Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
Performance Regression Report
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07b5d7bc4f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| uf = _UploadFile( | ||
| filename=matched.get("filename"), | ||
| content_type=matched.get( | ||
| "content_type", "application/octet-stream" | ||
| ), | ||
| size=len(matched.get("body", b"")), |
There was a problem hiding this comment.
Preserve bytes semantics for File() parameters
In create_enhanced_handler, the File() branch always constructs an UploadFile and passes that object to the handler, even when the parameter is annotated as bytes (the documented file: bytes = File() pattern). In that case handler code that expects raw bytes (e.g., len(file) or byte operations) will fail or behave incorrectly because it receives an UploadFile instance instead of bytes.
Useful? React with 👍 / 👎.
| const name_marker: []const u8 = "name=\""; | ||
| if (std.mem.indexOf(u8, hvalue, name_marker)) |ni| { | ||
| const start = ni + name_marker.len; | ||
| const end_quote = std.mem.indexOfScalar(u8, hvalue[start..], '"') orelse hvalue.len - start; | ||
| field_name = hvalue[start .. start + end_quote]; |
There was a problem hiding this comment.
Parse Content-Disposition name without matching filename
This name=" lookup can match inside filename="..." when clients emit filename before name in Content-Disposition. In that valid parameter order, field_name is parsed as the filename, so later Python matching by form field name fails and file/form params are treated as missing.
Useful? React with 👍 / 👎.
| const next_delim = std.mem.indexOf(u8, body[part_body_start..], delim) orelse break; | ||
| var part_body = body[part_body_start .. part_body_start + next_delim]; |
There was a problem hiding this comment.
Match multipart boundaries only at delimiter lines
Part splitting uses indexOf(..., delim) over the raw part body, so any embedded --<boundary> byte sequence inside file content is treated as the next boundary even when it is not on a delimiter line. That truncates uploaded files and can corrupt parsing of subsequent parts for otherwise valid multipart payloads.
Useful? React with 👍 / 👎.
…d field strings Multipart format requires \r\n\r\n between headers and body. Tests were missing the extra \r\n, so parseMultipart found 0 fields. Also fix memory leak: UrlencodedResult.deinit now frees individual field name/value strings allocated by percentDecodeAlloc. Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
Performance Regression Report
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 560155f50a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # 4. Parse request body (JSON) — skip if form data was already parsed | ||
| body_data = kwargs.get("body", b"") | ||
| if body_data: | ||
| if body_data and not (_form_fields or _file_fields): |
There was a problem hiding this comment.
Gate JSON-body skip to form/file handlers only
callPythonHandler now injects form_fields for any application/x-www-form-urlencoded or multipart request, but this guard skips JSON parsing whenever those collections are non-empty. For handlers that do not declare Form()/File()/UploadFile params (for example def create(payload: dict)), a form-encoded request now bypasses body parsing entirely, so required args are missing and the enhanced wrapper returns a 500 instead of the previous client-facing parse error path. This skip should be conditioned on _has_form_params (the same pattern is present in both async and sync branches).
Useful? React with 👍 / 👎.
Summary
Implements #125 — Zig-native multipart/form-data parser for file upload parity with FastAPI.
Zig-side (
zig/src/multipart.zig)application/x-www-form-urlencodedparser — key-value splitting, percent-decoding,+→ spaceMultipartResult { fields: []FormField, files: []FileField }Zig integration (
zig/src/server.zig)callPythonHandlernow detectsContent-Type: multipart/form-dataandapplication/x-www-form-urlencodedform_fields: dict[str, str]andfile_fields: list[dict](each with name, filename, content_type, body as raw bytes) to Python kwargsbodyif parsing failsPython-side (
python/turboapi/request_handler.py)Form()/File()/UploadFileparameter resolution in both sync and asyncenhanced_handlerform_fields/file_fieldsfrom Zig kwargs, populatesUploadFilewith byte-identical data (writes to SpooledTemporaryFile, sets filename/content_type/size)_has_form_paramsto avoid overhead on routes that don't use form/file paramsTestClient (
python/turboapi/testclient.py)files=parameter with multipart body encoding{"field": (filename, bytes, content_type)}and{"field": dict}formatsfiles=+data=for combined form fields + file uploadsTests (
tests/test_multipart_file_upload.py)UploadFiletype annotation (noFile()marker needed)Closes #125