feat: Zig-native multipart/form-data parser for file upload parity by justrach · Pull Request #126 · justrach/turboAPI

justrach · 2026-04-11T12:27:42Z

Summary

Implements #125 — Zig-native multipart/form-data parser for file upload parity with FastAPI.

Zig-side (`zig/src/multipart.zig`)

RFC 2046 multipart parser — boundary extraction, MIME part splitting, Content-Disposition parsing for name/filename/content-type
application/x-www-form-urlencoded parser — key-value splitting, percent-decoding, + → space
Zero-copy part slicing on the body bytes (only urlencoded values need alloc for percent-decoding)
Structured result: MultipartResult { fields: []FormField, files: []FileField }
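The urlencoded rules listed above (key-value splitting, percent-decoding, + → space) can be illustrated with a short Python sketch. This is not the Zig code from `zig/src/multipart.zig`; the names `percent_decode` and `parse_urlencoded` are hypothetical, and keeping malformed `%` escapes literally is an assumption about error handling, not something the PR states.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def percent_decode(raw: bytes) -> bytes:
    """Decode %XX escapes and map '+' to a space.

    Assumption: malformed escapes (e.g. a trailing '%') are kept literally.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == 0x2B:  # '+' encodes a space in form bodies
            out.append(0x20)
            i += 1
        elif ch == 0x25 and i + 2 < len(raw) + 0 and all(
            c in HEX_DIGITS for c in raw[i + 1 : i + 3]
        ):
            # '%' followed by two hex digits
            out.append(int(raw[i + 1 : i + 3].decode("ascii"), 16))
            i += 3
        else:
            out.append(ch)
            i += 1
    return bytes(out)


def parse_urlencoded(body: bytes) -> list:
    """Split on '&', then on the first '=', decoding both key and value."""
    fields = []
    for pair in body.split(b"&"):
        if not pair:
            continue  # tolerate empty segments such as a trailing '&'
        key, _, value = pair.partition(b"=")
        fields.append(
            (
                percent_decode(key).decode("utf-8", "replace"),
                percent_decode(value).decode("utf-8", "replace"),
            )
        )
    return fields
```

Decoding is the only step that has to build new bytes, which matches the note above that only urlencoded values need allocation.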

Zig integration (`zig/src/server.zig`)

callPythonHandler now detects Content-Type: multipart/form-data and application/x-www-form-urlencoded
Parses body in Zig, passes form_fields: dict[str, str] and file_fields: list[dict] (each with name, filename, content_type, body as raw bytes) to Python kwargs
Falls back to raw body if parsing fails
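The detect, parse, and fall-back flow described above can be sketched in Python. This illustrates the control flow only, not the Zig `callPythonHandler` code; `classify_body`, `build_kwargs`, and the parser callables are hypothetical, and it is an assumption that the raw `body` is still passed alongside the parsed fields.

```python
def classify_body(content_type: str) -> str:
    """Map a Content-Type header to the kind of body parsing to attempt."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "multipart/form-data":
        return "multipart"
    if media_type == "application/x-www-form-urlencoded":
        return "urlencoded"
    return "raw"


def build_kwargs(content_type: str, body: bytes, parsers: dict) -> dict:
    """Build handler kwargs, falling back to the raw body if parsing fails.

    `parsers` maps "multipart"/"urlencoded" to callables that take
    (content_type, body) and return (form_fields, file_fields).
    """
    kwargs = {"body": body}
    parser = parsers.get(classify_body(content_type))
    if parser is None or not body:
        return kwargs
    try:
        form_fields, file_fields = parser(content_type, body)
    except ValueError:
        return kwargs  # parsing failed: the handler only sees the raw body
    kwargs["form_fields"] = form_fields
    kwargs["file_fields"] = file_fields
    return kwargs
```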

Python-side (`python/turboapi/request_handler.py`)

Form() / File() / UploadFile parameter resolution in both sync and async enhanced_handler
Step 3.5: reads form_fields/file_fields from Zig kwargs, populates UploadFile with byte-identical data (writes to SpooledTemporaryFile, sets filename/content_type/size)
Skips JSON body parsing when form data is present
Feature-flagged with _has_form_params to avoid overhead on routes that don't use form/file params
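As a rough illustration of the UploadFile population step, the sketch below writes a Zig-provided file part into a `SpooledTemporaryFile` and records the filename, content type, and size. `UploadFileSketch` and `upload_from_part` are hypothetical stand-ins, not the turboapi classes, and the 1 MiB spool threshold is an arbitrary assumption.

```python
from tempfile import SpooledTemporaryFile


class UploadFileSketch:
    """Hypothetical stand-in for UploadFile (not the turboapi class)."""

    def __init__(self, filename, content_type, size, file):
        self.filename = filename
        self.content_type = content_type
        self.size = size
        self.file = file


def upload_from_part(part: dict) -> UploadFileSketch:
    """Build an upload object from a dict with name/filename/content_type/body."""
    body = part.get("body", b"")
    # Assumption: spill to disk above 1 MiB; the PR does not state a threshold.
    spooled = SpooledTemporaryFile(max_size=1024 * 1024)
    spooled.write(body)
    spooled.seek(0)  # rewind so the handler reads from the start
    return UploadFileSketch(
        filename=part.get("filename"),
        content_type=part.get("content_type", "application/octet-stream"),
        size=len(body),
        file=spooled,
    )
```

Because the bytes are written without decoding, reading the file back yields the same bytes that arrived in the part, including null bytes.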

TestClient (`python/turboapi/testclient.py`)

New files= parameter with multipart body encoding
Supports {"field": (filename, bytes, content_type)} and {"field": dict} formats
Mixed files= + data= for combined form fields + file uploads
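A multipart body for mixed `data=` and `files=` inputs could be encoded roughly as follows. This is a sketch under stated assumptions, not the TestClient implementation: `encode_multipart` is a hypothetical name, it only accepts the 3-tuple `(filename, bytes, content_type)` form, and it does not escape quotes in names or filenames.

```python
import uuid


def encode_multipart(data: dict, files: dict, boundary=None):
    """Return (body, content_type_header) for form fields plus file parts."""
    boundary = boundary or uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in data.items():
        chunks += [
            delim,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # blank line separates part headers from the part body
            str(value).encode("utf-8"),
        ]
    for name, (filename, content, content_type) in files.items():
        chunks += [
            delim,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            content,  # raw bytes, inserted unmodified
        ]
    chunks += [delim + b"--", b""]  # closing delimiter plus trailing CRLF
    return b"\r\n".join(chunks), f"multipart/form-data; boundary={boundary}"
```

Joining on `\r\n` produces the `\r\n\r\n` gap between each part's headers and its body that multipart parsers rely on.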

Tests (`tests/test_multipart_file_upload.py`)

Form field parsing: urlencoded, alias, defaults
File upload: basic, byte-identical round-trip, mixed form+file, multiple files
Binary content (null bytes, PNG header), large files (100KB), empty files
Bare UploadFile type annotation (no File() marker needed)
URL-encoded special characters and percent encoding

Closes #125

- Add zig/src/multipart.zig: RFC 2046 multipart parser + RFC 7578 form-data handling + application/x-www-form-urlencoded parser
- Zero-copy part splitting, boundary extraction, Content-Disposition parsing for name/filename/content-type
- Wire into server.zig callPythonHandler: detect Content-Type header, parse multipart or urlencoded bodies, pass form_fields dict and file_fields list (with raw bytes) to Python kwargs
- File fields passed as [{name, filename, content_type, body}, ...] so Python can populate UploadFile instances with byte-identical data

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>

…es= support

- Add Form()/File()/UploadFile parameter resolution in both sync and async enhanced_handler (step 3.5 between headers and JSON body)
- Skip JSON body parsing when form_fields or file_fields are present
- Populate UploadFile from Zig-parsed file parts: write raw bytes to SpooledTemporaryFile, set filename/content_type/size
- Support Form() with alias, File() with alias, bare UploadFile type annotation
- Add TestClient files= parameter with multipart body encoding
- Support mixed files + data fields in multipart requests
- Feature-flag pre-check for _has_form_params to avoid overhead on routes that don't use form/file parameters

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>

github-actions · 2026-04-11T12:30:09Z

Performance Regression Report

| Endpoint | req/s | avg latency | p99 latency | threshold (req/s) | status |
| --- | --- | --- | --- | --- | --- |
| GET /health | 99,268 | 0.20ms | 1.13ms | 80,000 | OK |
| GET / | 97,984 | 0.19ms | 0.95ms | 80,000 | OK |
| GET /json | 94,781 | 0.19ms | 0.89ms | 75,000 | OK |
| GET /users/123 | 92,528 | 0.19ms | 0.72ms | 75,000 | OK |
| POST /items | 57,222 | 6.56ms | 187.37ms | 45,000 | OK |
| GET /status201 | 94,494 | 0.20ms | 0.61ms | 65,000 | OK |
| AVERAGE | 89,379 | | | 50,000 | OK |

✅ All endpoints pass regression thresholds

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3198036c84

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-11T12:30:46Z

python/turboapi/testclient.py

+            parts = []
+            for field_name, file_info in files.items():
+                if isinstance(file_info, tuple):
+                    filename, file_content = file_info


Accept 3-item file tuples in TestClient multipart encoding

files entries are documented/tested as (filename, bytes, content_type), but this code unpacks tuples as exactly two values before checking len(file_info), so any 3-item tuple raises ValueError and the request never gets sent. This breaks the common file-upload path whenever callers provide an explicit content type.
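One way to avoid the unpacking error is to check the tuple length before destructuring. The helper below is a sketch of that idea; `normalize_file_tuple` is a hypothetical name, and the `application/octet-stream` default for 2-item tuples is an assumption.

```python
DEFAULT_CONTENT_TYPE = "application/octet-stream"  # assumed default


def normalize_file_tuple(file_info: tuple) -> tuple:
    """Accept (filename, content) or (filename, content, content_type).

    Always returns a 3-tuple so the encoder has one shape to handle.
    """
    if len(file_info) == 2:
        filename, content = file_info
        content_type = DEFAULT_CONTENT_TYPE
    elif len(file_info) == 3:
        filename, content, content_type = file_info
    else:
        raise ValueError(
            f"expected a 2- or 3-item file tuple, got {len(file_info)} items"
        )
    return filename, content, content_type
```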

chatgpt-codex-connector · 2026-04-11T12:30:46Z

zig/src/server.zig

+    if (std.mem.startsWith(u8, req_ct_slice, "multipart/form-data")) {
+        if (multipart_mod.extractBoundary(req_ct_slice)) |boundary| {
+            if (body.len > 0) {


Keep multipart boundary case when extracting from Content-Type

The Content-Type header is fully lowercased and then reused to extract the boundary, but MIME boundary matching against the body is case-sensitive. If the original boundary contains uppercase characters (e.g. ----TurboAPIBoundary...), parseMultipart searches for a lowercased delimiter that does not exist in the body, so valid multipart uploads are not parsed and downstream handlers can fail with 400s.
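The distinction here is that the media type and the parameter name are case-insensitive, while the boundary value must be kept byte-for-byte. A Python sketch of an extractor that lowercases only what it compares (the function name mirrors `extractBoundary` but this is not the Zig code, and it does not handle a `;` inside a quoted boundary):

```python
def extract_boundary(content_type: str):
    """Return the boundary parameter with its original case, or None."""
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() != "multipart/form-data":
        return None
    for param in params.split(";"):
        key, sep, value = param.partition("=")
        # Compare the parameter *name* case-insensitively only.
        if sep and key.strip().lower() == "boundary":
            value = value.strip()
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1]  # strip optional surrounding quotes
            return value or None
    return None
```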

chatgpt-codex-connector · 2026-04-11T12:30:46Z

zig/src/multipart.zig

+    pub fn deinit(self: *const UrlencodedResult, alloc: std.mem.Allocator) void {
+        alloc.free(self.fields);
+    }


Release decoded key/value buffers in UrlencodedResult.deinit

parseUrlencoded allocates decoded name and value slices for each field via percentDecodeAlloc, but UrlencodedResult.deinit only frees the top-level fields array and never frees those per-field allocations. In a long-running server handling application/x-www-form-urlencoded requests, this causes unbounded memory growth.

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>

github-actions · 2026-04-11T15:29:24Z

Performance Regression Report

| Endpoint | req/s | avg latency | p99 latency | threshold (req/s) | status |
| --- | --- | --- | --- | --- | --- |
| GET /health | 100,954 | 0.19ms | 0.90ms | 80,000 | OK |
| GET / | 100,998 | 0.17ms | 0.49ms | 80,000 | OK |
| GET /json | 90,758 | 0.19ms | 0.65ms | 75,000 | OK |
| GET /users/123 | 95,884 | 0.21ms | 0.66ms | 75,000 | OK |
| POST /items | 59,146 | 7.11ms | 184.02ms | 45,000 | OK |
| GET /status201 | 89,558 | 0.20ms | 0.85ms | 65,000 | OK |
| AVERAGE | 89,550 | | | 50,000 | OK |

✅ All endpoints pass regression thresholds

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07b5d7bc4f

chatgpt-codex-connector · 2026-04-11T15:32:15Z

python/turboapi/request_handler.py

+                                uf = _UploadFile(
+                                    filename=matched.get("filename"),
+                                    content_type=matched.get(
+                                        "content_type", "application/octet-stream"
+                                    ),
+                                    size=len(matched.get("body", b"")),


Preserve bytes semantics for File() parameters

In create_enhanced_handler, the File() branch always constructs an UploadFile and passes that object to the handler, even when the parameter is annotated as bytes (the documented file: bytes = File() pattern). In that case handler code that expects raw bytes (e.g., len(file) or byte operations) will fail or behave incorrectly because it receives an UploadFile instance instead of bytes.
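A fix along these lines would inspect the parameter annotation and hand over raw bytes when the handler asks for `bytes`. The sketch below is illustrative only: `UploadStandIn` and `resolve_file_param` are hypothetical, and the real resolution logic in `create_enhanced_handler` is not shown in this PR excerpt.

```python
import inspect
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadStandIn:
    """Hypothetical stand-in for UploadFile (not the turboapi class)."""

    filename: Optional[str]
    content_type: str
    data: bytes


def resolve_file_param(handler, param_name: str, part: dict):
    """Return raw bytes for `bytes`-annotated params, else an upload object."""
    annotation = inspect.signature(handler).parameters[param_name].annotation
    body = part.get("body", b"")
    if annotation is bytes:
        return body  # handler declared `file: bytes = File()`
    return UploadStandIn(
        filename=part.get("filename"),
        content_type=part.get("content_type", "application/octet-stream"),
        data=body,
    )
```

Note that `annotation is bytes` only works when annotations are real objects; modules using postponed (string) annotations would need them resolved first.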

chatgpt-codex-connector · 2026-04-11T15:32:15Z

zig/src/multipart.zig

+                const name_marker: []const u8 = "name=\"";
+                if (std.mem.indexOf(u8, hvalue, name_marker)) |ni| {
+                    const start = ni + name_marker.len;
+                    const end_quote = std.mem.indexOfScalar(u8, hvalue[start..], '"') orelse hvalue.len - start;
+                    field_name = hvalue[start .. start + end_quote];


Parse Content-Disposition name without matching filename

This name=" lookup can match inside filename="..." when clients emit filename before name in Content-Disposition. In that valid parameter order, field_name is parsed as the filename, so later Python matching by form field name fails and file/form params are treated as missing.
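The underlying problem is that the substring `name="` also occurs at the end of `filename="`. Parsing the header as `;`-separated parameters, rather than searching for a marker, avoids the collision. The following Python sketch shows the idea; `parse_content_disposition` is a hypothetical helper that ignores backslash escapes and RFC 5987 `filename*` decoding.

```python
import re

# A parameter is: ';' key '=' then either a quoted string or a bare token.
_PARAM_RE = re.compile(r';\s*([A-Za-z0-9*_-]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))')


def parse_content_disposition(value: str) -> dict:
    """Return the header's parameters keyed by lowercase parameter name."""
    params = {}
    for match in _PARAM_RE.finditer(value):
        key = match.group(1).lower()
        quoted, bare = match.group(2), match.group(3)
        params[key] = quoted if quoted is not None else bare
    return params
```

Because each match consumes the whole quoted value, text such as `name=` inside a filename is never interpreted as a separate parameter, and parameter order no longer matters.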

chatgpt-codex-connector · 2026-04-11T15:32:15Z

zig/src/multipart.zig

+        const next_delim = std.mem.indexOf(u8, body[part_body_start..], delim) orelse break;
+        var part_body = body[part_body_start .. part_body_start + next_delim];


Match multipart boundaries only at delimiter lines

Part splitting uses indexOf(..., delim) over the raw part body, so any embedded --<boundary> byte sequence inside file content is treated as the next boundary even when it is not on a delimiter line. That truncates uploaded files and can corrupt parsing of subsequent parts for otherwise valid multipart payloads.
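A delimiter-line match requires the boundary to be preceded by CRLF (or to start the body) and followed by either CRLF or the closing `--`. The Python sketch below applies that rule; it is not the Zig parser, `find_delimiter` and `split_parts` are hypothetical names, and it ignores the optional whitespace padding that RFC 2046 allows after a boundary.

```python
def find_delimiter(data: bytes, line_delim: bytes, start: int) -> int:
    """Find CRLF + '--boundary' that is followed by CRLF or '--'; -1 if none."""
    idx = data.find(line_delim, start)
    while idx >= 0:
        end = idx + len(line_delim)
        if data[end : end + 2] in (b"\r\n", b"--"):
            return idx
        idx = data.find(line_delim, idx + 1)  # look-alike sequence, keep going
    return -1


def split_parts(body: bytes, boundary: bytes) -> list:
    """Return each part's raw bytes (headers + body) between delimiter lines."""
    line_delim = b"\r\n--" + boundary
    data = b"\r\n" + body  # lets a delimiter at offset 0 match the same rule
    parts = []
    idx = find_delimiter(data, line_delim, 0)
    while idx >= 0:
        after = idx + len(line_delim)
        if data.startswith(b"--", after):
            break  # closing delimiter: no more parts
        start = after + 2  # skip the CRLF that ends the delimiter line
        nxt = find_delimiter(data, line_delim, start)
        if nxt < 0:
            break  # unterminated part: stop rather than guess
        parts.append(data[start:nxt])
        idx = nxt
    return parts
```

Prepending CRLF copies the body, which is fine for a sketch but would defeat zero-copy slicing in the real parser.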

…d field strings

Multipart format requires \r\n\r\n between headers and body. Tests were missing the extra \r\n, so parseMultipart found 0 fields.

Also fix memory leak: UrlencodedResult.deinit now frees individual field name/value strings allocated by percentDecodeAlloc.

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>

github-actions · 2026-04-11T15:38:16Z

Performance Regression Report

| Endpoint | req/s | avg latency | p99 latency | threshold (req/s) | status |
| --- | --- | --- | --- | --- | --- |
| GET /health | 87,860 | 0.22ms | 1.06ms | 80,000 | OK |
| GET / | 89,537 | 0.23ms | 0.63ms | 80,000 | OK |
| GET /json | 84,336 | 0.25ms | 0.67ms | 75,000 | OK |
| GET /users/123 | 83,818 | 0.24ms | 0.73ms | 75,000 | OK |
| POST /items | 45,415 | 4.57ms | 110.07ms | 45,000 | OK |
| GET /status201 | 77,300 | 0.23ms | 0.66ms | 65,000 | OK |
| AVERAGE | 78,045 | | | 50,000 | OK |

✅ All endpoints pass regression thresholds

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 560155f50a

chatgpt-codex-connector · 2026-04-11T15:43:11Z

python/turboapi/request_handler.py

+                # 4. Parse request body (JSON) — skip if form data was already parsed
                body_data = kwargs.get("body", b"")
-                if body_data:
+                if body_data and not (_form_fields or _file_fields):


Gate JSON-body skip to form/file handlers only

callPythonHandler now injects form_fields for any application/x-www-form-urlencoded or multipart request, but this guard skips JSON parsing whenever those collections are non-empty. For handlers that do not declare Form()/File()/UploadFile params (for example def create(payload: dict)), a form-encoded request now bypasses body parsing entirely, so required args are missing and the enhanced wrapper returns a 500 instead of the previous client-facing parse error path. This skip should be conditioned on _has_form_params (the same pattern is present in both async and sync branches).
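The suggested condition can be written as a small predicate. This is a sketch of the gating logic only; `should_parse_json_body` is a hypothetical helper, and the surrounding handler code is assumed rather than quoted from `request_handler.py`.

```python
def should_parse_json_body(
    body: bytes, form_fields, file_fields, has_form_params: bool
) -> bool:
    """Decide whether the JSON body-parsing step should run.

    JSON parsing is skipped only when the handler declares form/file
    parameters AND form data was actually delivered for this request.
    """
    if not body:
        return False
    form_data_present = bool(form_fields) or bool(file_fields)
    return not (has_form_params and form_data_present)
```

With this gate, a handler such as `def create(payload: dict)` that receives a form-encoded request still goes through body parsing and its existing error path.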

justrach and others added 3 commits April 11, 2026 20:23

test: add byte-level parity tests for multipart file uploads

3198036

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>

chatgpt-codex-connector bot reviewed Apr 11, 2026

feat: optimized form_sync/file_sync handler dispatch

07b5d7b

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>

chatgpt-codex-connector bot reviewed Apr 11, 2026

justrach merged commit 1c80c67 into main Apr 11, 2026
5 of 8 checks passed

justrach deleted the feature/125-zig-multipart-parser branch April 11, 2026 15:42

chatgpt-codex-connector bot reviewed Apr 11, 2026
