Skip to content

feat: Zig-native multipart/form-data parser for file upload parity#126

Merged
justrach merged 5 commits intomainfrom
feature/125-zig-multipart-parser
Apr 11, 2026
Merged

feat: Zig-native multipart/form-data parser for file upload parity#126
justrach merged 5 commits intomainfrom
feature/125-zig-multipart-parser

Conversation

@justrach
Copy link
Copy Markdown
Owner

Summary

Implements #125 — Zig-native multipart/form-data parser for file upload parity with FastAPI.

Zig-side (zig/src/multipart.zig)

  • RFC 2046 multipart parser — boundary extraction, MIME part splitting, Content-Disposition parsing for name/filename/content-type
  • application/x-www-form-urlencoded parser — key-value splitting, percent-decoding, + → space
  • Zero-copy part slicing on the body bytes (only urlencoded values need alloc for percent-decoding)
  • Structured result: MultipartResult { fields: []FormField, files: []FileField }

Zig integration (zig/src/server.zig)

  • callPythonHandler now detects Content-Type: multipart/form-data and application/x-www-form-urlencoded
  • Parses body in Zig, passes form_fields: dict[str, str] and file_fields: list[dict] (each with name, filename, content_type, body as raw bytes) to Python kwargs
  • Falls back to raw body if parsing fails

Python-side (python/turboapi/request_handler.py)

  • Form() / File() / UploadFile parameter resolution in both sync and async enhanced_handler
  • Step 3.5: reads form_fields/file_fields from Zig kwargs, populates UploadFile with byte-identical data (writes to SpooledTemporaryFile, sets filename/content_type/size)
  • Skips JSON body parsing when form data is present
  • Feature-flagged with _has_form_params to avoid overhead on routes that don't use form/file params

TestClient (python/turboapi/testclient.py)

  • New files= parameter with multipart body encoding
  • Supports {"field": (filename, bytes, content_type)} and {"field": dict} formats
  • Mixed files= + data= for combined form fields + file uploads

Tests (tests/test_multipart_file_upload.py)

  • Form field parsing: urlencoded, alias, defaults
  • File upload: basic, byte-identical round-trip, mixed form+file, multiple files
  • Binary content (null bytes, PNG header), large files (100KB), empty files
  • Bare UploadFile type annotation (no File() marker needed)
  • URL-encoded special characters and percent encoding

Closes #125

justrach and others added 3 commits April 11, 2026 20:23
- Add zig/src/multipart.zig: RFC 2046 multipart parser + RFC 7578
  form-data handling + application/x-www-form-urlencoded parser
- Zero-copy part splitting, boundary extraction, Content-Disposition
  parsing for name/filename/content-type
- Wire into server.zig callPythonHandler: detect Content-Type header,
  parse multipart or urlencoded bodies, pass form_fields dict and
  file_fields list (with raw bytes) to Python kwargs
- File fields passed as [{name, filename, content_type, body}, ...]
  so Python can populate UploadFile instances with byte-identical data

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
…es= support

- Add Form()/File()/UploadFile parameter resolution in both sync
  and async enhanced_handler (step 3.5 between headers and JSON body)
- Skip JSON body parsing when form_fields or file_fields are present
- Populate UploadFile from Zig-parsed file parts: write raw bytes
  to SpooledTemporaryFile, set filename/content_type/size
- Support Form() with alias, File() with alias, bare UploadFile
  type annotation
- Add TestClient files= parameter with multipart body encoding
- Support mixed files + data fields in multipart requests
- Feature-flag pre-check for _has_form_params to avoid overhead
  on routes that don't use form/file parameters

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Performance Regression Report

Endpoint req/s avg latency p99 latency threshold status
GET /health 99,268 0.20ms 1.13ms 80,000 OK
GET / 97,984 0.19ms 0.95ms 80,000 OK
GET /json 94,781 0.19ms 0.89ms 75,000 OK
GET /users/123 92,528 0.19ms 0.72ms 75,000 OK
POST /items 57,222 6.56ms 187.37ms 45,000 OK
GET /status201 94,494 0.20ms 0.61ms 65,000 OK
AVERAGE 89,379 50,000 OK

✅ All endpoints pass regression thresholds

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3198036c84

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

parts = []
for field_name, file_info in files.items():
if isinstance(file_info, tuple):
filename, file_content = file_info
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Accept 3-item file tuples in TestClient multipart encoding

files entries are documented/tested as (filename, bytes, content_type), but this code unpacks tuples as exactly two values before checking len(file_info), so any 3-item tuple raises ValueError and the request never gets sent. This breaks the common file-upload path whenever callers provide an explicit content type.

Useful? React with 👍 / 👎.

Comment on lines +1820 to +1822
if (std.mem.startsWith(u8, req_ct_slice, "multipart/form-data")) {
if (multipart_mod.extractBoundary(req_ct_slice)) |boundary| {
if (body.len > 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep multipart boundary case when extracting from Content-Type

The Content-Type header is fully lowercased and then reused to extract the boundary, but MIME boundary matching against the body is case-sensitive. If the original boundary contains uppercase characters (e.g. ----TurboAPIBoundary...), parseMultipart searches for a lowercased delimiter that does not exist in the body, so valid multipart uploads are not parsed and downstream handlers can fail with 400s.

Useful? React with 👍 / 👎.

Comment on lines +28 to +30
pub fn deinit(self: *const UrlencodedResult, alloc: std.mem.Allocator) void {
alloc.free(self.fields);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Release decoded key/value buffers in UrlencodedResult.deinit

parseUrlencoded allocates decoded name and value slices for each field via percentDecodeAlloc, but UrlencodedResult.deinit only frees the top-level fields array and never frees those per-field allocations. In a long-running server handling application/x-www-form-urlencoded requests, this causes unbounded memory growth.

Useful? React with 👍 / 👎.

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Performance Regression Report

Endpoint req/s avg latency p99 latency threshold status
GET /health 100,954 0.19ms 0.90ms 80,000 OK
GET / 100,998 0.17ms 0.49ms 80,000 OK
GET /json 90,758 0.19ms 0.65ms 75,000 OK
GET /users/123 95,884 0.21ms 0.66ms 75,000 OK
POST /items 59,146 7.11ms 184.02ms 45,000 OK
GET /status201 89,558 0.20ms 0.85ms 65,000 OK
AVERAGE 89,550 50,000 OK

✅ All endpoints pass regression thresholds

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07b5d7bc4f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +727 to +732
uf = _UploadFile(
filename=matched.get("filename"),
content_type=matched.get(
"content_type", "application/octet-stream"
),
size=len(matched.get("body", b"")),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve bytes semantics for File() parameters

In create_enhanced_handler, the File() branch always constructs an UploadFile and passes that object to the handler, even when the parameter is annotated as bytes (the documented file: bytes = File() pattern). In that case handler code that expects raw bytes (e.g., len(file) or byte operations) will fail or behave incorrectly because it receives an UploadFile instance instead of bytes.

Useful? React with 👍 / 👎.

Comment on lines +120 to +124
const name_marker: []const u8 = "name=\"";
if (std.mem.indexOf(u8, hvalue, name_marker)) |ni| {
const start = ni + name_marker.len;
const end_quote = std.mem.indexOfScalar(u8, hvalue[start..], '"') orelse hvalue.len - start;
field_name = hvalue[start .. start + end_quote];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse Content-Disposition name without matching filename

This name=" lookup can match inside filename="..." when clients emit filename before name in Content-Disposition. In that valid parameter order, field_name is parsed as the filename, so later Python matching by form field name fails and file/form params are treated as missing.

Useful? React with 👍 / 👎.

Comment on lines +94 to +95
const next_delim = std.mem.indexOf(u8, body[part_body_start..], delim) orelse break;
var part_body = body[part_body_start .. part_body_start + next_delim];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Match multipart boundaries only at delimiter lines

Part splitting uses indexOf(..., delim) over the raw part body, so any embedded --<boundary> byte sequence inside file content is treated as the next boundary even when it is not on a delimiter line. That truncates uploaded files and can corrupt parsing of subsequent parts for otherwise valid multipart payloads.

Useful? React with 👍 / 👎.

…d field strings

Multipart format requires \r\n\r\n between headers and body.
Tests were missing the extra \r\n, so parseMultipart found 0 fields.
Also fix memory leak: UrlencodedResult.deinit now frees individual
field name/value strings allocated by percentDecodeAlloc.

Co-authored-by: trilokagent <275208033+trilokagent@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Performance Regression Report

Endpoint req/s avg latency p99 latency threshold status
GET /health 87,860 0.22ms 1.06ms 80,000 OK
GET / 89,537 0.23ms 0.63ms 80,000 OK
GET /json 84,336 0.25ms 0.67ms 75,000 OK
GET /users/123 83,818 0.24ms 0.73ms 75,000 OK
POST /items 45,415 4.57ms 110.07ms 45,000 OK
GET /status201 77,300 0.23ms 0.66ms 65,000 OK
AVERAGE 78,045 50,000 OK

✅ All endpoints pass regression thresholds

@justrach justrach merged commit 1c80c67 into main Apr 11, 2026
5 of 8 checks passed
@justrach justrach deleted the feature/125-zig-multipart-parser branch April 11, 2026 15:42
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 560155f50a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# 4. Parse request body (JSON) — skip if form data was already parsed
body_data = kwargs.get("body", b"")
if body_data:
if body_data and not (_form_fields or _file_fields):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate JSON-body skip to form/file handlers only

callPythonHandler now injects form_fields for any application/x-www-form-urlencoded or multipart request, but this guard skips JSON parsing whenever those collections are non-empty. For handlers that do not declare Form()/File()/UploadFile params (for example def create(payload: dict)), a form-encoded request now bypasses body parsing entirely, so required args are missing and the enhanced wrapper returns a 500 instead of the previous client-facing parse error path. This skip should be conditioned on _has_form_params (the same pattern is present in both async and sync branches).

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Zig-native multipart/form-data parser for file upload parity

1 participant