Chunked media uploads to bypass Vercel's 4.5 MB request limit #408
Open
carlosjdelgado wants to merge 17 commits into
Vercel serverless functions cap request bodies at 4.5 MB, which after base64 overhead limited media uploads to ~3.3 MB. The browser now slices files into ~3 MB chunks, POSTs each to /api/upload/chunk (multipart), and then calls /api/upload/finalize, which reassembles the file, pushes it to GitHub, and deletes the chunks. Chunks are staged in a new upload_chunk table (Postgres text, base64). Stale chunks are reaped opportunistically on each insert. Max file size is 50 MB / 50 chunks, configurable in both endpoints and the client. githubSaveFile is extracted to lib/utils/github-save-file.ts so both the existing files endpoint and the new finalize endpoint share the rename-on-conflict logic.
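The client-side slicing step described above could look roughly like the sketch below. It is illustrative only: `sliceIntoChunks` is a hypothetical name, not a function from this PR, and it works on an in-memory `Uint8Array` rather than a browser `File`.

```typescript
// Hypothetical helper (not from the PR): split a byte buffer into
// consecutive chunks of at most `chunkBytes` bytes each.
function sliceIntoChunks(data: Uint8Array, chunkBytes: number): Uint8Array[] {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkBytes) {
    // subarray returns a view over the same memory; no bytes are copied.
    // The end index is clamped to data.length, so the last chunk may be shorter.
    chunks.push(data.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}
```

Only the final chunk can be shorter than `chunkBytes`; an empty input yields no chunks at all.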
Client now uploads chunks in batches of 4 instead of sequentially. Server moves opportunistic stale-chunk cleanup (chunk endpoint) and per-upload chunk deletion (finalize endpoint) into next/server `after`, so neither blocks the response.
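A minimal sketch of the batching idea, assuming a caller-supplied `uploadChunk` callback. Both `toBatches` and `uploadInBatches` are hypothetical names; the PR's actual client code is not shown here.

```typescript
// Hypothetical helper: group items into consecutive batches of `batchSize`.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical uploader: chunks within a batch are sent concurrently,
// and the next batch starts only after the current one has fully settled.
async function uploadInBatches(
  chunks: readonly Uint8Array[],
  uploadChunk: (chunk: Uint8Array, idx: number) => Promise<void>,
  batchSize = 4, // the batch size named in the commit message
): Promise<void> {
  let start = 0;
  for (const batch of toBatches(chunks, batchSize)) {
    const base = start;
    await Promise.all(batch.map((chunk, i) => uploadChunk(chunk, base + i)));
    start += batch.length;
  }
}
```

Because `Promise.all` rejects on the first failure, one failed chunk aborts the remaining batches in this sketch; retry behavior is left out.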
When the file fits in Vercel's 4.5 MB request body (after base64 overhead and JSON envelope), upload directly via the existing files endpoint instead of the chunked path. Cuts DB write/read traffic by the share of small uploads, which is most of them in practice.
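The size check behind this shortcut can be sketched as follows. The names, the 1 KiB envelope allowance, and the reading of "4.5 MB" as 4.5 × 2^20 bytes are assumptions made for illustration, not values taken from the PR.

```typescript
// Assumption: "4.5 MB" is interpreted as 4.5 * 2^20 bytes.
const VERCEL_BODY_LIMIT_BYTES = 4.5 * 1024 * 1024;

// Length of the padded base64 encoding of `rawBytes` bytes:
// every 3 input bytes become 4 output characters.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Hypothetical check: does a base64+JSON request for this file fit in one body?
// `envelopeBytes` is an assumed allowance for the JSON wrapper around the payload.
function fitsDirectUpload(fileBytes: number, envelopeBytes = 1024): boolean {
  return base64Length(fileBytes) + envelopeBytes <= VERCEL_BODY_LIMIT_BYTES;
}
```

Under these assumptions a 3 MiB file fits (4 MiB of base64 plus the envelope), while a 4 MiB file does not.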
The last chunk is sent inline with the finalize metadata (multipart) instead of going through the DB, saving one INSERT+SELECT per upload. For a 4 MB file (2 chunks) this halves DB writes; for larger files the ratio drops but every saved chunk still helps under Neon Free quotas.
Storing chunk data as bytea eliminates the +33% base64 overhead in the upload_chunk table. The INSERT/SELECT bandwidth per chunk drops by ~25% with no client change and no CPU cost. The migration uses decode('base64') in USING so any chunks in flight at upgrade time are converted instead of dropped.
For a 20 MB upload, total DB bandwidth goes from ~48 MB to ~36 MB (2.4x -> 1.8x file size).
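The 48 MB and 36 MB figures can be reproduced with a small model. It assumes 3 MB chunks, the last chunk sent inline (so it never touches the DB), and every staged chunk written once and read once; the function name is hypothetical.

```typescript
type ChunkEncoding = "base64" | "bytea";

// DB bandwidth in MB for one upload under the assumptions above.
function dbBandwidthLastInlineMB(
  fileMB: number,
  chunkMB: number,
  encoding: ChunkEncoding,
): number {
  const totalChunks = Math.ceil(fileMB / chunkMB);
  // The last chunk holds whatever is left after the full-size chunks.
  const lastChunkMB = fileMB - (totalChunks - 1) * chunkMB;
  const stagedMB = fileMB - lastChunkMB;
  // base64 stores 4 characters for every 3 raw bytes; bytea stores raw bytes.
  const storedMB = encoding === "base64" ? (stagedMB * 4) / 3 : stagedMB;
  return storedMB * 2; // one INSERT plus one SELECT per staged chunk
}
```

For a 20 MB file this gives 7 chunks with 18 MB staged: 18 × 4/3 × 2 = 48 MB as base64 and 18 × 2 = 36 MB as bytea, matching the 2.4x and 1.8x ratios.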
Multipart carries the chunk as raw binary, not base64, so 4 MB fits in Vercel's 4.5 MB body with about 500 KB of headroom. For a 20 MB upload this drops the chunk count from 7 to 5 (4 staged in DB, 1 inline) and total DB bandwidth from ~36 MB to ~32 MB.
The first chunk is always CHUNK_BYTES (4 MB) for any multi-chunk upload; the last one can be smaller. Sending the first inline keeps the largest chunk out of the DB. For files whose size is a multiple of CHUNK_BYTES there is no change; for the rest, DB bandwidth drops by up to (CHUNK_BYTES - lastChunkSize) * 2.
Limits each upload to 22 MB of DB bandwidth in the worst case, sized to fit Neon Free quotas with margin for other traffic.
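The effect of sending the first chunk inline instead of the last, and the 22 MB worst case, can be checked with the sketch below. It assumes 4 MB chunks stored as bytea, one write plus one read per staged chunk, and the 15 MB per-upload cap that a later commit message mentions; the function names are hypothetical.

```typescript
const CHUNK_MB = 4; // chunk size stated in the commit messages

// Sizes (in MB) of the chunks a file of `fileMB` is split into.
function chunkSizesMB(fileMB: number, chunkMB: number = CHUNK_MB): number[] {
  const sizes: number[] = [];
  for (let left = fileMB; left > 0; left -= chunkMB) {
    sizes.push(Math.min(left, chunkMB));
  }
  return sizes;
}

// DB bandwidth (write + read) when one chunk rides inline and the rest are staged.
function stagedBandwidthMB(fileMB: number, inline: "first" | "last"): number {
  const sizes = chunkSizesMB(fileMB);
  const inlineMB = (inline === "first" ? sizes[0] : sizes[sizes.length - 1]) ?? 0;
  return (fileMB - inlineMB) * 2;
}
```

An 18 MB file splits into 4 + 4 + 4 + 4 + 2 MB, so moving the inline slot from the last chunk to the first saves (4 - 2) × 2 = 4 MB. A 15 MB file with the first chunk inline stages 11 MB, which gives the 22 MB bound.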
Merges the old 0013 (CREATE TABLE with text data) and 0014 (ALTER to bytea) into a single 0013 that creates the table with bytea directly. Also drops the unused chunk-assembly self-check script.
Files <=3 MB previously skipped the chunked path and posted base64+JSON to the files endpoint. That shortcut is removed: the chunked path already short-circuits to inline-only when totalChunks=1, so no DB rows are created and the request body is smaller (binary multipart vs base64 JSON). One code path now handles every size up to 15 MB.
Problem
Vercel caps request bodies at 4.5 MB. Any media upload above that fails outright — including images embedded from the rich-text editor and direct uploads from the media browser.
Solution
Split uploads into chunks ≤ 4 MB, stage them server-side, and reassemble at finalize time. The first chunk rides inline in the finalize request, so files ≤ 4 MB still complete in a single round-trip with zero DB writes.
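Reassembly at finalize time could be sketched as below. `StagedChunk` and `assembleUpload` are hypothetical names, and the contiguity check is an assumption about what a server would reasonably verify; the sketch follows the PR's convention that chunk 0 is the inline first chunk and staged chunks start at index 1.

```typescript
interface StagedChunk {
  idx: number; // 1-based position; chunk 0 is the inline firstChunk
  data: Uint8Array;
}

// Concatenate the inline first chunk with the staged chunks in idx order.
function assembleUpload(firstChunk: Uint8Array, staged: StagedChunk[]): Uint8Array {
  // Rows may arrive in any order, so sort a copy by idx first.
  const ordered = [...staged].sort((a, b) => a.idx - b.idx);
  ordered.forEach((chunk, i) => {
    // After sorting, position i must hold idx i + 1; anything else is a gap or duplicate.
    if (chunk.idx !== i + 1) {
      throw new Error(`missing or duplicate chunk at index ${i + 1}`);
    }
  });
  const parts = [firstChunk, ...ordered.map((c) => c.data)];
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

With no staged chunks the result is just a copy of `firstChunk`, which corresponds to the single round-trip case for files ≤ 4 MB.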
Wire surface
All media writes — file uploads and folder markers — share one endpoint prefix:
`/api/[owner]/[repo]/[branch]/media/[name]/[path]`.

- `POST /media/[name]/[path]/chunk`: stages one chunk (`uploadId`, `idx`, `chunk` blob). `idx` must be in `[1, MAX_CHUNK_IDX]`; chunk 0 rides inline in finalize.
- `POST /media/[name]/[path]`: reassembles `firstChunk` + DB-staged chunks and commits to GitHub. Accepts a 0-byte `firstChunk` when the path ends in `.gitkeep` (folder marker).

Chunks are scoped to `(uploadId, userId)`; one user cannot read another's staging buffer.

Limits
- `CHUNK_BYTES`: 4 MB
- `MAX_TOTAL_BYTES`: 15 MB
- `MAX_CHUNK_IDX`: `ceil(MAX_TOTAL_BYTES / CHUNK_BYTES) - 1`
- `CHUNK_CONCURRENCY`
- `STALE_CHUNK_AGE_MS`

`CHUNK_BYTES` and `MAX_TOTAL_BYTES` live in `lib/utils/upload-media.ts` and are imported by both the client helper and the server routes. `MAX_CHUNK_IDX` is derived locally in the chunk handler.

Storage
New table `upload_chunk` (`db/migrations/0013_upload_chunks.sql`):

- One row per `(upload_id, chunk_idx)`
- `data` stored as `bytea` (not base64) for size + perf
- `user_id` and `created_at` for ownership and TTL cleanup

Client surface
`lib/utils/upload-media.ts` exports `uploadMediaChunked`, a single source of truth used by every callsite that writes to media.

Callers wired to it in this PR:
- `components/media/media-upload.tsx`: media browser uploader (drag-drop, click)
- `fields/core/rich-text/edit-component.tsx`: inline image upload from the rich-text editor
- `components/folder-create.tsx` (media branch): `.gitkeep` folder marker creation

Refactors bundled
1. `githubSaveFile` extracted to `lib/utils/github-save-file.ts`. Previously inline in `files/[path]/route.ts`; now shared between that route and the media POST.
2. Rich-text image upload migrated from the JSON `POST /files/[path]` route (`type: "media"`, base64 in body) to the chunked endpoint. Per-upload cap rises from ~4.5 MB (Vercel JSON body limit) to 15 MB. Drops the local `FileReader`/base64 conversion.
3. `.gitkeep` for media folders moved from `POST /files/[path]` (`type: "media"`, JSON `content: ""`) to the chunked endpoint with a 0-byte inline `firstChunk`. The server special-cases `.gitkeep`: it skips the extension check and allows an empty `firstChunk`.
4. `POST /files/[path]` `case "media"` removed entirely. No live callers remain after points 2 and 3. The files endpoint now handles only `content` and `settings`.

What did NOT change
- `POST /files/[path]` for `content` and `settings` (collection/file folder markers and the `.pages.yml` config): unchanged.

Test plan
- `upload_chunk`
- `.gitkeep` round-trips through the chunked endpoint
- `POST /files/[path]`