Native resumable upload for local files (fixes the #211 MCP timeout)#279
Merged
Conversation
…creds, dataset read-back)
…t mistaken for a spawn failure
…l_file replaces it
…nk can't exfiltrate its target
Pin the uvx package to a floor (galaxy-upload>=1.0.1) the way bin/loom.js already does for galaxy-mcp -- detectUploadFailure leans on this CLI's stderr format and its exit-0-on-ConnectionError behavior, so an unpinned spec could drift out from under us. Also pass --no-auto-decompress: galaxy-upload defaults to decompressing .gz/.zip on ingest, which silently turns an uploaded fastqsanger.gz into an uncompressed dataset; keep the bytes as the user sent them.
Needed for the new galaxy-upload-tus.ts module that replaces the subprocess-based upload path.
buildFetchPayload, tusUpload, and waitForDataset -- the portable core that will eventually move into galaxy-ops. No loom/pi imports; FileUrlStorage imported via namespace cast to work around a gap in tus-js-client's type defs (the runtime exports it, the .d.ts doesn't). 10 new tests covering the full happy path, error path, abort, and timeout behavior.
…rsing, and test coverage Move the pre-flight abort check after stream+finish() creation so cleanup is unconditional regardless of when abort fires. Parse the session id with the URL constructor instead of split() so trailing slashes and query strings don't yield the wrong segment. Add a comment on the waitForDataset loop clarifying what the timeout bounds. Add two missing tests: one for the no-session-id rejection path, and one that verifies the settled guard prevents a late onError from producing a second settlement after abort.
The tool now calls tusUpload() directly instead of shelling out to uvx galaxy-upload. After the TUS transfer it posts to /tools/fetch to trigger Galaxy ingest, then polls waitForDataset until a terminal state. Fallback paths handle a missing outputs[] (history-contents name-match) and a timed-out poll (return last-known state from the fetch response). Deleted galaxy-upload-runner.ts and removed the now-dead exports (buildGalaxyUploadArgs, detectUploadFailure, GALAXY_UPLOAD_PACKAGE, GalaxyUploadArgsOpts, UVX_MISSING). Test suite rewritten to mock the TUS core and galaxy-api instead of the runner -- 14 handler tests covering happy path, creds routing, abort, fetch rejection, read-back fallback, and all existing guard cases. 543 tests green, typecheck clean.
…, add two abort tests galaxyPost now maps AbortError to "Upload cancelled." instead of the misleading "Galaxy rejected the upload" message. The waitForDataset catch gets a comment explaining why it's intentionally non-fatal on abort/timeout -- the bytes are already uploaded and the dataset exists, so returning last-known state is the right move. Added a comment on the fileName vs realPath split so it doesn't read as an inconsistency. Two new tests cover the galaxyPost-abort and waitForDataset-abort paths. The bare os/path imports in the test file are used by resolveStoragePath's test and were left in place.
Skips automatically when GALAXY_URL / GALAXY_API_KEY / GALAXY_TEST_HISTORY_ID are absent -- so CI and offline vitest runs stay green. When the env vars are present it drives a real small-file upload end-to-end: TUS handshake -> /api/tools/fetch -> waitForDataset, asserting the dataset reaches a terminal state.
Adds a handler test asserting that a Galaxy ingest that lands in the terminal 'error' state does not produce a tool-level error -- the bytes got there, the ingest failed, and that distinction should be honest and visible to the model. Also drops a why-comment on the abort path in galaxy-upload-tus.ts: we omit shouldTerminate intentionally so the partial TUS session stays on the server and a retry can resume rather than re-uploading from scratch.
tus-js-client only stores resume state on its own -- it doesn't resume on start(), so the FileUrlStorage we set up was being recorded and then thrown away on retry, restarting every interrupted upload from zero. Wire findPreviousUploads()/resumeFromPreviousUpload() before start() so a re-run of the same file picks up from the server's offset. A stale stored session (404) needs no special handling here -- tus-js-client drops it and creates a fresh upload on its own since we always set endpoint. Verified live against test.galaxyproject.org: a 60MiB upload aborted at 20MiB resumes from 20MiB on re-run, and a planted dead session restarts cleanly without erroring.
…assertions Review follow-up. The resume wiring added an async findPreviousUploads() step before start(), so add a test that aborts while that lookup is in flight and asserts start() never fires (the settled guard handles it). Also count resumeFromPreviousUpload calls so the no-stored-partial test can prove it isn't called at all, not just that resumedFrom stayed undefined. Plus a comment that abort() before start() is a safe no-op.
53c9fab to
5651599
Compare
A TUS transfer can complete while Galaxy's ingest lands the dataset in a state it won't accept as a tool input: error, discarded, or failed_metadata -- the exact complement of Galaxy's Dataset.valid_input_states (enforced by tools/parameters/dataset_matcher.py). Returning uploaded:true there let the agent build on a dataset it can't use, so report those as a tool failure with state-specific guidance: failed_metadata is recoverable by setting the datatype, while error/discarded are not. Also add the missing 'empty' terminal state so a zero-byte upload reaches a terminal state instead of polling until the ingest timeout.
4b21f1c to
87645e7
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Uploading a large local file with
galaxy_upload_filefails withMCP error -32001: Request timed out. The bytes stream through a single MCP request, and that request outlives the MCP SDK's hard-coded 60s per-request timeout, so the SDK aborts the call before Galaxy finishes ingesting. The URL variant (galaxy_upload_file_from_url) is fine -- Galaxy fetches server-side and the MCP request returns quickly. Nothing in loom can raise that timeout (it's baked into the SDK, below pi-mcp-adapter), so the only loom-side fix is to get the upload off the MCP request entirely. The aborted long request is also one of the things that trips the transport desync in #199 -- that's a separate problem, not addressed here.What
A loom-native
galaxy_upload_local_filetool that uploads with a resumable (TUS) transfer viatus-js-client, in its own in-process path -- no MCP request boundary, so no 60s wall. The broken MCPupload_fileis hidden withexcludeTools(URL upload stays), and the Galaxy context block nudges local uploads to the new tool.Flow: TUS upload to
/api/upload/resumable_upload->POST /api/tools/fetchto trigger ingest -> wait the dataset to a terminal state before returning, so the model gets a real result rather than a "queued" non-answer (cf. #210). Plus:findPreviousUploads/resumeFromPreviousUpload); a stale stored session restarts cleanly.The upload core (
galaxy-upload-tus.ts) is deliberately loom-free so it can move into@galaxyproject/galaxy-ops(galaxyproject/galaxy-mcp#61) when that lands and loom imports it directly (#202). This is the interim loom-side fix; the shared agent-ops package is the durable home. (It drops Python for uploads, butuvxis still needed for the galaxy-mcp server, so it doesn't remove Python from packaging on its own.)Branch-history note: the early commits explored a
uvx galaxy-uploadCLI stopgap; the native TS implementation replaced it and the runner was deleted -- the net diff is native-only.Testing
ok; 35 MiB multi-chunk ->ok; byte-exact binary round-trip (noto_posix_linescorruption); aborted-at-20 MiB upload resumes from 20 MiB on re-run; a planted dead session restarts cleanly.Addresses #211.