Skip to content

feat: support URL-fetch uploads on the hosted transport#49

Merged
ChiragAgg5k merged 1 commit into
mainfrom
feat/hosted-url-upload
Jun 26, 2026
Merged

feat: support URL-fetch uploads on the hosted transport#49
ChiragAgg5k merged 1 commit into
mainfrom
feat/hosted-url-upload

Conversation

@ChiragAgg5k

Copy link
Copy Markdown
Member

Problem

On the hosted HTTP transport (mcp.appwrite.io/mcp), the MCP server runs in Appwrite Cloud with no access to the client's local filesystem. The two existing file-input forms both break there:

  • a path stringInputFile.from_path() reads the server's filesystem, so it can never see a file on the user's machine — it always fails on hosted.
  • inline {filename, content, encoding} → requires the model to emit the file's bytes as base64 in the tool-call args. A 180 KB PNG is ~240 KB of base64 the model can't faithfully reproduce → the upload corrupts or the request drops mid-generation. Only sub-~1 KB files work.

This is structural: MCP has no out-of-band binary channel for tool inputs, so bytes can only reach a remote server via the model's text or via a URL the server fetches.

Change

Add a server-side URL-fetch input form so the model can pass {"url": "https://..."} (or a bare http(s) string); the server downloads the bytes and uploads them as InputFile.from_bytes. The SDK chunks the bytes, so size isn't model-bound.

  • SSRF guard (_validate_fetch_url): http/https only; rejects hosts resolving to private / loopback / link-local / reserved / multicast / unspecified addresses (covers the cloud metadata IP 169.254.169.254, 10/8, 127/8, ::1, …); re-validates the final URL after redirects. 25 MB streamed cap (handles absent/lying Content-Length), 30 s timeout, 5 redirects.
  • Inline base64 kept but capped at 256 KB, with an error pointing the model to url for larger files.
  • Transport-aware: a module-level flag set once in build_mcp_server lets hosted reject local paths with guidance (steering to url or inline base64 — no stdio/self-host wording) and adds an upload sentence to the http instructions. stdio behavior is unchanged.
  • Schema: _input_file_schema() gains a url property and clarified descriptions.

No re-plumbing of the execute_registered_tool → _prepare_arguments → _coerce_argument chain (the _coerce_argument signature the tests assert on is untouched).

Tests

9 new unit tests in tests/unit/test_server.py (network mocked): URL object + bare-string, SSRF private-IP rejection, non-http scheme rejection, fetch + inline size caps, path rejection on http vs allowed on stdio, and http-instruction guidance. _UPLOAD_TRANSPORT is reset per test for isolation.

All four CI gates pass locally: ruff, black, unittest discover -s tests/unit (76 tests), and docker build.

Known limitation / follow-up

A file that exists only on the user's local disk still has to be reachable by a public URL first — structural to a remote MCP server. DNS-rebinding (resolve-then-reconnect) is an accepted residual gap; IP-pinning via a custom httpx transport is a possible follow-up.

On the hosted HTTP transport the server runs remotely with no access to
the client's filesystem, so local path uploads always fail and the only
fallback (inline base64) requires the model to emit the file's bytes —
infeasible for real images.

Add a server-side URL-fetch input form ({"url": "https://..."}): the
server downloads the bytes via httpx and uploads them as InputFile bytes,
so the file no longer has to travel through the model. The fetch is
SSRF-guarded (http/https only; rejects private/loopback/link-local/
reserved IPs incl. the cloud metadata endpoint; re-validates after
redirects; 25 MB cap, 30s timeout, 5 redirects).

Also:
- cap inline base64 at 256 KB with an error pointing to `url`
- gate local-path uploads to stdio; on hosted, reject with guidance to
  use `url` or inline base64 (no stdio/self-host wording)
- add an upload sentence to the http instructions
- add a `url` property to the InputFile JSON schema

Upload behavior is selected via a module-level flag set once in
build_mcp_server, so no re-plumbing of the execute call chain.
@ChiragAgg5k ChiragAgg5k merged commit 8200e8b into main Jun 26, 2026
5 checks passed
@ChiragAgg5k ChiragAgg5k deleted the feat/hosted-url-upload branch June 26, 2026 08:51
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds a server-side URL-fetch upload path for the hosted HTTP transport, where the server has no access to the client's filesystem. It introduces an SSRF guard (_validate_fetch_url), a 25 MB streaming cap, a 256 KB inline-base64 cap, and transport-aware path rejection that steers hosted callers toward url or inline bytes.

  • server.py: New _fetch_input_file / _validate_fetch_url functions, plus _coerce_path / _coerce_inline_content helpers; _UPLOAD_TRANSPORT global set once at build_mcp_server time; build_instructions gains an upload sentence for the HTTP transport.
  • service.py: _input_file_schema gains a url property and updated descriptions to reflect the new input forms.
  • tests/unit/test_server.py: 9 new unit tests with mocked network; covers SSRF IP rejection, scheme rejection, size caps, path gating, and instruction content — but does not cover the redirect-to-private-IP bypass scenario.

Confidence Score: 3/5

The URL-fetch path introduces a real SSRF exposure: a redirect from a public host to a private IP reaches the internal network before the re-validation guard fires.

The core SSRF mitigation has a structural gap — follow_redirects=True means httpx resolves and connects to every hop before the post-redirect _validate_fetch_url call runs. An attacker who controls a public redirect target can route the hosted server to 169.254.169.254 or any private IP. This should be resolved before the feature goes live on the hosted transport.

src/mcp_server_appwrite/server.py — specifically the _fetch_input_file function and its redirect-handling configuration

Security Review

  • SSRF via HTTP redirect (server.py _fetch_input_file): httpx.Client is configured with follow_redirects=True. By the time execution enters the with client.stream() as resp: block, httpx has already sent HTTP requests to every URL in the redirect chain, including the final destination. The post-redirect call to _validate_fetch_url(str(resp.url)) prevents reading the response body but does not prevent the network request from reaching a private or link-local host (e.g. 169.254.169.254). A public server that issues a 302 to an internal IP bypasses the pre-request SSRF guard entirely.

Important Files Changed

Filename Overview
src/mcp_server_appwrite/server.py Adds URL-fetch upload path with SSRF guard, inline size cap, and transport-gated path rejection; the post-redirect re-validation fires after httpx has already connected to the final URL, leaving a redirect-to-private-IP bypass open.
src/mcp_server_appwrite/service.py Schema-only change: adds url property to _input_file_schema and updates descriptions to reflect the new URL-fetch and local-path guidance; no logic changed.
tests/unit/test_server.py Adds 9 new unit tests covering URL object/bare-string fetch, private-IP SSRF rejection, non-http scheme, stream size cap, inline size cap, path gating, and instruction content — but does not cover the redirect-to-private-IP bypass scenario.

Reviews (1): Last reviewed commit: "feat: support URL-fetch and hardened upl..." | Re-trigger Greptile

Comment on lines +427 to +436
with httpx.Client(
timeout=_FETCH_TIMEOUT_SECONDS,
follow_redirects=True,
max_redirects=_FETCH_MAX_REDIRECTS,
limits=httpx.Limits(max_connections=1),
) as client:
with client.stream("GET", url) as resp:
resp.raise_for_status()
# The final URL after redirects must also be public.
_validate_fetch_url(str(resp.url))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 security Post-redirect SSRF validation fires too late

client.stream("GET", url) with follow_redirects=True means httpx follows every redirect internally and opens a TCP connection to the final destination before the context manager body runs. By the time _validate_fetch_url(str(resp.url)) is called on line 436, the HTTP GET request to the final URL (which may be http://169.254.169.254/latest/meta-data/ or any private IP) has already been sent. The check blocks reading the response body but does not prevent the network request from crossing the trust boundary.

A redirect chain like https://legit.example.com/file → http://10.0.0.1/internal would bypass the pre-request validation entirely. The simplest fix is to disable automatic redirect following and either error on any redirect or validate each Location header before following manually.

Comment on lines +438 to +439
declared = resp.headers.get("content-length")
if declared is not None and declared.isdigit():

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Content-Length skip when value contains a sign or spaces

declared.isdigit() returns False for values like " 1024", "+1024", or the rarely-seen negative sentinel -1, so those bypass the early-exit check. In practice a server lying about Content-Length would still be caught by the streaming cap on line 450, but the declared-size check silently skips any non-digit string and might miss a large file until the whole body has been partially buffered.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant