feat(fetch-url): support fetching images as multimodal content #1100

Open
bj456736 wants to merge 2 commits into MoonshotAI:main from bj456736:auto-pr-20260625-1800
@bj456736

Contributor

Summary

This PR enables the FetchURL tool to download images and return them as multimodal content, allowing vision-capable models to analyze images fetched from URLs.

Changes

  • UrlFetchKind: Added image kind to distinguish image responses from text passthrough/extracted content
  • UrlFetchResult: Added optional image field with { mimeType, base64 } for image data
  • LocalFetchURLProvider:
    • Detects image content types from HTTP responses
    • Downloads binary data via response.arrayBuffer()
    • Converts to base64 and returns kind: image
    • Respects the same 10 MiB size limit
  • FetchURLTool:
    • When image field is present, returns ContentPart[] with image_url type
    • Falls back to existing text behavior for non-image content
  • Tool description: Updated fetch-url.md to mention image support
  • Tests: Added test for image content kind detection in local-fetch-url.test.ts
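
A minimal sketch of the flow the list above describes, not the PR's actual code. `SketchFetchResult`, `isImageContentType`, `buildImageResult`, and `toToolContent` are hypothetical names, the `'text'` kind stands in for the existing text kinds, and Node's `Buffer` is assumed for base64 encoding. Only the `{ mimeType, base64 }` field, the 10 MiB limit, and the `image_url` part shape come from this PR.

```typescript
const MAX_FETCH_BYTES = 10 * 1024 * 1024; // 10 MiB, the limit named in the PR

interface FetchedImage {
  mimeType: string;
  base64: string;
}

// Simplified stand-in for UrlFetchResult (assumption: the real type has more text kinds).
interface SketchFetchResult {
  content: string;
  kind: 'text' | 'image';
  image?: FetchedImage;
}

// True when the Content-Type header announces an image/* payload.
function isImageContentType(contentType: string | null): boolean {
  return (contentType ?? '').trim().toLowerCase().startsWith('image/');
}

// Enforces the size limit first, then base64-encodes the downloaded bytes.
function buildImageResult(bytes: Uint8Array, contentType: string): SketchFetchResult {
  if (bytes.byteLength > MAX_FETCH_BYTES) {
    throw new Error(`Image is ${bytes.byteLength} bytes, over the ${MAX_FETCH_BYTES}-byte limit`);
  }
  const base64 = Buffer.from(bytes).toString('base64');
  return { content: '', kind: 'image', image: { mimeType: contentType, base64 } };
}

// Tool side: emit an image_url part when an image is present, otherwise the text.
function toToolContent(result: SketchFetchResult) {
  if (result.image) {
    const { mimeType, base64 } = result.image;
    return [{ type: 'image_url' as const, imageUrl: { url: `data:${mimeType};base64,${base64}` } }];
  }
  return result.content;
}
```

In a provider, the bytes would come from `new Uint8Array(await response.arrayBuffer())` once `isImageContentType(response.headers.get('content-type'))` returns true.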

Why this approach

  • Leverages existing multimodal infrastructure (ContentPart[] and image_url type)
  • Minimal changes to the tool interface — backward compatible for text URLs
  • Consistent with how ReadMediaFile handles image input
  • No changes needed to MoonshotFetchURLProvider — it falls back to local fetcher on failure

Testing

  • All 3172 existing tests pass + 1 new test for image detection
  • Verified image fetch returns correct base64 and mime type

Related

- Extend UrlFetchKind with 'image' to distinguish image responses
- Add image field to UrlFetchResult for base64-encoded image data
- Update LocalFetchURLProvider to detect image/* content types and
  download binary data, converting to base64
- Update FetchURLTool to return ContentPart[] with image_url when
  fetching images, enabling multimodal models to 'see' the image
- Update tool description to mention image support
- Add test for image content kind detection

Fixes: Fetch tool supports pulling images (P1 Backlog)
@changeset-bot

changeset-bot Bot commented Jun 25, 2026

⚠️ No Changeset found

Latest commit: a97fad4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@chatgpt-codex-connector

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed6d2a1a4d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +104 to +105
type: 'image_url',
imageUrl: { url: `data:${image.mimeType};base64,${image.base64}` },

[P2] Gate fetched images on image-capable models

When the active model lacks image_in, FetchURL is still registered whenever a URL fetcher exists (checked packages/agent-core/src/agent/tool/index.ts, where only ReadMediaFile is capability-gated). Fetching any image/* URL now returns an image_url part, which providers serialize as image input on the next request, so text-only aliases can fail after a successful fetch instead of receiving an actionable tool error. Pass model capabilities into this tool or degrade/error before emitting the image_url part.
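
One way the suggested gate could look, as a sketch only. `emitFetchedImage`, `ToolOutput`, and passing capabilities as a `ReadonlySet<string>` are assumptions for illustration; the only names taken from this thread are the `image_in` capability and the `image_url` part shape.

```typescript
interface FetchedImage {
  mimeType: string;
  base64: string;
}

// Hypothetical tool output: an actionable error or a list of content parts.
type ToolOutput =
  | { isError: true; message: string }
  | { isError: false; parts: Array<{ type: 'image_url'; imageUrl: { url: string } }> };

// Refuse to emit an image part unless the active model accepts image input.
function emitFetchedImage(image: FetchedImage, capabilities: ReadonlySet<string>): ToolOutput {
  if (!capabilities.has('image_in')) {
    return {
      isError: true,
      message: `The URL returned an image (${image.mimeType}), but the active model does not accept image input.`,
    };
  }
  return {
    isError: false,
    parts: [{ type: 'image_url', imageUrl: { url: `data:${image.mimeType};base64,${image.base64}` } }],
  };
}
```

With this shape, a text-only model receives a tool error it can act on, and the following request never carries an image part.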

return {
content: '',
kind: 'image',
image: { mimeType: contentType, base64 },

[P2] Restrict fetched images to supported MIME types

This accepts every image/* content type and stores the full header as the MIME type. For URLs returning image/svg+xml, image/avif, or a header with parameters such as image/png; charset=binary, FetchURLTool emits a data: image URL that our Anthropic converter rejects because it only allows image/png, image/jpeg, image/gif, and image/webp (packages/kosong/src/providers/anthropic.ts:418-430). Restrict or sanitize the MIME type before returning kind: 'image', otherwise a successful fetch can make the following model request fail.
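
A small sketch of the sanitizing step described above. `normalizeImageMimeType` and `SUPPORTED_IMAGE_MIME_TYPES` are hypothetical names; the allow-list is the four types this comment says the Anthropic converter accepts.

```typescript
// The four MIME types the review comment lists as accepted.
const SUPPORTED_IMAGE_MIME_TYPES: ReadonlySet<string> = new Set([
  'image/png',
  'image/jpeg',
  'image/gif',
  'image/webp',
]);

// Strips header parameters (e.g. "; charset=binary"), lowercases the type,
// and returns null when the result is not on the allow-list.
function normalizeImageMimeType(contentType: string): string | null {
  const [rawType = ''] = contentType.split(';');
  const mimeType = rawType.trim().toLowerCase();
  return SUPPORTED_IMAGE_MIME_TYPES.has(mimeType) ? mimeType : null;
}
```

A caller could return `kind: 'image'` only when the result is non-null and fall back to a text result or an error otherwise; which fallback fits best is left open here.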

- Add PageMetadata to UrlFetchResult (url, mime, title) matching internal FetchResponse.page
- Remove XML wrapping around image_url content, return clean image_url part directly
- Add imageUrl.id from original URL for traceability
- Update LocalFetchURLProvider to include page metadata for all content kinds
- Update tests to match new return shape
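
The return shape this commit describes might look roughly like the sketch below. The `url`, `mime`, and `title` fields and `imageUrl.id` come from the commit message; the interface layouts, the optional `title`, and the helper names `toImagePart` and `toPageMetadata` are assumptions.

```typescript
// Assumed layout for the page metadata (url, mime, title) named in the commit.
interface PageMetadata {
  url: string;
  mime: string;
  title?: string;
}

interface ImageUrlPart {
  type: 'image_url';
  imageUrl: { url: string; id?: string };
}

// Builds a bare image_url part (no XML wrapper) and records the source URL as its id.
function toImagePart(sourceUrl: string, mimeType: string, base64: string): ImageUrlPart {
  return {
    type: 'image_url',
    imageUrl: { url: `data:${mimeType};base64,${base64}`, id: sourceUrl },
  };
}

// Builds page metadata, omitting the title when none is known.
function toPageMetadata(sourceUrl: string, mime: string, title?: string): PageMetadata {
  return title === undefined ? { url: sourceUrl, mime } : { url: sourceUrl, mime, title };
}
```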
@itxaiohanglover

Nice work adding multimodal image support to FetchURL — the size check before base64 encoding and keeping the new fields optional for backward compat are both good calls. One small thing: the contentType header value is used directly as the data URL mime type, but it could include parameters like image/png; charset=utf-8. A quick contentType.split(';')[0].trim() would keep the data URL clean, though it's unlikely to cause issues in practice.

2 participants