feat(fetch-url): support fetching images as multimodal content #1100
bj456736 wants to merge 2 commits
- Extend UrlFetchKind with 'image' to distinguish image responses
- Add image field to UrlFetchResult for base64-encoded image data
- Update LocalFetchURLProvider to detect image/* content types and download binary data, converting it to base64
- Update FetchURLTool to return ContentPart[] with image_url when fetching images, enabling multimodal models to 'see' the image
- Update tool description to mention image support
- Add test for image content kind detection

Fixes: Fetch tool supports pulling images (P1 Backlog)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed6d2a1a4d
    type: 'image_url',
    imageUrl: { url: `data:${image.mimeType};base64,${image.base64}` },
Gate fetched images on image-capable models
When the active model lacks image_in, FetchURL is still registered whenever a URL fetcher exists (checked packages/agent-core/src/agent/tool/index.ts, where only ReadMediaFile is capability-gated). Fetching any image/* URL now returns an image_url part, which providers serialize as image input on the next request, so text-only aliases can fail after a successful fetch instead of receiving an actionable tool error. Pass model capabilities into this tool or degrade/error before emitting the image_url part.
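The degrade-before-emitting option suggested above could look roughly like the following. This is a minimal sketch, not the repository's code: the `ContentPart` union is simplified, and `FetchedImage`, `imageToContentParts`, and the set-of-strings capability representation are hypothetical names chosen for illustration. Only the `image_in` capability name and the `image_url` part shape come from the review.

```typescript
// Simplified stand-in for the real ContentPart type (assumption).
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; imageUrl: { url: string } };

// Hypothetical shape matching the { mimeType, base64 } image field in this PR.
interface FetchedImage {
  mimeType: string;
  base64: string;
}

// Hypothetical helper: emit an image_url part only when the active model
// accepts image input; otherwise return an actionable text message.
function imageToContentParts(
  image: FetchedImage,
  sourceUrl: string,
  modelCapabilities: ReadonlySet<string>,
): ContentPart[] {
  if (!modelCapabilities.has('image_in')) {
    return [
      {
        type: 'text',
        text: `Fetched ${image.mimeType} image from ${sourceUrl}, but the active model does not accept image input.`,
      },
    ];
  }
  return [
    {
      type: 'image_url',
      imageUrl: { url: `data:${image.mimeType};base64,${image.base64}` },
    },
  ];
}
```

With this shape, a text-only model gets a tool result it can act on instead of a failed follow-up request.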
    return {
      content: '',
      kind: 'image',
      image: { mimeType: contentType, base64 },
Restrict fetched images to supported MIME types
This accepts every image/* content type and stores the full header as the MIME type. For URLs returning image/svg+xml, image/avif, or a header with parameters such as image/png; charset=binary, FetchURLTool emits a data: image URL that our Anthropic converter rejects because it only allows image/png, image/jpeg, image/gif, and image/webp (packages/kosong/src/providers/anthropic.ts:418-430). Restrict or sanitize the MIME type before returning kind: 'image', otherwise a successful fetch can make the following model request fail.
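One way to sanitize the header before returning kind: 'image' is sketched below. The helper name `normalizeImageMimeType` and the constant `SUPPORTED_IMAGE_MIME_TYPES` are hypothetical; the allow-list simply mirrors the four types the review says the Anthropic converter accepts, and the real provider may want a different list.

```typescript
// Allow-list taken from the review comment (png, jpeg, gif, webp).
const SUPPORTED_IMAGE_MIME_TYPES: ReadonlySet<string> = new Set([
  'image/png',
  'image/jpeg',
  'image/gif',
  'image/webp',
]);

// Hypothetical helper: reduce a Content-Type header to its bare MIME type
// and return it only if it is on the allow-list; otherwise return null.
function normalizeImageMimeType(
  contentTypeHeader: string | null | undefined,
): string | null {
  if (!contentTypeHeader) return null;
  // Drop parameters such as "; charset=binary" and normalize case.
  const essence = (contentTypeHeader.split(';')[0] ?? '').trim().toLowerCase();
  return SUPPORTED_IMAGE_MIME_TYPES.has(essence) ? essence : null;
}
```

A caller could treat a null result as "not an image we can forward" and fall back to a text response or a tool error rather than emitting a data: URL.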
- Add PageMetadata to UrlFetchResult (url, mime, title) matching internal FetchResponse.page
- Remove XML wrapping around image_url content, return clean image_url part directly
- Add imageUrl.id from original URL for traceability
- Update LocalFetchURLProvider to include page metadata for all content kinds
- Update tests to match new return shape
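The result shape described by the two commits might be modeled roughly as follows. This is only a sketch under assumptions: the non-image kind names ('passthrough', 'extracted'), the optionality of `title` and `page`, and the `isImageResult` guard are guesses for illustration, not taken from the diff.

```typescript
// Only 'image' is confirmed by the PR; the other kind names are assumed.
type UrlFetchKind = 'passthrough' | 'extracted' | 'image';

// Fields named in the commit message; which ones are optional is an assumption.
interface PageMetadata {
  url: string;
  mime: string;
  title?: string;
}

interface UrlFetchResult {
  content: string;
  kind: UrlFetchKind;
  // New optional fields, kept optional for backward compatibility.
  image?: { mimeType: string; base64: string };
  page?: PageMetadata;
}

type ImageFetchResult = UrlFetchResult & {
  kind: 'image';
  image: { mimeType: string; base64: string };
};

// Hypothetical type guard: narrows to results that actually carry image data.
function isImageResult(result: UrlFetchResult): result is ImageFetchResult {
  return result.kind === 'image' && result.image !== undefined;
}
```

Checking both the kind and the presence of the image field keeps callers safe if a provider sets kind: 'image' without attaching data.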
Nice work adding multimodal image support to FetchURL. The size check before base64 encoding and keeping the new fields optional for backward compatibility are both good calls. One small thing: the
Summary
This PR enables the FetchURL tool to download images and return them as multimodal content, allowing vision-capable models to analyze images fetched from URLs.
Changes
- image kind to distinguish image responses from text passthrough/extracted content
- image field with { mimeType, base64 } for image data

Why this approach
Testing
Related