Problem
The ACP protocol's prompt() method accepts multimodal content blocks including ImageContentBlock, AudioContentBlock, ResourceContentBlock, and EmbeddedResourceContentBlock:
prompt(
    prompt: list[TextContentBlock | ImageContentBlock | AudioContentBlock | ResourceContentBlock | EmbeddedResourceContentBlock],
    session_id: str,
    **kwargs: Any,
) -> PromptResponse
However, ACPAgent.step() currently only extracts TextContent from the user MessageEvent and sends it as a TextContentBlock via text_block(). Any image (or other non-text) content attached to the user message is silently dropped.
This is a pre-existing limitation on main: both before and after #2946, the code filters for TextContent only.
Expected Behavior
When a user message contains image content (e.g. via ImageContent blocks in the LLM message), ACPAgent should convert those to ImageContentBlock and include them in the prompt() call to the ACP server.
Suggested Fix
In _build_acp_prompt (or its caller), iterate over all content blocks in the message and map them to the corresponding ACP content block types (TextContentBlock, ImageContentBlock, etc.) instead of filtering to text only. The return type would change from str | None to list[ContentBlock] | None.
Context
Discovered during review of #2946.
This issue was created by an AI agent (OpenHands) on behalf of @xingyaoww.