ACPAgent drops image content from user messages #2982

@xingyaoww

Problem

The ACP protocol's prompt() method accepts multimodal content blocks including ImageContentBlock, AudioContentBlock, ResourceContentBlock, and EmbeddedResourceContentBlock:

prompt(
    prompt: list[TextContentBlock | ImageContentBlock | AudioContentBlock | ResourceContentBlock | EmbeddedResourceContentBlock],
    session_id: str,
    **kwargs: Any,
) -> PromptResponse

However, ACPAgent.step() currently extracts only TextContent from the user MessageEvent and sends it as a single TextContentBlock via text_block(). Any image (or other non-text) content attached to the user message is silently dropped, so the ACP server never sees it.

This is a pre-existing limitation on main: both before and after #2946, the code filters for TextContent only.

Expected Behavior

When a user message contains image content (e.g. via ImageContent blocks in the LLM message), ACPAgent should convert those to ImageContentBlock and include them in the prompt() call to the ACP server.

Suggested Fix

In _build_acp_prompt (or its caller), iterate over all content blocks in the message and map them to the corresponding ACP content block types (TextContentBlock, ImageContentBlock, etc.) instead of filtering to text only. The return type would change from str | None to list[ContentBlock] | None.
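A minimal sketch of the mapping described above, under stated assumptions. The dataclasses below are stand-ins rather than the real SDK or ACP schema classes, and their field names (image_urls, data, mime_type) are assumptions made for illustration. The sketch also assumes images arrive as base64 data URLs and skips any other URL form, a choice the real fix would need to revisit.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Union


# Stand-ins for the message content types (assumed shapes, not the real classes).
@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    image_urls: list[str] = field(default_factory=list)


# Stand-ins for ACP content blocks (assumed shapes, not the real schema classes).
@dataclass
class TextContentBlock:
    text: str
    type: str = "text"


@dataclass
class ImageContentBlock:
    data: str  # base64-encoded payload
    mime_type: str
    type: str = "image"


ContentBlock = Union[TextContentBlock, ImageContentBlock]

# Matches "data:<mime>;base64,<payload>" and captures both parts.
_DATA_URL = re.compile(
    r"^data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<data>.+)$", re.DOTALL
)


def build_acp_prompt(content: list) -> list[ContentBlock] | None:
    """Map every supported content item to a block, preserving order."""
    blocks: list[ContentBlock] = []
    for item in content:
        if isinstance(item, TextContent):
            if item.text.strip():  # drop empty or whitespace-only text
                blocks.append(TextContentBlock(text=item.text))
        elif isinstance(item, ImageContent):
            for url in item.image_urls:
                match = _DATA_URL.match(url)
                if match is None:
                    continue  # non-data URLs are skipped in this sketch
                blocks.append(
                    ImageContentBlock(data=match["data"], mime_type=match["mime"])
                )
    # Return None when nothing usable was found, mirroring the optional return type.
    return blocks or None
```

Keeping the blocks in message order means text and images reach the server interleaved as the user wrote them. Audio and resource blocks could be added as further branches once their source content types are known.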

Context

Discovered during review of #2946.


This issue was created by an AI agent (OpenHands) on behalf of @xingyaoww.

Labels: bug
