server : auto-insert media marker in embedding / multimodal prompts #25093

Open
TheOneWhoWill wants to merge 2 commits into ggml-org:master from TheOneWhoWill:master
@TheOneWhoWill

The /embedding (and /embeddings, /v1/embeddings) endpoints failed with "number of media markers in text (0) does not match number of bitmaps (1)" when passing multimodal data via the "content" object format.

The server initializes the mtmd context with a randomized media marker (via get_media_marker()), but process_mtmd_prompt() passed the raw prompt string to mtmd_tokenize() without ensuring it contained the required markers. The CLI (mtmd-cli.cpp) already handles this by auto-prepending markers, but the server did not.

Fix: query the actual marker from the mtmd context via mtmd_get_marker() and auto-insert one per file if the prompt lacks them.
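
To illustrate the failure mode, here is a minimal, self-contained sketch of a marker/bitmap consistency check. It is not the mtmd implementation: `count_markers` and `check_marker_count` are hypothetical helpers, and the marker string is whatever the caller passes in. It only shows why a prompt without markers is rejected when one image is attached.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of mtmd): count non-overlapping
// occurrences of `marker` in `text`.
static size_t count_markers(const std::string & text, const std::string & marker) {
    assert(!marker.empty()); // an empty marker would never advance the search
    size_t n = 0;
    for (size_t pos = text.find(marker);
         pos != std::string::npos;
         pos = text.find(marker, pos + marker.size())) {
        n++;
    }
    return n;
}

// Hypothetical stand-in for the consistency check described above.
// Returns an empty string on success, otherwise an error message
// worded like the one quoted in this PR.
static std::string check_marker_count(const std::string & text,
                                      const std::string & marker,
                                      size_t n_bitmaps) {
    const size_t n_markers = count_markers(text, marker);
    if (n_markers != n_bitmaps) {
        return "number of media markers in text (" + std::to_string(n_markers) +
               ") does not match number of bitmaps (" + std::to_string(n_bitmaps) + ")";
    }
    return "";
}
```

With a raw prompt and one attached image, the check reports zero markers against one bitmap, which matches the error the endpoints returned.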

Overview

Fixes #25088

Calls to the /embedding endpoint were failing because the process_mtmd_prompt function in tools/server/server-common.cpp passed the raw text of the user's prompt to mtmd_tokenize() without including the placeholder marker from mtmd_default_marker(), and one marker is required for each attached image. I added a check for existing markers and insert one per image when they are missing.

Requirements

Copilot AI review requested due to automatic review settings June 28, 2026 06:55
@TheOneWhoWill TheOneWhoWill requested a review from a team as a code owner June 28, 2026 06:55

Copilot AI left a comment

Pull request overview

This PR fixes multimodal /embedding requests in the server by ensuring the mtmd media marker is present in the text prompt before tokenization, aligning server behavior with the multimodal CLI and preventing marker/bitmap count mismatches.

Changes:

  • Query the active marker from the mtmd context (mtmd_get_marker()).
  • Auto-prepend media markers to the prompt before calling mtmd_tokenize() when markers are missing.

Comment thread tools/server/server-common.cpp Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread tools/server/server-common.cpp
@TheOneWhoWill TheOneWhoWill force-pushed the master branch 2 times, most recently from 33ac645 to e50014f June 28, 2026 07:53
The /embedding (and /embeddings, /v1/embeddings) endpoints failed with
"number of media markers in text (0) does not match number of bitmaps (1)"
when passing multimodal data via the "content" object format.

The server initializes the mtmd context with a randomized media marker
(via get_media_marker()), but process_mtmd_prompt() passed the raw prompt
string to mtmd_tokenize() without ensuring it contained the required
markers. The CLI (mtmd-cli.cpp) already handles this by auto-prepending
markers, but the server did not.

Fix: query the actual marker from the mtmd context via mtmd_get_marker()
and auto-insert one per file if the prompt lacks them.

server: auto-insert missing media markers in process_mtmd_prompt

Fixes the /embedding endpoint when multimodal data is provided without
corresponding media markers in the prompt string. Counts existing markers
and prepends only the missing number so the count matches files.size().
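
The count-and-prepend step described in this commit message could look roughly like the sketch below. It is an illustration under assumptions, not the code from this PR: `count_occurrences` and `prepend_missing_markers` are hypothetical names, and the marker is passed in as a plain string rather than queried from an mtmd context.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count non-overlapping occurrences of `marker` in `text`.
static size_t count_occurrences(const std::string & text, const std::string & marker) {
    assert(!marker.empty()); // an empty marker would never advance the search
    size_t n = 0;
    for (size_t pos = text.find(marker);
         pos != std::string::npos;
         pos = text.find(marker, pos + marker.size())) {
        n++;
    }
    return n;
}

// Hypothetical sketch: prepend only as many markers as are missing, so the
// marker count in the returned prompt equals n_files. A prompt that already
// has n_files markers (or more) is returned unchanged.
static std::string prepend_missing_markers(const std::string & prompt,
                                           const std::string & marker,
                                           size_t n_files) {
    const size_t n_present = count_occurrences(prompt, marker);
    if (n_present >= n_files) {
        return prompt;
    }
    std::string out;
    out.reserve(prompt.size() + (n_files - n_present) * marker.size());
    for (size_t i = n_present; i < n_files; i++) {
        out += marker;
    }
    out += prompt;
    return out;
}
```

Prepending only the missing markers leaves prompts that already place markers explicitly untouched, while prompts with no markers get one per attached file at the front.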

Assisted-by: GitHub Copilot

Potential fix for pull request finding

This just makes the wording more accurate

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Comment thread tools/server/server-common.cpp Outdated
Forgot to remove merge conflict headers

Co-authored-by: AGawas <94751172+aln730@users.noreply.github.com>

@aln730 aln730 left a comment

LGTM. squash your commits.


Development

Successfully merging this pull request may close these issues.

Eval bug: Qwen3-VL image embedding doesn't work
