[Feature] support and harden native multimodal file handling #865
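Walkthrough
This PR refactors the multimodal input pipeline: it adds or rewrites the multimedia utils (MIME inference, audio detection and transcoding, GIF frame splitting, image description), rewrites the read_files tool, and reworks the injection logic of the audio/image plugins. It also removes the handling path for the deprecated additional_kwargs.images from several adapters and injects audio/file-handling configuration into the shared adapter and the OpenAI adapter.
Changes
- Multimodal refactor (main line)
- Shared adapter and OpenAI mapping
- Documentation and configuration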
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Code Review
This pull request introduces multimodal support for MiMo models, specifically adding audio and image understanding capabilities. It includes logic for transcoding audio to MP3 using ffmpeg, detecting MIME types from file headers, and handling Base64 data URLs for OpenAI-compatible interfaces. Key feedback points out an inconsistency where the read_files tool lacks Silk audio decoding support compared to the main audio plugin, suggesting a unification of media utilities. Additionally, the getHeaderValue utility needs to be more robust to ensure full case-insensitivity when retrieving HTTP headers from plain objects.
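As a rough illustration of the case-insensitivity point above, a header lookup that works for both Fetch-style header objects and plain objects might look like the sketch below. The signature and the `HeaderGetter` / `PlainHeaders` types are assumptions made for this example; the actual `getHeaderValue` in the PR is not shown on this page.

```typescript
// Sketch only: types and signature are assumed, not taken from the PR.
interface HeaderGetter {
    get(name: string): string | null
}

type PlainHeaders = Record<string, string | string[] | undefined>

function getHeaderValue(
    headers: HeaderGetter | PlainHeaders,
    name: string
): string | null {
    // Fetch-style Headers objects expose get(), which ignores case already.
    const getter = (headers as HeaderGetter).get
    if (typeof getter === 'function') {
        return getter.call(headers, name)
    }

    // Plain objects keep whatever casing the server or HTTP client used,
    // so compare every key in lower case instead of probing two spellings.
    const wanted = name.toLowerCase()
    for (const key of Object.keys(headers)) {
        if (key.toLowerCase() !== wanted) continue
        const value = (headers as PlainHeaders)[key]
        if (value == null) continue
        return Array.isArray(value) ? value.join(', ') : value
    }
    return null
}
```

Scanning the keys rather than checking only `content-type` and `Content-Type` also covers spellings such as `CONTENT-TYPE`.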
Actionable comments posted: 3
🧹 Nitpick comments (3)
packages/service-multimodal/src/plugins/read_files.ts (1)
294-298: 💤 Low value
If `buildAudioContent` returns an unrecognized type, the audio content is silently dropped.
When `audioContent` is neither `isMessageContentAudio` nor `type === 'input_audio'`, nothing is added to the `content` array. Consider logging a warning so that unexpected formats can be debugged.
🔧 Suggested fix: add a warning log
     if (isMessageContentAudio(audioContent as MessageContentComplex)) {
       content.push(audioContent as MessageContentComplex)
     } else if (audioContent.type === 'input_audio') {
       content.push(audioContent as MessageContentComplex)
    +} else {
    +  logger.warn(`Unexpected audio content type: ${(audioContent as { type?: string }).type}`)
     }
packages/service-multimodal/src/audio.ts (1)
31-34: ⚡ Quick win
The implementation is an exact duplicate of `isMimoAudioModel`.
`isMimoImageModel` and `isMimoAudioModel` have identical bodies and both use the same `mimoModels` set. This gives the API clear semantics, but it violates the DRY principle. Consider one of the following:
- If MiMo models really do support both audio and images, extract a shared `isMimoModel` function and make `isMimoAudioModel` and `isMimoImageModel` aliases of it or wrappers around it.
- If audio and image may need different model sets in the future, explain in a code comment why they currently share one set.
♻️ Option 1: extract a shared function
    +function isMimoModel(model?: string): boolean {
    +  if (!model) return false
    +  return mimoModels.has(model.split('/').pop()?.toLowerCase() ?? '')
    +}
    +
     export function isMimoAudioModel(model?: string): boolean {
    -  if (!model) return false
    -  return mimoModels.has(model.split('/').pop()?.toLowerCase() ?? '')
    +  return isMimoModel(model)
     }

     export function isMimoImageModel(model?: string): boolean {
    -  if (!model) return false
    -  return mimoModels.has(model.split('/').pop()?.toLowerCase() ?? '')
    +  return isMimoModel(model)
     }
packages/service-multimodal/src/media.ts (1)
14-16: 💤 Low value
MP3 detection may produce false positives.
Using `buffer[0] === 0xff` alone is a weak MP3 check, because many binary formats can start with 0xFF. An MP3 frame starts with the sync byte 0xFF followed by a specific bit pattern (typically 0xFB, 0xFA, and so on). The ID3 tag check offers some protection, but for raw MP3 streams without an ID3 tag this check can misclassify other formats that start with 0xFF.
Suggestion: if the false-positive rate is acceptable in practice, keep the current logic; otherwise strengthen the detection, for example by checking the high bits of `buffer[1]`.
♻️ Optional: strengthen MP3 detection
    -  if (header.startsWith('ID3') || buffer[0] === 0xff) {
    +  if (header.startsWith('ID3') ||
    +      (buffer[0] === 0xff && buffer.length > 1 && (buffer[1] & 0xe0) === 0xe0)) {
         return 'audio/mpeg'
       }
Explanation: the MP3 frame sync word is 11 set bits (0xFFE), so checking the top 3 bits of the second byte reduces false positives.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/service-multimodal/src/audio.ts`:
- Around line 83-89: Add a concrete MessageContentInputAudio type and a
type-guard, then return that type instead of using a double cast: declare export
type MessageContentInputAudio = { type: 'input_audio'; input_audio: { data:
string } } in packages/core/src/utils/langchain.ts, implement an
isMessageContentInputAudio(value): value is MessageContentInputAudio guard, and
update buildAudioContent (the block using isMimoAudioModel(model)) to construct
and return a MessageContentInputAudio instance directly rather than using "as
unknown as MessageContentComplex".
In `@packages/service-multimodal/src/index.ts`:
- Around line 103-107: The documentation comment lines under the "MiMo 音频理解" and
"MiMo 图片理解" blocks exceed the eslint max length; locate the long literal strings
containing "MiMo 音频理解" and "MiMo 图片理解" in
packages/service-multimodal/src/index.ts and break them into multiple shorter
string/template-literal lines (or use string concatenation) so no single source
line exceeds 160 characters; preserve the exact wording and punctuation while
splitting at sensible boundaries (clauses or after commas) to keep readability.
In `@packages/service-multimodal/src/media.ts`:
- Around line 8-13: The SILK detection includes a non-standard variant check
(buffer.subarray(1, 10).toString('latin1') === '#!SILK_V3') in addition to
header.startsWith('#!SILK_V3'); add a clear inline comment above these checks
explaining that this offset-1 marker is a non-standard variant observed in a
specific platform/app (name the platform/app where known), why we need to handle
it, and whether it can be removed in the future; make the same explanatory
comment in the isSilkAudio() function where the identical logic appears so both
detection sites (the header.startsWith check and the buffer.subarray(1, 10)
check) document the origin and necessity of the special-case handling.
---
Nitpick comments:
In `@packages/service-multimodal/src/audio.ts`:
- Around line 31-34: The two functions isMimoImageModel and isMimoAudioModel are
identical and both check mimoModels; refactor by extracting a single helper
isMimoModel(model?: string): boolean that performs the shared logic (use
model.split('/').pop()?.toLowerCase() and mimoModels.has(...)) and then make
isMimoImageModel and isMimoAudioModel thin wrappers that call isMimoModel (or
export them as aliases), or if you prefer to keep separate sets in future, add a
clear comment above isMimoImageModel/isMimoAudioModel explaining they
intentionally share the mimoModels set today and where to change it later.
In `@packages/service-multimodal/src/media.ts`:
- Around line 14-16: The current MP3 detection uses header.startsWith('ID3') ||
buffer[0] === 0xff which can false-positive on other formats; update the
condition in packages/service-multimodal/src/media.ts to also validate the
second byte's high bits (e.g., ensure buffer.length > 1 && buffer[0] === 0xFF &&
(buffer[1] & 0xE0) === 0xE0) so the check uses header and a proper MP3
frame-sync test (reference the existing header and buffer variables in the
detection logic).
In `@packages/service-multimodal/src/plugins/read_files.ts`:
- Around line 294-298: The current branch that handles audioContent in the
if/else block using isMessageContentAudio and audioContent.type ===
'input_audio' silently drops any other return type from buildAudioContent;
update that block (the conditional around isMessageContentAudio and
audioContent.type checks that pushes into content) to add a warning log (using
the existing logger) when audioContent is neither a recognized
MessageContentAudio nor type === 'input_audio', include the actual audioContent
(or its type/shape) in the log message to aid debugging and ensure you still
skip invalid entries.
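The inline comment on `buildAudioContent` above asks for a concrete `MessageContentInputAudio` type plus a type guard in place of a double cast. A minimal sketch of that pair follows. The type shape is copied from the review comment; the guard body is an assumption about how such a check could be written, not the code that was merged.

```typescript
// Shape taken from the review comment; the guard body is illustrative only.
type MessageContentInputAudio = {
    type: 'input_audio'
    input_audio: { data: string }
}

function isMessageContentInputAudio(
    value: unknown
): value is MessageContentInputAudio {
    if (typeof value !== 'object' || value === null) return false
    const part = value as { type?: unknown; input_audio?: unknown }
    if (part.type !== 'input_audio') return false
    const audio = part.input_audio as { data?: unknown } | null | undefined
    return (
        typeof audio === 'object' &&
        audio !== null &&
        typeof audio.data === 'string'
    )
}
```

With a guard like this, callers can narrow an unknown content part before pushing it, which avoids `as unknown as MessageContentComplex`.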
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 337f9324-0eff-43c9-a0b9-9712ce246abe
⛔ Files ignored due to path filters (1)
packages/service-multimodal/package.json is excluded by !**/*.json
📒 Files selected for processing (9)
- packages/service-multimodal/README.md
- packages/service-multimodal/src/audio.ts
- packages/service-multimodal/src/index.ts
- packages/service-multimodal/src/media.ts
- packages/service-multimodal/src/plugins/audio.ts
- packages/service-multimodal/src/plugins/image.ts
- packages/service-multimodal/src/plugins/read_files.ts
- packages/service-multimodal/src/read_files_schema.ts
- packages/service-multimodal/tests/audio-mimo.test.ts
`detectAudioMimeType` checked only `buffer[0] === 0xFF` to identify MP3 frame sync, but JPEG files also start with 0xFF (FF D8). This caused every JPEG passed through `read_files` to be injected into the conversation as `audio/mpeg`, crashing model APIs that reject unsupported audio formats. Tighten the check to require the full MPEG sync word: `buffer[0] === 0xFF && (buffer[1] & 0xE0) === 0xE0`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
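To make the JPEG collision in this commit message concrete, here is a small standalone sketch of the tightened check. The function name `isLikelyMp3` and the use of `Uint8Array` are choices made for this example; the real `detectAudioMimeType` works on a `Buffer` and handles other formats as well.

```typescript
// Illustrative helper, not the PR's detectAudioMimeType.
function isLikelyMp3(bytes: Uint8Array): boolean {
    // An ID3v2 tag starts with the ASCII bytes "ID3".
    if (
        bytes.length >= 3 &&
        bytes[0] === 0x49 &&
        bytes[1] === 0x44 &&
        bytes[2] === 0x33
    ) {
        return true
    }
    // MPEG audio frame sync: 0xFF followed by a byte whose top 3 bits are set.
    return (
        bytes.length > 1 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0
    )
}

// JPEG starts with FF D8; 0xD8 & 0xE0 is 0xC0, so it is no longer matched.
const jpegHeader = new Uint8Array([0xff, 0xd8, 0xff, 0xe0])
// A typical MPEG-1 Layer III frame header starts with FF FB.
const mp3Header = new Uint8Array([0xff, 0xfb, 0x90, 0x00])

console.log(isLikelyMp3(jpegHeader), isLikelyMp3(mp3Header))
```

The second-byte mask is what separates the two cases: both headers share the leading 0xFF, and only the MP3 frame has the remaining sync bits set.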
5e1232e to 6ac2a42
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/adapter-openai/src/client.ts (1)
85-93: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
This audio-capability branch is now essentially unreachable.
Lines 71-76 still filter out every model whose name contains `audio`, so `gpt-4o-audio-*` / `gpt-audio-*` are removed before they reach this point. As a result, neither the new `AudioInput` capability nor `fileHandlingConfig` is applied to automatically fetched OpenAI audio models.
🛠️ Suggested fix
     .filter(
         (model) =>
             !(
                 model.includes('instruct') ||
    -            [
    -                'whisper',
    -                'tts',
    -                'dall-e',
    -                'audio',
    -                'realtime'
    -            ].some((keyword) => model.includes(keyword))
    +            ['whisper', 'tts', 'dall-e', 'realtime'].some(
    +                (keyword) => model.includes(keyword)
    +            ) ||
    +            (model.includes('audio') &&
    +                !supportAudioInput(model))
             )
     )
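The effect of the suggested filter can be checked in isolation with a sketch like the one below. The body of `supportAudioInput` here is an assumption (simple prefix checks on the two name patterns the review mentions); the adapter's own implementation is not shown on this page and may differ.

```typescript
// Assumed stand-in for the adapter's supportAudioInput; prefixes follow the
// gpt-4o-audio-* / gpt-audio-* patterns named in the review comment.
function supportAudioInput(model: string): boolean {
    return model.startsWith('gpt-4o-audio-') || model.startsWith('gpt-audio')
}

// Mirrors the suggested filter: drop non-chat models, but keep audio models
// that accept audio input.
function keepModel(model: string): boolean {
    const excluded =
        model.includes('instruct') ||
        ['whisper', 'tts', 'dall-e', 'realtime'].some((keyword) =>
            model.includes(keyword)
        ) ||
        (model.includes('audio') && !supportAudioInput(model))
    return !excluded
}

const fetched = [
    'gpt-4o',
    'gpt-4o-audio-preview',
    'whisper-1',
    'gpt-4o-realtime-preview'
]
console.log(fetched.filter(keepModel))
```

In this sketch `gpt-4o-audio-preview` survives the filter, so a later `supportAudioInput(model)` check in the capability list can actually fire.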
🧹 Nitpick comments (1)
packages/service-multimodal/src/utils.ts (1)
68-78: 💤 Low value
Prettier formatting issue: remove the redundant parentheses.
Static analysis flags redundant parentheses on line 74.
🔧 Suggested fix
    -    return dot < 0
    -      ? null
    -      : (FILE_EXTENSION_TO_MIME_TYPE[path.slice(dot)] ?? null)
    +    return dot < 0
    +      ? null
    +      : FILE_EXTENSION_TO_MIME_TYPE[path.slice(dot)] ?? null
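For context on the expression being reformatted, an extension-based lookup of this kind might be sketched as follows. Only the return expression and the try/catch wrapper come from the review; the table entries, the use of `URL`, and the lower-casing step are assumptions for illustration.

```typescript
// Hypothetical subset of the extension table; the real one is not shown here.
const FILE_EXTENSION_TO_MIME_TYPE: Record<string, string> = {
    '.mp3': 'audio/mpeg',
    '.wav': 'audio/wav',
    '.png': 'image/png'
}

function inferMimeTypeFromUrl(url: string): string | null {
    try {
        // Use only the path so query strings and host names are ignored.
        const path = new URL(url).pathname.toLowerCase()
        const dot = path.lastIndexOf('.')
        return dot < 0
            ? null
            : FILE_EXTENSION_TO_MIME_TYPE[path.slice(dot)] ?? null
    } catch {
        // new URL() throws on strings that are not absolute URLs.
        return null
    }
}
```

The parentheses are optional because `??` binds tighter than the conditional operator, so both spellings evaluate the same way.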
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/service-multimodal/src/plugins/audio.ts`:
- Around line 27-34: MIME_TO_EXT is missing mappings for formats present in
NATIVE_AUDIO_MIMES (specifically audio/aac and audio/webm), causing a fallback
to 'mp3' and mismatched filenames; update MIME_TO_EXT to include 'audio/aac' =>
'aac' and 'audio/webm' => 'webm' so the code that looks up MIME_TO_EXT (used
when determining output filenames/extensions) produces the correct extensions
rather than defaulting to 'mp3'.
In `@packages/service-multimodal/src/plugins/read_files.ts`:
- Around line 112-116: The current MIME selection assigns mime to detectedAudio
even when detectedAudio is null, causing valid declared audio types (e.g.,
audio/wav) to be lost; in the MIME resolution logic in read_files.ts (variables
declared, detectedAudio, and the call to detectAudioMimeType), change the
selection to prefer detectedAudio when it is non-null/defined, otherwise fall
back to declared (and only treat it as audio if declared?.startsWith('audio/'));
ensure mime is never set to null so downstream code won’t hit "Could not
determine MIME type".
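- Around line 314-332: In the _fetch method, response.headers is not a Headers
instance, so .get() cannot be used to read content-type and contentType is
always null; change _fetch (which uses the response returned by this.ctx.http)
to read the value directly via response.headers['content-type'] (or
response.headers['Content-Type']), with defensive checks (lower-case/mixed-case
compatibility and a possibly undefined value), then assign it to contentType and
return the correct Buffer and contentType; make sure response.headers is no
longer asserted as Headers, and keep the existing timeout/headers configuration.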
In `@packages/shared-adapter/src/utils.ts`:
- Around line 387-397: The current code handles audio content with try { return
await fetchAudioContentPart(plugin, content) } catch { return null }, which
silently swallows every exception and makes lost audio hard to diagnose; change
the isMessageContentAudio branch so the catch does not return null directly; use
catch (err) { logger.error(`Failed to fetch audio part for model
${normalizedModel}`, err); throw err } instead, and keep an explicit check for
fetchAudioContentPart returning null (if it returns null, log a clear
warning/error via logger.warn/logger.error and either return null as expected or
return an explicit error marker), so that callers can see the failure reason;
symbols involved: isMessageContentAudio, supportsAudio, fetchAudioContentPart,
logger.warn.
- Around line 696-698: The function audioMimeToFormat currently falls back to
'mp3' for unknown MIME types (audioMimeToFormat and AUDIO_MIME_TO_FORMAT), which
can produce an incorrect input_audio.format; change it to validate
mime.toLowerCase() against AUDIO_MIME_TO_FORMAT and throw a clear, explicit
Error (including the unsupported mime value) when there is no mapping instead of
returning 'mp3' so callers fail fast and avoid sending mismatched format/bytes
to the OpenAI API.
---
Outside diff comments:
In `@packages/adapter-openai/src/client.ts`:
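- Around line 85-93: The earlier filter removes every model containing "audio",
so the later code that adds ModelCapabilities.AudioInput (and
fileHandlingConfig) to capabilities via supportAudioInput(model) never runs;
change two things: first, adjust or remove the earlier filter that drops models
containing "audio" (do not discard gpt-4o-audio-* / gpt-audio-* during
pre-filtering); second, make sure supportAudioInput(model) correctly recognizes
gpt-4o-audio-* and gpt-audio-* and returns true, so that these automatically
fetched OpenAI audio models are handled correctly when building capabilities
(including ModelCapabilities.AudioInput) and applying fileHandlingConfig.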
---
Nitpick comments:
In `@packages/service-multimodal/src/utils.ts`:
- Around line 68-78: The Prettier warning flags an unnecessary pair of
parentheses in inferMimeTypeFromUrl; update the return expression in that
function to remove the extra parentheses around the nullish-coalescing lookup so
it directly returns FILE_EXTENSION_TO_MIME_TYPE[path.slice(dot)] ?? null
(referencing inferMimeTypeFromUrl and FILE_EXTENSION_TO_MIME_TYPE to locate the
code), keeping the rest of the try/catch logic unchanged.
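The `audioMimeToFormat` comment in the prompt above asks for a fail-fast lookup instead of a silent `'mp3'` fallback. A minimal sketch of that behaviour is shown below. The entries in `AUDIO_MIME_TO_FORMAT` are assumed for the example and do not claim to match the table in `packages/shared-adapter/src/utils.ts`.

```typescript
// Assumed mapping for illustration; the real table may contain other entries.
const AUDIO_MIME_TO_FORMAT: Record<string, 'mp3' | 'wav'> = {
    'audio/mpeg': 'mp3',
    'audio/mp3': 'mp3',
    'audio/wav': 'wav',
    'audio/x-wav': 'wav'
}

function audioMimeToFormat(mime: string): 'mp3' | 'wav' {
    const format = AUDIO_MIME_TO_FORMAT[mime.toLowerCase()]
    if (!format) {
        // Fail fast so mismatched format/bytes are never sent upstream.
        throw new Error(`Unsupported audio MIME type: ${mime}`)
    }
    return format
}
```

Throwing here moves the failure to the conversion step, where the offending MIME type is still known, rather than leaving the API to reject bytes labelled with the wrong format.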
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a9729694-da41-4a1c-b630-a282324ed506
📒 Files selected for processing (14)
- packages/adapter-claude/src/utils.ts
- packages/adapter-gemini/src/utils.ts
- packages/adapter-ollama/src/utils.ts
- packages/adapter-openai-like/src/client.ts
- packages/adapter-openai/src/client.ts
- packages/adapter-qwen/src/utils.ts
- packages/service-multimodal/README.md
- packages/service-multimodal/src/index.ts
- packages/service-multimodal/src/plugins/audio.ts
- packages/service-multimodal/src/plugins/image.ts
- packages/service-multimodal/src/plugins/read_files.ts
- packages/service-multimodal/src/utils.ts
- packages/shared-adapter/src/client.ts
- packages/shared-adapter/src/utils.ts
🧹 Nitpick comments (2)
packages/service-multimodal/src/utils.ts (2)
80-84: 💤 Low value
The function name `normalizeMimeType` conflicts with the coding guidelines.
The coding guidelines explicitly ban the `normalizeXXX` naming pattern. The function is a single line of logic and can be inlined at the call site: `const mime = raw?.split(';')[0]?.trim()?.toLowerCase() || null`. If it is called in several places and reuse is really needed, prefer a shorter name such as `cleanMime` or `baseMime`.
As per coding guidelines: "Do NOT create `resolveXXX`, `normalizeXXX`, `ensureXXX`, `toSafeXXX` functions - these are banned patterns".
362-369: 💤 Low value
The function name `ensureContentArray` conflicts with the coding guidelines.
The coding guidelines ban the `ensureXXX` pattern. The function has more than 5 lines of logic and is called from several places (such as `audio.ts`), so extracting it is justified, but consider a more descriptive name such as `toContentArray` or simply `contentAsArray`.
As per coding guidelines: "Do NOT create `resolveXXX`, `normalizeXXX`, `ensureXXX`, `toSafeXXX` functions - these are banned patterns".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/service-multimodal/src/utils.ts`:
- Around line 80-84: The function normalizeMimeType violates the naming rule
banning normalizeXXX functions; either inline its one-line logic at call sites
(replace uses with raw?.split(';')[0]?.trim()?.toLowerCase() || null) or rename
and export it to an approved shorter name (e.g., cleanMime or baseMime) and
update all references to that symbol (normalizeMimeType) to the new name,
preserving the signature (string | null) and export. Ensure you update any
imports/usages across the codebase and tests to reference the new symbol or the
inlined expression.
- Around line 362-369: The function named ensureContentArray violates the banned
`ensureXXX` naming pattern; rename the function to a descriptive allowed name
(e.g., toContentArray or contentAsArray) while keeping the same signature
(message: Message, fallbackText = '') and preserving its behavior, then update
all call sites (for example in audio.ts and any other imports/exports) to use
the new name and adjust exports if necessary so builds/tests pick up the rename.
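For the first nitpick, the one-line expression quoted in the review can be wrapped under one of the suggested names to see what it does with typical `Content-Type` values. The name `baseMime` is one of the reviewer's suggestions; whether the PR kept a helper or inlined the expression is not stated on this page.

```typescript
// Same expression as in the review, under one of the suggested names.
function baseMime(raw?: string | null): string | null {
    // Drop parameters such as "; charset=utf-8", trim, and lower-case.
    // An empty result becomes null through the || fallback.
    return raw?.split(';')[0]?.trim()?.toLowerCase() || null
}

console.log(baseMime('Audio/WAV; codecs=1'))
console.log(baseMime(undefined))
```

Because the fallback uses `||` rather than `??`, an empty or whitespace-only header value also collapses to `null`.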
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 99ad8af7-be73-4bf7-b18f-ea58308c6665
📒 Files selected for processing (5)
- packages/adapter-openai/src/client.ts
- packages/service-multimodal/src/plugins/audio.ts
- packages/service-multimodal/src/plugins/read_files.ts
- packages/service-multimodal/src/utils.ts
- packages/shared-adapter/src/utils.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/adapter-openai/src/client.ts
- packages/shared-adapter/src/utils.ts
- packages/service-multimodal/src/plugins/read_files.ts
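Find some time to test this in your local environment with yarn fast-build. If everything works, let me know and I'll merge.
Local test feedback
Deployed and tested on a local Koishi instance (via
Symptoms
The model retried repeatedly (single object, array, file:// URI, and other formats), and every attempt was rejected because of a schema mismatch. The agent eventually gave up and fell back to downloading manually with bash curl + ffmpeg, but still could not complete audio understanding.
Test environment
Key log excerpts
Rolled back to the pre-deployment version; the instance is back to normal.
Suggestions
OK, try again.
How can there be a model this dumb?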
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/service-multimodal/src/plugins/read_files.ts`:
- Around line 308-333: The code reports "file content has been added to the
conversation context" whenever native.length > 0 even if conversationId is
missing; fix by basing the success note on the actual injection outcome: compute
an injected boolean (e.g., injected = native.length > 0 && conversationId) and
use that to choose the note message, or enforce conversationId as a precondition
before attempting injection (wrap both the inject call and the success message
behind the same conversationId check); update references to native,
conversationId, this.ctx.chatluna.contextManager.inject, buildMultimodalMessage
and the JSON.stringify return so the note accurately reflects whether injection
occurred.
- Around line 225-249: When splitting GIFs into frames (parseGifToFrames) we
currently warn+break when remaining frames would exceed maxTotal but silently
drop the whole GIF if no frames were pushed; fix by detecting whether any frames
from that GIF were successfully pushed (track a local counter before/inside the
for-loop that calls pushNative) and if zero frames were pushed, record a failure
for that sourceUrl in the report (use the existing failure mechanism: call the
project’s failure helper such as pushFailure/report.files append with an error
entry or a dedicated pushFileFailure function) with a clear message like "GIF
frames exceed total size limit", instead of just warning, so the URL appears in
report.files as failed.
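The first comment in the prompt above (lines 308-333) comes down to deriving the note from the real injection outcome. A rough sketch of that idea follows. The helper name `buildReadFilesNote`, its parameters, and the wording of the fallback note are all hypothetical; only the success message is quoted from the review.

```typescript
// Hypothetical helper: names and the fallback wording are assumptions.
function buildReadFilesNote(
    nativeCount: number,
    conversationId?: string
): { injected: boolean; note: string } {
    // Injection only happens when there is native content AND a conversation.
    const injected = nativeCount > 0 && Boolean(conversationId)
    const note = injected
        ? 'file content has been added to the conversation context'
        : 'file content was not added to the conversation context'
    return { injected, note }
}

console.log(buildReadFilesNote(2, 'conv-1'))
console.log(buildReadFilesNote(2))
```

Computing `injected` once and reusing it for both the inject call and the note keeps the two from drifting apart.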
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f393673b-2a86-4c83-bce8-9c6d3b3f2767
📒 Files selected for processing (1)
packages/service-multimodal/src/plugins/read_files.ts
bf9a7db to 64d3eef
Are there still any problems in testing? If not, I'll merge tomorrow.
One moment.
Production testing passed!
This PR adds native multimodal file handling for MiMo/OpenAI-compatible models and hardens read_files/audio request conversion across adapters.
New Features
- `audio_url` content into OpenAI-compatible `input_audio` parts for MiMo and GPT audio models.
- `read_files` image, audio, video, and file content through native conversation context when the active model supports it.
Bug fixes
- audio.
- `read_files.files` payloads from tool calls in the inline `read_files` schema.
- `input_audio` conversion, while preserving size-limit drop warnings.
Other Changes
- `service-multimodal` helpers by merging MIME detection, GIF frame extraction, audio conversion, and content utilities into `utils.ts`.
- `read_files` schema and remove obsolete helper modules.
- `yarn lint-fix`; `git pull --rebase origin v1-dev`.
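As a rough picture of the `audio_url` to `input_audio` conversion listed under New Features, the sketch below turns a Base64 audio data URL into an OpenAI-compatible `input_audio` content part. The function name, the regular expression, and the two supported MIME types are assumptions for illustration; the PR's conversion also covers fetching, transcoding, and size limits, none of which are modelled here.

```typescript
// OpenAI-compatible content part: Base64 payload plus a declared format.
type InputAudioPart = {
    type: 'input_audio'
    input_audio: { data: string; format: 'mp3' | 'wav' }
}

// Hypothetical converter: handles only base64 data URLs for wav/mpeg audio.
function audioDataUrlToInputAudio(url: string): InputAudioPart | null {
    const match =
        /^data:(audio\/[a-z0-9.+-]+);base64,([A-Za-z0-9+/=]+)$/i.exec(url)
    if (!match) return null

    const [, mime, data] = match
    const lower = mime.toLowerCase()
    const format =
        lower === 'audio/wav' ? 'wav' : lower === 'audio/mpeg' ? 'mp3' : null
    // Anything else would need transcoding first, so report "not convertible".
    if (!format) return null

    return { type: 'input_audio', input_audio: { data, format } }
}
```

Returning `null` for unsupported input leaves the decision (transcode, drop with a warning, or fail) to the caller, which matches the review's preference for explicit handling over silent defaults.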