extract images of attachments uploaded during conversations by yzAiden · Pull Request #3217 · ModelEngine-Group/nexent

yzAiden · 2026-06-11T02:33:54Z

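Fix description: the return type changed, so variables that previously received a single value raised errors. Fixed by unpacking the returned value.
Before the fix:

After the fix: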

…_processing

yzAiden · 2026-06-15T07:54:20Z

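Feature: extract the images from attachments uploaded during a conversation, and output the images the user asks for.

Implementation approach:
1. Reuse the existing image extraction functionality.
2. In the file analysis tool, identify the file type and obtain the image information returned by the SDK layer. The images themselves are stored in MinIO; the image metadata is merged with the plain text and passed to the LLM together for analysis.
3. The LLM extracts the image URLs and analyzes them with the image analysis tool, then outputs the user-specified images in the conversation. The sources panel can display the image metadata, and the images panel can display the extracted images.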
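
Step 2 above (merging image metadata with the plain text before handing it to the LLM) could look roughly like the sketch below. This is only an illustration: `build_llm_context`, the metadata keys `url` and `format`, and the example URL are assumptions, not names taken from this PR.

```python
from typing import Dict, List


def build_llm_context(text: str, images: List[Dict[str, str]]) -> str:
    """Merge plain text and image metadata into one string for the LLM."""
    if not images:
        return text.strip()
    lines = [text.strip(), "", "Extracted images:"]
    for idx, image in enumerate(images, start=1):
        # Hypothetical metadata keys; the real SDK payload may differ.
        url = image.get("url", "")
        fmt = image.get("format", "unknown")
        lines.append(f"[image {idx}] url={url} format={fmt}")
    return "\n".join(lines)


# Placeholder URL standing in for the object stored in MinIO.
context = build_llm_context(
    "Quarterly report.\n",
    [{"url": "https://minio.example/attachments/img-1.png", "format": "png"}],
)
print(context)
```

Listing each image on its own line with its URL gives the LLM something concrete to pass on to an image analysis tool in step 3.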

Result demonstration:

JasonW404 · 2026-06-24T04:02:39Z

            user_prompt=user_prompt
        )
        return result.content, truncation_percentage
+


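The parameter of detect_file_type is declared as file_bytes: bytes, but the single_file passed in at the call site may be a URL string. Calling bytes.startswith() on a string would raise AttributeError. Please confirm that the call chain always passes bytes, or add a type check.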
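
One possible shape for the type check suggested above, as a minimal sketch. The function body here is a stand-in, not the PR's actual `detect_file_type`; only the PDF and ZIP signatures are standard facts. Which exception an unchecked call raises depends on how the check is written; for example, `str.startswith` given a bytes prefix raises `TypeError` in CPython.

```python
from typing import Union

BytesLike = Union[bytes, bytearray, memoryview]


def detect_file_type(file_bytes: BytesLike) -> str:
    """Guess a coarse file type from the leading magic bytes."""
    # Fail early with a clear message if a URL string (or anything else) slips in.
    if not isinstance(file_bytes, (bytes, bytearray, memoryview)):
        raise TypeError(
            f"detect_file_type expects bytes, got {type(file_bytes).__name__}"
        )
    header = bytes(file_bytes[:8])
    if header.startswith(b"%PDF-"):
        return "pdf"
    if header.startswith(b"PK\x03\x04"):
        # ZIP container; docx/xlsx/pptx files are ZIP archives.
        return "zip"
    return "unknown"
```

A caller that may hold either a URL or raw content can then branch on `isinstance(single_file, str)` before calling, instead of relying on the exception.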

JasonW404 · 2026-06-24T04:02:42Z

+            for idx, img_data in enumerate(images_chunks):
+                if not isinstance(img_data, dict):
+                    logger.warning(f"Skipping image entry at index {idx}: unexpected type {type(img_data)}")
+                    continue


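There is no error handling when images are uploaded to MinIO. If an upload fails, the whole file-processing flow is interrupted and the user does not even get the text content. Suggest wrapping the upload in try/except, skipping failed images and logging a warning, so that the text content remains available.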

JasonW404 · 2026-06-24T04:02:45Z

            "task_id": None,
            "filename": filename,
            "text": full_text.strip(),
+            "images_info": [images_list_urls, image_info],


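chunks_count also counts the image entries (len(text_chunks) + len(images_chunks)), but downstream consumers may assume chunks_count represents only the number of text chunks. Suggest splitting it into text_chunks_count and image_chunks_count.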
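
The suggested split might look like the sketch below. `build_result` and the shape of the chunk lists are assumptions for illustration; only the two counter names come from the review comment.

```python
from typing import Any, Dict, List


def build_result(
    filename: str,
    text_chunks: List[str],
    images_chunks: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Report text and image chunk counts separately."""
    return {
        "filename": filename,
        "text": "\n".join(text_chunks).strip(),
        # Separate counters so consumers never have to guess what is included.
        "text_chunks_count": len(text_chunks),
        "image_chunks_count": len(images_chunks),
    }


result = build_result("report.pdf", ["intro", "body"], [{"url": "img-1"}])
```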

YehongPan · 2026-06-24T05:17:30Z


    async def convert_office_to_pdf_impl(self, object_name: str, pdf_object_name: str) -> None:
-        """Full conversion pipeline: download -> convert -> upload -> validate -> cleanup.
+        """Full conversion pipeline: download → convert → upload → validate → cleanup.


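[Code style] chunks, _ = data_process.file_process(...) uses _ to discard the images_info return value. If image information in uploaded files needs to be handled later, suggest assigning the return value to a meaningful variable name (such as images_info) and adding a comment explaining why it is currently ignored.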

WMC001 · 2026-06-24T07:20:25Z

The image extraction logic from PDF files is a useful addition. Please ensure edge cases (corrupted images, unsupported formats) are handled gracefully and covered by tests.

adapt_to_return_type_change

7edb3c9

yzAiden requested review from Dallas98 and WMC001 as code owners June 11, 2026 02:33

yzAiden added 7 commits June 11, 2026 11:00

modify test file

d654342

extract_attachment_images

10acafe

Merge branch 'develop' of https://github.com/yzAiden/nexent into file…

27ce29e

…_processing

modify test file

9c4c602

modify test files

6e8795c

modify test files

6379103

modify test file

e052c2f

yzAiden changed the title ~~adapt_to_return_type_change~~ extract images of attachments uploaded during conversations Jun 15, 2026

yzAiden added 2 commits June 15, 2026 16:18

add sdk layer dependencie

e8381f1

Merge branch 'develop' into file_processing

49335a0

JasonW404 reviewed Jun 24, 2026

YehongPan reviewed Jun 24, 2026

extract images of attachments uploaded during conversations #3217
yzAiden wants to merge 10 commits into ModelEngine-Group:develop from yzAiden:file_processing


4 participants
