diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md new file mode 100644 index 00000000..7e8bd25d --- /dev/null +++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md @@ -0,0 +1,82 @@ +### 认领者 GitHub ID +megemini + +### 赛题信息 + +- **进阶任务序号**:#15 +- **赛题名称**:基于天数智芯硬件与文心多模态模型的创新应用 +- **关联厂商**:天数 + +### 本周工作 + +1. **RFC 文档** + + - 已经完成 RFC 文档 + - AI Studio 地址:https://aistudio.baidu.com/project/edit/10221576 + +2. **代码实现** + + - 已经完成 AI Studio 项目的 notebook + - 已经创建了双卡的天数环境 + +3. **README** + + - 可以参考 AI Studio 项目的 notebook + +4. **演示视频/截图** + + - 待完成 + +5. **问题与解决** + + - 问题:AI Studio 的 notebook 中无法正常调用 ERNIE-4.5-0.3B-Paddle + + 现在有一个很奇怪的问题:AI Studio 的 notebook 中无法 `正常` 调用 ERNIE-4.5-0.3B-Paddle 模型。模型可以正常运行,但是输出 `答非所问`。 + + 请看下面的截图,我将 PaddleOCR-VL-1.5 识别的结果手动放入到 prompt 中: + + ![images/cli_prompt.png](images/cli_prompt.png) + + 使用命令行调用模型,输出是正常的: + + ![images/cli_ok.png](images/cli_ok.png) + + 但是,如果放到 notebook 中,输出就是一长串的空白(空格和回车)! + + 我手动将 notebook 中的 prompt 修改为 `你是谁` 测试模型的输出: + + ![images/notebook_input.png](images/notebook_input.png) + + 输出是一段奇怪的东西: + + ![images/notebook_output.png](images/notebook_output.png) + + 有时候还会给我输出一段完形填空题。 + + 我尝试在 notebook 中直接进行函数调用,也尝试使用子进程调用,都不行! + + 现在附上 notebook 文件 `medical_pipeline_20260503.ipynb`,可以直接执行。 + + 另外,还发现一个问题:在 AI Studio 中,显存有时无法释放。可以看到截图中,即便什么都没有运行,也被占用了 45% 的显存。我不确定是 AI Studio 的问题,还是 FastDeploy 配合天数硬件的问题,请帮忙看一下。 + + - 问题:天数的双卡框架开发环境只有命令行模式,不能使用 notebook,也不能进行项目公开 + + 现在的解决方案是:先在单卡环境中调通 notebook,然后在双卡环境中验证 pipeline 是否能够走通。 + +### 下周计划 + +1. 调试 notebook +2. 
调试双卡环境 + +### 当前阻塞(无则填"无") + +- 解决 notebook 中无法正常调用 ERNIE-4.5-0.3B-Paddle 模型的问题 + +### 交付物进展 + +| 交付物 | 状态 | 备注 | +|--------|:----:|------| +| RFC 文档 | ✅ 已完成 | - | +| 代码实现 | 🔄 | - | +| README | 🔄 | - | +| 演示视频/截图 | 🔄 | - | \ No newline at end of file diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png new file mode 100644 index 00000000..d1b3e20f Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png new file mode 100644 index 00000000..d2efd019 Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png new file mode 100644 index 00000000..84853809 Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png new file mode 100644 index 00000000..d2f6e2af Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb new file mode 100644 index 00000000..ec34bd1e --- /dev/null +++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb @@ -0,0 +1,1512 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a1b2c3d4", + 
"metadata": {}, + "source": [ + "# 药品说明书智能识别与语音播报系统\n", + "\n", + "## 项目说明\n", + "\n", + "针对药品说明书字体太小、老年人看不清读不懂的问题,本项目通过以下三个步骤,将药品说明书中的重点内容识别提取并语音播报:\n", + "\n", + "1. **OCR 识别**:使用 PaddleOCR-VL-1.5 模型对药品说明书图片进行文字识别\n", + "2. **大模型整理**:使用 ERNIE-4.5 大模型对识别的文字进行整理,提取关键信息\n", + "3. **语音合成播报**:使用 PaddleSpeech 语音合成模型将整理后的文字转为音频文件\n", + "\n", + "### 提取的关键信息包括:\n", + "1. 药品名称\n", + "2. 药品适应症\n", + "3. 药品的用法与用量\n", + "4. 药品的禁忌\n", + "5. 药品的不良反应\n", + "\n", + "### 技术栈:\n", + "- OCR: PaddleOCR-VL-1.5\n", + "- LLM: ERNIE-4.5-0.3B-Paddle\n", + "- TTS: PaddleSpeech bert-base-chinese\n", + "\n", + "### 内存优化(子进程模式):\n", + "为确保内存完全释放,本系统采用**子进程模式**运行每个模型:\n", + "- 每个模型在独立的子进程中加载和执行\n", + "- 子进程完成后自动销毁,确保内存完全释放\n", + "- 主进程仅负责数据传递和流程控制,不加载模型\n", + "- 例如:OCR 在子进程运行,完成后子进程销毁,再启动 LLM 子进程\n", + "\n", + "#### 目录:\n", + "- [模型下载与检查](#模型下载与检查)\n", + "- [生成参数设置](#生成参数设置)\n", + "- [OCR 模块](#OCR-模块)\n", + "- [LLM 模块](#LLM-模块)\n", + "- [TTS 模块](#TTS-模块)\n", + "- [管线编排与模型管理](#管线编排与模型管理)\n", + "- [主流程](#主流程)\n", + "- [Gradio 交互界面](#Gradio-交互界面)" + ] + }, + { + "cell_type": "markdown", + "id": "088dfe7b-8df9-47d3-b94d-70db4eb1a2a9", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:08:00.811267Z", + "iopub.status.busy": "2026-05-03T06:08:00.811134Z" + } + }, + "source": [ + "%pip install -r requirements.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "bdb8d7d5", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:11:06.332147Z", + "iopub.status.busy": "2026-05-03T06:11:06.332023Z", + "iopub.status.idle": "2026-05-03T06:11:11.425006Z", + "shell.execute_reply": "2026-05-03T06:11:11.423507Z", + "shell.execute_reply.started": "2026-05-03T06:11:06.332128Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found existing installation: opencc-python-reimplemented 0.1.6\r\n", + "Uninstalling opencc-python-reimplemented-0.1.6:\r\n", + " Successfully uninstalled 
opencc-python-reimplemented-0.1.6\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n", + "Looking in indexes: http://mirrors.baidubce.com/pypi/simple/\r\n", + "Collecting opencc-python-reimplemented==0.1.6\r\n", + " Using cached opencc_python_reimplemented-0.1.6-py2.py3-none-any.whl\r\n", + "Installing collected packages: opencc-python-reimplemented\r\n", + "Successfully installed opencc-python-reimplemented-0.1.6\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n", + "Found existing installation: aistudio-sdk 0.3.8\r\n", + "Uninstalling aistudio-sdk-0.3.8:\r\n", + " Successfully uninstalled aistudio-sdk-0.3.8\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n", + "Looking in indexes: http://mirrors.baidubce.com/pypi/simple/\r\n", + "Collecting aistudio-sdk==0.3.8\r\n", + " Using cached http://mirrors.baidubce.com/pypi/packages/cb/77/cd71a481bb7a76b0e9d0b6bf47711c627b1dd079001ea246893f19a9d04c/aistudio_sdk-0.3.8-py3-none-any.whl (62 kB)\r\n", + "Requirement already satisfied: psutil in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (7.2.1)\r\n", + "Requirement already satisfied: requests in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (2.32.5)\r\n", + "Requirement already satisfied: tqdm in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (4.67.1)\r\n", + "Requirement already satisfied: bce-python-sdk in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (0.9.59)\r\n", + "Requirement already satisfied: prettytable in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (3.17.0)\r\n", + "Requirement already satisfied: click in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (8.3.1)\r\n", + "Requirement already 
satisfied: pycryptodome>=3.8.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (3.23.0)\r\n", + "Requirement already satisfied: future>=0.6.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (1.0.0)\r\n", + "Requirement already satisfied: six>=1.4.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (1.17.0)\r\n", + "Requirement already satisfied: wcwidth in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from prettytable->aistudio-sdk==0.3.8) (0.2.14)\r\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (3.4.4)\r\n", + "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (3.11)\r\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in ./external-libraries/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (1.26.20)\r\n", + "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (2026.1.4)\r\n", + "Installing collected packages: aistudio-sdk\r\n", + "\u001b[33m WARNING: The script aistudio is installed in '/home/aistudio/external-libraries/bin' which is not on PATH.\r\n", + " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\r\n", + "\u001b[0mSuccessfully installed aistudio-sdk-0.3.8\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n" + ] + } + ], + "source": [ + "%pip uninstall opencc-python-reimplemented -y\n", + "%pip install opencc-python-reimplemented==0.1.6\n", + "%pip uninstall aistudio-sdk -y\n", + 
"%pip install aistudio-sdk==0.3.8\n", + "# PaddleSpeech uses 0.2.6, which should be patched" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "71b16cd1", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:11:11.426579Z", + "iopub.status.busy": "2026-05-03T06:11:11.426259Z", + "iopub.status.idle": "2026-05-03T06:11:11.436812Z", + "shell.execute_reply": "2026-05-03T06:11:11.435719Z", + "shell.execute_reply.started": "2026-05-03T06:11:11.426550Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "File already patched.\r\n" + ] + } + ], + "source": [ + "\"\"\"Patch script to fix aistudio_sdk import in paddlenlp.\n", + "\n", + "Uses importlib.util.find_spec to locate paddlenlp WITHOUT importing it,\n", + "so this can be run before paddlenlp is imported to prevent the ImportError.\n", + "\"\"\"\n", + "\n", + "import importlib.util\n", + "import os\n", + "import subprocess\n", + "\n", + "\n", + "def _find_paddlenlp_dir():\n", + " # Method 1: find_spec (no import, just metadata)\n", + " spec = importlib.util.find_spec(\"paddlenlp\")\n", + " if spec and spec.origin:\n", + " return os.path.dirname(spec.origin)\n", + "\n", + " # Method 2: pip show as fallback\n", + " result = subprocess.run(\n", + " [\"pip\", \"show\", \"paddlenlp\"],\n", + " capture_output=True, text=True,\n", + " )\n", + " for line in result.stdout.splitlines():\n", + " if line.startswith(\"Location:\"):\n", + " return os.path.join(line.split(\":\", 1)[1].strip(), \"paddlenlp\")\n", + "\n", + " raise RuntimeError(\"Cannot locate paddlenlp installation directory\")\n", + "\n", + "\n", + "def patch_aistudio_utils():\n", + " pkg_dir = _find_paddlenlp_dir()\n", + " target_file = os.path.join(pkg_dir, \"transformers\", \"aistudio_utils.py\")\n", + "\n", + " if not os.path.isfile(target_file):\n", + " raise FileNotFoundError(f\"Target file not found: {target_file}\")\n", + "\n", + " old_line = \"from aistudio_sdk.hub 
import download\"\n", + " new_line = \"from aistudio_sdk import snapshot_download as download\"\n", + "\n", + " with open(target_file, \"r\", encoding=\"utf-8\") as f:\n", + " content = f.read()\n", + "\n", + " if old_line not in content:\n", + " if new_line in content:\n", + " print(\"File already patched.\")\n", + " else:\n", + " print(f\"Target import not found in {target_file}\")\n", + " return\n", + "\n", + " patched = content.replace(old_line, new_line)\n", + "\n", + " with open(target_file, \"w\", encoding=\"utf-8\") as f:\n", + " f.write(patched)\n", + "\n", + " print(f\"Patched: {target_file}\")\n", + " print(f\" {old_line} => {new_line}\")\n", + "\n", + "\n", + "patch_aistudio_utils()\n" + ] + }, + { + "cell_type": "markdown", + "id": "c9d0e1f2", + "metadata": {}, + "source": [ + "## 模型下载与检查\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "从 AIStudio 下载三个模型(如果已存在则跳过),并检查模型文件是否完整。\n", + "\n", + "> **注意**:此步骤仅下载和检查模型,**不加载模型到内存**。模型将在管线运行时按需加载。" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3b4c5d6", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.765664Z", + "iopub.status.busy": "2026-05-03T06:14:12.765530Z", + "iopub.status.idle": "2026-05-03T06:14:12.771812Z", + "shell.execute_reply": "2026-05-03T06:14:12.770795Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.765642Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OCR 模型已存在: baidu/PaddleOCR-VL-1.5,跳过下载\r\n", + "LLM 模型已存在: baidu/ERNIE-4.5-0.3B-Paddle,跳过下载\r\n", + "TTS 模型将在首次使用时自动下载\r\n" + ] + } + ], + "source": [ + "from pathlib import Path\n", + "import subprocess\n", + "\n", + "# --- OCR 模型 ---\n", + "ocr_model_dir = Path(\"baidu/PaddleOCR-VL-1.5\")\n", + "\n", + "if not ocr_model_dir.exists():\n", + " subprocess.run([\"aistudio\", \"download\", \"--model\", \"PaddlePaddle/PaddleOCR-VL-1.5\", \"--local_dir\", str(ocr_model_dir)], check=True)\n", + " print(f\"OCR 模型已下载到: 
{ocr_model_dir}\")\n", + "else:\n", + " print(f\"OCR 模型已存在: {ocr_model_dir},跳过下载\")\n", + "\n", + "# --- LLM 模型 ---\n", + "llm_model_dir = Path(\"baidu/ERNIE-4.5-0.3B-Paddle\")\n", + "\n", + "if not llm_model_dir.exists():\n", + " subprocess.run([\"aistudio\", \"download\", \"--model\", \"PaddlePaddle/ERNIE-4.5-0.3B-Paddle\", \"--local_dir\", str(llm_model_dir)], check=True)\n", + " print(f\"LLM 模型已下载到: {llm_model_dir}\")\n", + "else:\n", + " print(f\"LLM 模型已存在: {llm_model_dir},跳过下载\")\n", + "\n", + "# --- TTS 模型 ---\n", + "# PaddleSpeech bert-base-chinese 会在首次使用时自动下载\n", + "print(\"TTS 模型将在首次使用时自动下载\")" + ] + }, + { + "cell_type": "markdown", + "id": "e7f8a9b0", + "metadata": {}, + "source": [ + "## 生成参数设置\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "设置模型的 `max_new_tokens` 参数,控制每个模型生成的最大 token 数量:\n", + "- **OCR max_new_tokens**:PaddleOCR-VL 识别文字时的最大生成长度,说明书内容多时建议调大\n", + "- **LLM max_new_tokens**:ERNIE 提取信息时的最大生成长度,需要更详细整理时可调大" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "c1d2e3f4", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.773035Z", + "iopub.status.busy": "2026-05-03T06:14:12.772880Z", + "iopub.status.idle": "2026-05-03T06:14:12.777084Z", + "shell.execute_reply": "2026-05-03T06:14:12.775954Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.773015Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OCR max_new_tokens: 200\r\n", + "LLM max_new_tokens: 200\r\n" + ] + } + ], + "source": [ + "# OCR 最大生成 token 数(说明书内容多时建议调大,默认 5120)\n", + "ocr_max_new_tokens = 200\n", + "\n", + "# LLM 最大生成 token 数(需要更详细整理时可调大,默认 1024)\n", + "llm_max_new_tokens = 200\n", + "\n", + "print(f\"OCR max_new_tokens: {ocr_max_new_tokens}\")\n", + "print(f\"LLM max_new_tokens: {llm_max_new_tokens}\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_ocr_module", + "metadata": {}, + "source": [ + "## OCR 模块\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "包含图片分割、OCR 
子进程工作函数,以及可独立执行的 `ocr_step`。\n", + "\n", + "**子进程模式**:OCR 模型在独立子进程中加载和执行,完成后子进程自动销毁,确保内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "code_ocr_module", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.777863Z", + "iopub.status.busy": "2026-05-03T06:14:12.777725Z", + "iopub.status.idle": "2026-05-03T06:14:12.993437Z", + "shell.execute_reply": "2026-05-03T06:14:12.992174Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.777846Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ OCR 模块定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "import base64\n", + "import gc\n", + "import io\n", + "import logging\n", + "import math\n", + "import time\n", + "import multiprocessing as mp\n", + "from multiprocessing import Process, Queue\n", + "\n", + "from PIL import Image\n", + "\n", + "logger = logging.getLogger(\"drug_ocr\")\n", + "\n", + "\n", + "# ---- 图片分割 ----\n", + "\n", + "def split_image(image, num_splits=4, overlap_ratio=0.1):\n", + " \"\"\"Split an image into num_splits parts (NxN grid) with overlap.\"\"\"\n", + " grid_size = int(math.sqrt(num_splits))\n", + " if grid_size * grid_size != num_splits:\n", + " raise ValueError(f\"num_splits must be a perfect square (e.g. 
4, 9, 16), got: {num_splits}\")\n", + "\n", + " w, h = image.size\n", + " cell_w = w / grid_size\n", + " cell_h = h / grid_size\n", + " overlap_w = cell_w * overlap_ratio\n", + " overlap_h = cell_h * overlap_ratio\n", + "\n", + " sub_images = []\n", + " for row in range(grid_size):\n", + " for col in range(grid_size):\n", + " left = max(0, col * cell_w - overlap_w)\n", + " upper = max(0, row * cell_h - overlap_h)\n", + " right = min(w, (col + 1) * cell_w + overlap_w)\n", + " lower = min(h, (row + 1) * cell_h + overlap_h)\n", + " sub_img = image.crop((int(left), int(upper), int(right), int(lower)))\n", + " sub_images.append(sub_img)\n", + "\n", + " return sub_images\n", + "\n", + "\n", + "# ---- OCR 子进程工作函数 ----\n", + "\n", + "def ocr_worker_process(ocr_model_dir, image_data_list, max_new_tokens, result_queue):\n", + " \"\"\"Worker function for OCR subprocess - loads model, performs OCR, returns result.\"\"\"\n", + " try:\n", + " import time\n", + " import base64\n", + " import io\n", + " from PIL import Image\n", + " from fastdeploy import LLM, SamplingParams\n", + "\n", + " # Load OCR model\n", + " print(\"[OCR Worker] 加载 OCR 模型 (PaddleOCR-VL)...\")\n", + " start = time.perf_counter()\n", + " ocr_model = LLM(\n", + " model=ocr_model_dir,\n", + " tensor_parallel_size=1,\n", + " max_model_len=8192,\n", + " block_size=16,\n", + " quantization=\"wint8\",\n", + " graph_optimization_config={\"use_cudagraph\": False},\n", + " )\n", + " elapsed = time.perf_counter() - start\n", + " print(f\"[OCR Worker] OCR 模型加载完成, 耗时: {elapsed:.2f}s\")\n", + "\n", + " # Process each image\n", + " all_ocr_texts = []\n", + " for i, img_bytes in enumerate(image_data_list):\n", + " image = Image.open(io.BytesIO(img_bytes)).convert(\"RGB\")\n", + " print(f\"[OCR Worker] 识别图片 {i+1}/{len(image_data_list)}, 尺寸: {image.size}\")\n", + "\n", + " # Prepare image for OCR\n", + " buf = io.BytesIO()\n", + " image.save(buf, format=\"PNG\")\n", + " base64_image = 
base64.b64encode(buf.getvalue()).decode(\"utf-8\")\n", + " image_url = f\"data:image/png;base64,{base64_image}\"\n", + "\n", + " prompts = [{\n", + " \"messages\": [{\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n", + " {\"type\": \"text\", \"text\": \"OCR:\"},\n", + " ],\n", + " }]\n", + " }]\n", + " sampling_params = SamplingParams(\n", + " temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,\n", + " )\n", + " outputs = ocr_model.generate(prompts, sampling_params)\n", + " response = outputs[0].outputs.text\n", + " all_ocr_texts.append(response)\n", + " print(f\"[OCR Worker] 图片 {i+1} 识别完成, 文字长度: {len(response)}\")\n", + "\n", + " # Combine results\n", + " combined_text = \"\\n\\n\".join(all_ocr_texts)\n", + " print(f\"[OCR Worker] 全部识别完成, 总文字长度: {len(combined_text)}\")\n", + "\n", + " # Put result in queue\n", + " result_queue.put((\"success\", combined_text))\n", + "\n", + " # Clean up\n", + " del ocr_model\n", + " import gc\n", + " gc.collect()\n", + " print(\"[OCR Worker] OCR 模型已释放\")\n", + "\n", + " except Exception as e:\n", + " import traceback\n", + " result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n", + "\n", + "\n", + "# ---- 独立 OCR 步骤 (使用子进程) ----\n", + "\n", + "def ocr_step(\n", + " ocr_model_dir,\n", + " image_path,\n", + " enable_split=True,\n", + " num_splits=4,\n", + " overlap_ratio=0.1,\n", + " max_new_tokens=5120,\n", + "):\n", + " \"\"\"Execute the OCR step in a subprocess: load image, optionally split, and run OCR.\"\"\"\n", + " step_start = time.perf_counter()\n", + " logger.info(\"[OCR Step] 加载图片...\")\n", + " image = Image.open(image_path).convert(\"RGB\")\n", + " logger.info(\"[OCR Step] 图片加载完成, 尺寸: %s\", image.size)\n", + "\n", + " if enable_split:\n", + " logger.info(\"[OCR Step] 图片分割 (num_splits=%d, overlap=%.2f)...\", num_splits, overlap_ratio)\n", + " sub_images = split_image(image, num_splits=num_splits, 
overlap_ratio=overlap_ratio)\n", + " ocr_images = [image] + sub_images\n", + " logger.info(\"[OCR Step] 图片分割完成, 原始1张 + 分割%d张 = 共%d张\", len(sub_images), len(ocr_images))\n", + " else:\n", + " logger.info(\"[OCR Step] 跳过图片分割\")\n", + " ocr_images = [image]\n", + "\n", + " # Serialize images to bytes for subprocess\n", + " image_data_list = []\n", + " for img in ocr_images:\n", + " buf = io.BytesIO()\n", + " img.save(buf, format=\"PNG\")\n", + " image_data_list.append(buf.getvalue())\n", + "\n", + " # Create subprocess for OCR\n", + " logger.info(\"[OCR Step] 启动 OCR 子进程...\")\n", + " result_queue = Queue()\n", + " ocr_process = Process(\n", + " target=ocr_worker_process,\n", + " args=(str(ocr_model_dir), image_data_list, max_new_tokens, result_queue)\n", + " )\n", + " ocr_process.start()\n", + "\n", + " # Wait for result\n", + " status, result = result_queue.get()\n", + " ocr_process.join()\n", + " ocr_process.close()\n", + "\n", + " if status == \"error\":\n", + " logger.error(\"[OCR Step] OCR 子进程执行失败: %s\", result)\n", + " raise RuntimeError(f\"OCR subprocess failed: {result}\")\n", + "\n", + " combined_ocr_text = result\n", + " logger.info(\"[OCR Step] OCR 识别全部完成, 总文字长度: %d, 耗时: %.2fs\", len(combined_ocr_text), time.perf_counter() - step_start)\n", + "\n", + " return {\"ocr_text\": combined_ocr_text, \"ocr_images\": ocr_images}\n", + "\n", + "print(\"✅ OCR 模块定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_llm_module", + "metadata": {}, + "source": [ + "## LLM 模块\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "包含文本清洗(`clean_for_tts`)、LLM 子进程工作函数,以及可独立执行的 `llm_step`。\n", + "\n", + "**子进程模式**:LLM 模型在独立子进程中加载和执行,完成后子进程自动销毁,确保内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "code_llm_module", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.994715Z", + "iopub.status.busy": "2026-05-03T06:14:12.994434Z", + "iopub.status.idle": "2026-05-03T06:14:13.009434Z", + "shell.execute_reply": 
"2026-05-03T06:14:13.008415Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.994692Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ LLM 模块定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "import re\n", + "from multiprocessing import Process, Queue\n", + "\n", + "\n", + "# ---- 文本清洗 ----\n", + "\n", + "def clean_for_tts(text):\n", + " \"\"\"Clean text for TTS synthesis by removing emojis and markdown formatting.\"\"\"\n", + " # Remove emojis (Unicode ranges for common emojis)\n", + " # NOTE: Must avoid ranges that overlap with CJK characters (U+4E00-U+9FFF)\n", + " text = re.sub(\n", + " r\"[\\U0001F600-\\U0001F64F\" # emoticons\n", + " r\"\\U0001F300-\\U0001F5FF\" # symbols & pictographs\n", + " r\"\\U0001F680-\\U0001F6FF\" # transport & map\n", + " r\"\\U0001F1E0-\\U0001F1FF\" # flags\n", + " r\"\\U00002702-\\U000027B0\" # dingbats\n", + " r\"\\U000024C2-\\U0000324F\" # enclosed alphanumerics (stop before CJK)\n", + " r\"\\U0001F200-\\U0001F251\" # enclosed CJK supplement (above CJK range)\n", + " r\"\\U0001F900-\\U0001F9FF\" # supplemental symbols\n", + " r\"\\U0001FA00-\\U0001FA6F\" # chess symbols\n", + " r\"\\U0001FA70-\\U0001FAFF\" # symbols extended-A\n", + " r\"\\U00002600-\\U000026FF\" # misc symbols\n", + " r\"\\U0000FE00-\\U0000FE0F\" # variation selectors\n", + " r\"\\U0000200D\" # zero-width joiner\n", + " r\"]+\",\n", + " \"\",\n", + " text,\n", + " )\n", + " # Remove markdown code blocks (```...```)\n", + " text = re.sub(r\"```.*?```\", \"\", text, flags=re.DOTALL)\n", + " # Remove inline code (`...`) -> content\n", + " text = re.sub(r\"`([^`\\n]+)`\", r\"\\1\", text)\n", + " # Remove markdown headers (# ## ### etc.) 
at line start\n", + " text = re.sub(r\"^#{1,6}\\s+\", \"\", text, flags=re.MULTILINE)\n", + " # Remove markdown bold (**text**) -> text\n", + " text = re.sub(r\"\\*\\*([^*\\n]+?)\\*\\*\", r\"\\1\", text)\n", + " # Remove markdown bold (__text__) -> text\n", + " text = re.sub(r\"__([^_\\n]+?)__\", r\"\\1\", text)\n", + " # Remove markdown italic (*text*) -> text\n", + " text = re.sub(r\"\\*([^*\\n]+?)\\*\", r\"\\1\", text)\n", + " # Remove markdown italic (_text_) -> text (only when _ is at word boundary)\n", + " text = re.sub(r\"(?<!\\w)_([^_\\n]+?)_(?!\\w)\", r\"\\1\", text)\n", + " # Remove markdown links [text](url) -> text\n", + " text = re.sub(r\"\\[([^\\]]+)\\]\\([^)]+\\)\", r\"\\1\", text)\n", + " # Remove markdown images ![alt](url)\n", + " text = re.sub(r\"!\\[[^\\]]*\\]\\([^)]+\\)\", \"\", text)\n", + " # Remove markdown horizontal rules (---, ***, ___)\n", + " text = re.sub(r\"^[-*_]{3,}\\s*$\", \"\", text, flags=re.MULTILINE)\n", + " # Remove markdown bullet list markers (- , * , + ) at line start, keep content\n", + " text = re.sub(r\"^(\\s*)[-*+]\\s+\", r\"\\1\", text, flags=re.MULTILINE)\n", + " # Remove markdown numbered list markers (1. 2. etc.) 
at line start, keep content\n", + " text = re.sub(r\"^(\\s*)\\d+\\.\\s+\", r\"\\1\", text, flags=re.MULTILINE)\n", + " # Remove markdown table pipes\n", + " text = re.sub(r\"\\|\", \" \", text)\n", + " # Remove markdown table separator lines (---:---:---)\n", + " text = re.sub(r\"^[-: ]+$\", \"\", text, flags=re.MULTILINE)\n", + " # Collapse multiple blank lines into one\n", + " text = re.sub(r\"\\n{3,}\", \"\\n\\n\", text)\n", + " # Strip leading/trailing whitespace per line\n", + " lines = [line.strip() for line in text.splitlines()]\n", + " text = \"\\n\".join(lines)\n", + " # Remove leading/trailing whitespace overall\n", + " text = text.strip()\n", + " return text\n", + "\n", + "\n", + "# ---- LLM 子进程工作函数 ----\n", + "\n", + "def llm_worker_process(llm_model_dir, ocr_text, max_new_tokens, result_queue):\n", + " \"\"\"Worker function for LLM subprocess - loads model, extracts info, returns result.\"\"\"\n", + " try:\n", + " import time\n", + " from fastdeploy import LLM, SamplingParams\n", + "\n", + " # Load LLM model\n", + " print(\"[LLM Worker] 加载 LLM 模型 (ERNIE)...\")\n", + " start = time.perf_counter()\n", + " llm_model = LLM(\n", + " model=llm_model_dir,\n", + " tensor_parallel_size=1,\n", + " max_model_len=8192,\n", + " block_size=16,\n", + " quantization=\"wint8\",\n", + " graph_optimization_config={\"use_cudagraph\": False},\n", + " )\n", + " elapsed = time.perf_counter() - start\n", + " print(f\"[LLM Worker] LLM 模型加载完成, 耗时: {elapsed:.2f}s\")\n", + "\n", + " # Prepare prompt\n", + " prompt_text = f\"\"\"以下是药品说明书的 OCR 识别结果,供参考:\n", + "\n", + "{ocr_text}\n", + "\n", + "请根据以上 OCR 识别结果,提取并整理以下关键信息,用清晰易懂的语言重新表述,方便老年人阅读理解:\n", + "\n", + "1. 药品名称\n", + "2. 药品适应症(这个药治什么病)\n", + "3. 药品的用法与用量(怎么吃、吃多少)\n", + "4. 药品的禁忌(什么人不能吃、什么情况不能吃)\n", + "5. 
药品的不良反应(吃药后可能出现的不舒服)\n", + "\n", + "要求:\n", + "- 只输出整理后的关键信息,不要重复或复述 OCR 原文\n", + "- 用简洁、通俗的语言回答,避免使用专业术语\n", + "- 不要使用表情符号、emoji\n", + "- 不要使用 markdown 格式符号(如#、**、-等),直接用纯文本输出\n", + "- 用自然流畅的口语化表达,方便语音播报\n", + "- 总字数控制在 {max_new_tokens} 字以内\"\"\"\n", + "\n", + "\n", + "\n", + " # TODO: 调试用,临时将 prompt 覆盖为「你是谁」以复现答非所问问题,调试完成后删除\n", + " prompt_text = \"你是谁\"\n", + "\n", + " prompts = [prompt_text]\n", + " sampling_params = SamplingParams(\n", + " temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,\n", + " )\n", + "\n", + " print(f\"[LLM Worker] 正在生成回复 (max_new_tokens={max_new_tokens})...\")\n", + " gen_start = time.perf_counter()\n", + " outputs = llm_model.generate(prompts, sampling_params)\n", + " result = outputs[0].outputs.text\n", + " gen_elapsed = time.perf_counter() - gen_start\n", + "\n", + " # Clean result\n", + " result = clean_for_tts(result)\n", + " print(f\"[LLM Worker] 信息提取完成, 生成耗时: {gen_elapsed:.2f}s, 结果长度: {len(result)}\")\n", + "\n", + " # TODO: 调试输出,打印清洗后的模型返回,调试完成后删除\n", + " print(\">>>\", result)\n", + "\n", + " # Put result in queue\n", + " result_queue.put((\"success\", result))\n", + "\n", + " # Clean up\n", + " del llm_model\n", + " import gc\n", + " gc.collect()\n", + " print(\"[LLM Worker] LLM 模型已释放\")\n", + "\n", + " except Exception as e:\n", + " import traceback\n", + " result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n", + "\n", + "\n", + "# ---- 独立 LLM 步骤 (使用子进程) ----\n", + "\n", + "def llm_step(\n", + " llm_model_dir,\n", + " ocr_text,\n", + " max_new_tokens=1024,\n", + "):\n", + " \"\"\"Execute the LLM extraction step in a subprocess.\"\"\"\n", + " step_start = time.perf_counter()\n", + " logger.info(\"[LLM Step] LLM 大模型信息提取...\")\n", + "\n", + " # Create subprocess for LLM\n", + " logger.info(\"[LLM Step] 启动 LLM 子进程...\")\n", + " result_queue = Queue()\n", + " llm_process = Process(\n", + " target=llm_worker_process,\n", + " args=(str(llm_model_dir), ocr_text, max_new_tokens, result_queue)\n", + " )\n", + " llm_process.start()\n", + "\n", + " # Wait for result\n", + " 
status, result = result_queue.get()\n", + " llm_process.join()\n", + " llm_process.close()\n", + "\n", + " if status == \"error\":\n", + " logger.error(\"[LLM Step] LLM 子进程执行失败: %s\", result)\n", + " raise RuntimeError(f\"LLM subprocess failed: {result}\")\n", + "\n", + " extracted_info = result\n", + " logger.info(\"[LLM Step] LLM 信息提取完成, 结果长度: %d, 耗时: %.2fs\", len(extracted_info), time.perf_counter() - step_start)\n", + "\n", + " return {\"extracted_info\": extracted_info}\n", + "\n", + "print(\"✅ LLM 模块定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_tts_module", + "metadata": {}, + "source": [ + "## TTS 模块\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "包含 TTS 子进程工作函数,以及可独立执行的 `tts_step`。\n", + "\n", + "**子进程模式**:TTS 模型在独立子进程中加载和执行,完成后子进程自动销毁,确保内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "code_tts_module", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:13.010352Z", + "iopub.status.busy": "2026-05-03T06:14:13.010203Z", + "iopub.status.idle": "2026-05-03T06:14:13.390794Z", + "shell.execute_reply": "2026-05-03T06:14:13.389438Z", + "shell.execute_reply.started": "2026-05-03T06:14:13.010334Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ TTS 模块定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "from multiprocessing import Process, Queue\n", + "from scipy.io.wavfile import read as wav_read\n", + "\n", + "\n", + "# ---- TTS 子进程工作函数 ----\n", + "\n", + "def tts_worker_process(text, output_path, result_queue):\n", + " \"\"\"Worker function for TTS subprocess - loads model, synthesizes speech, returns result.\"\"\"\n", + " try:\n", + " import time\n", + " from paddlespeech.cli.tts.infer import TTSExecutor\n", + " from scipy.io.wavfile import read as wav_read\n", + "\n", + " # Load TTS model\n", + " print(\"[TTS Worker] 加载 TTS 模型 (PaddleSpeech)...\")\n", + " start = time.perf_counter()\n", + " tts_model = TTSExecutor()\n", + " 
elapsed = time.perf_counter() - start\n", + " print(f\"[TTS Worker] TTS 模型加载完成, 耗时: {elapsed:.2f}s\")\n", + "\n", + " # Synthesize speech\n", + " print(f\"[TTS Worker] 语音合成开始, 输入文字长度: {len(text)}\")\n", + " tts_model(text=text, output=output_path)\n", + "\n", + " # Read audio data\n", + " sr, wav_data = wav_read(output_path)\n", + "\n", + " if wav_data is not None:\n", + " audio_duration = len(wav_data) / sr\n", + " print(f\"[TTS Worker] 语音合成完成, 音频时长: {audio_duration:.2f}s, 采样率: {sr} Hz\")\n", + " result_queue.put((\"success\", (sr, wav_data.tolist()))) # Convert to list for serialization\n", + " else:\n", + " print(\"[TTS Worker] 语音合成失败\")\n", + " result_queue.put((\"error\", \"TTS synthesis failed\"))\n", + "\n", + " # Clean up\n", + " del tts_model\n", + " import gc\n", + " gc.collect()\n", + " print(\"[TTS Worker] TTS 模型已释放\")\n", + "\n", + " except Exception as e:\n", + " import traceback\n", + " result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n", + "\n", + "\n", + "# ---- 独立 TTS 步骤 (使用子进程) ----\n", + "\n", + "def tts_step(\n", + " text,\n", + " output_path=\"output.wav\",\n", + "):\n", + " \"\"\"Execute the TTS synthesis step in a subprocess.\"\"\"\n", + " step_start = time.perf_counter()\n", + " logger.info(\"[TTS Step] TTS 语音合成...\")\n", + "\n", + " # Create subprocess for TTS\n", + " logger.info(\"[TTS Step] 启动 TTS 子进程...\")\n", + " result_queue = Queue()\n", + " tts_process = Process(\n", + " target=tts_worker_process,\n", + " args=(text, output_path, result_queue)\n", + " )\n", + " tts_process.start()\n", + "\n", + " # Wait for result\n", + " status, result = result_queue.get()\n", + " tts_process.join()\n", + " tts_process.close()\n", + "\n", + " if status == \"error\":\n", + " logger.error(\"[TTS Step] TTS 子进程执行失败: %s\", result)\n", + " logger.warning(\"[TTS Step] TTS 语音合成失败\")\n", + " return {\"audio\": None}\n", + "\n", + " sr, wav_data_list = result\n", + " import numpy as np\n", + " wav_data = np.array(wav_data_list, 
dtype=np.int16) # Convert back from list\n", + "\n", + " audio_duration = len(wav_data) / sr\n", + " logger.info(\"[TTS Step] TTS 语音合成完成, 音频时长: %.2fs, 耗时: %.2fs\", audio_duration, time.perf_counter() - step_start)\n", + "\n", + " return {\"audio\": (sr, wav_data)}\n", + "\n", + "print(\"✅ TTS 模块定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_orchestration", + "metadata": {}, + "source": [ + "## 管线编排\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "`drug_ocr_pipeline` 串联 OCR → LLM → TTS 三个步骤,每个步骤在独立子进程中执行,`make_demo` 构建 Gradio 界面。" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "code_orchestration", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:13.674124Z", + "iopub.status.busy": "2026-05-03T06:14:13.673982Z", + "iopub.status.idle": "2026-05-03T06:14:16.051569Z", + "shell.execute_reply": "2026-05-03T06:14:16.050351Z", + "shell.execute_reply.started": "2026-05-03T06:14:13.674106Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ 管线编排与 Gradio 界面定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "import tempfile\n", + "\n", + "import numpy as np\n", + "import gradio as gr\n", + "from scipy.io.wavfile import write as wav_write\n", + "\n", + "\n", + "def drug_ocr_pipeline(\n", + " ocr_model_dir,\n", + " llm_model_dir,\n", + " image_path,\n", + " enable_split=True,\n", + " num_splits=4,\n", + " overlap_ratio=0.1,\n", + " ocr_max_new_tokens=5120,\n", + " llm_max_new_tokens=1024,\n", + "):\n", + " \"\"\"Drug instruction leaflet intelligent recognition and voice broadcast pipeline.\n", + " \n", + " Uses subprocess for each model to ensure proper memory cleanup.\n", + " \"\"\"\n", + " pipeline_start = time.perf_counter()\n", + " logger.info(\"=\" * 60)\n", + " logger.info(\"药品说明书识别管线启动 (子进程模式)\")\n", + " logger.info(\" 图片路径: %s\", image_path)\n", + " logger.info(\" 图片分割: %s (num_splits=%d, overlap=%.2f)\", enable_split, num_splits, 
overlap_ratio)\n", + " logger.info(\"=\" * 60)\n", + "\n", + " result = {}\n", + "\n", + " # Step 1: OCR (runs in subprocess, automatically cleaned up)\n", + " ocr_result = ocr_step(\n", + " ocr_model_dir=ocr_model_dir,\n", + " image_path=image_path,\n", + " enable_split=enable_split,\n", + " num_splits=num_splits,\n", + " overlap_ratio=overlap_ratio,\n", + " max_new_tokens=ocr_max_new_tokens,\n", + " )\n", + " result[\"ocr_text\"] = ocr_result[\"ocr_text\"]\n", + "\n", + " # Step 2: LLM extraction (runs in subprocess, automatically cleaned up)\n", + " llm_result = llm_step(\n", + " llm_model_dir=llm_model_dir,\n", + " ocr_text=ocr_result[\"ocr_text\"],\n", + " max_new_tokens=llm_max_new_tokens,\n", + " )\n", + " result[\"extracted_info\"] = llm_result[\"extracted_info\"]\n", + "\n", + " # Step 3: TTS synthesis (runs in subprocess, automatically cleaned up)\n", + " tts_result = tts_step(\n", + " text=llm_result[\"extracted_info\"],\n", + " )\n", + " result[\"audio\"] = tts_result[\"audio\"]\n", + "\n", + " pipeline_elapsed = time.perf_counter() - pipeline_start\n", + " logger.info(\"=\" * 60)\n", + " logger.info(\"管线执行完成, 总耗时: %.2fs\", pipeline_elapsed)\n", + " logger.info(\"=\" * 60)\n", + "\n", + " return result\n", + "\n", + "\n", + "def make_demo(ocr_model_dir, llm_model_dir, ocr_max_new_tokens=5120, llm_max_new_tokens=1024):\n", + " \"\"\"Create Gradio demo for Drug OCR Pipeline.\"\"\"\n", + "\n", + " def gradio_pipeline(\n", + " image_input,\n", + " enable_split,\n", + " num_splits,\n", + " overlap_ratio,\n", + " ocr_max_tokens,\n", + " llm_max_tokens,\n", + " progress=gr.Progress(track_tqdm=True),\n", + " ):\n", + " \"\"\"Gradio interface main processing function\"\"\"\n", + " if image_input is None:\n", + " return \"请上传药品说明书图片\", \"\", None\n", + "\n", + " # Convert uploaded image to PIL Image\n", + " if isinstance(image_input, str):\n", + " image = Image.open(image_input).convert(\"RGB\")\n", + " else:\n", + " image = 
Image.fromarray(image_input).convert(\"RGB\") if not isinstance(image_input, Image.Image) else image_input\n", + "\n", + " # Save as temp file for pipeline\n", + " with tempfile.NamedTemporaryFile(suffix=\".jpg\", delete=False) as tmp:\n", + " image.save(tmp.name)\n", + " tmp_path = tmp.name\n", + "\n", + " try:\n", + " result = drug_ocr_pipeline(\n", + " ocr_model_dir=ocr_model_dir,\n", + " llm_model_dir=llm_model_dir,\n", + " image_path=tmp_path,\n", + " enable_split=enable_split,\n", + " num_splits=int(num_splits),\n", + " overlap_ratio=overlap_ratio,\n", + " ocr_max_new_tokens=int(ocr_max_tokens),\n", + " llm_max_new_tokens=int(llm_max_tokens),\n", + " )\n", + "\n", + " ocr_text = result[\"ocr_text\"]\n", + " extracted_info = result[\"extracted_info\"]\n", + "\n", + " # Save audio as temp file\n", + " audio_path = None\n", + " if result[\"audio\"] is not None:\n", + " sr, wav_data = result[\"audio\"]\n", + " audio_tmp = tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False)\n", + " # wav_data 为 int16 PCM, 直接写入; 若需转 float32 须先除以 32768 归一化到 [-1, 1]\n", + " wav_write(audio_tmp.name, sr, wav_data)\n", + " audio_path = audio_tmp.name\n", + "\n", + " return ocr_text, extracted_info, audio_path\n", + " finally:\n", + " import os\n", + " os.unlink(tmp_path)\n", + "\n", + " with gr.Blocks(title=\"药品说明书智能识别与语音播报\") as demo:\n", + " gr.Markdown(\"# 药品说明书智能识别与语音播报系统\")\n", + " gr.Markdown(\"上传药品说明书图片,系统将自动识别文字、提取关键信息并语音播报,帮助老年人看清读懂药品说明书。\")\n", + "\n", + " with gr.Row():\n", + " with gr.Column(scale=1):\n", + " image_input = gr.Image(label=\"药品说明书图片\", type=\"filepath\")\n", + "\n", + " with gr.Accordion(\"图片分割设置\", open=True):\n", + " enable_split = gr.Checkbox(value=True, label=\"启用图片分割(文字太小时建议开启)\")\n", + " num_splits = gr.Dropdown(choices=[4, 9, 16], value=4, label=\"分割数量\")\n", + " overlap_ratio = gr.Slider(minimum=0.0, maximum=0.3, value=0.1, step=0.05, label=\"重叠比例\")\n", + "\n", + " with gr.Accordion(\"生成参数设置\", open=True):\n", + " ocr_max_tokens = gr.Slider(minimum=100, maximum=8192, value=ocr_max_new_tokens, 
step=1, label=\"OCR 最大生成 token 数\")\n", + " llm_max_tokens = gr.Slider(minimum=100, maximum=4096, value=llm_max_new_tokens, step=1, label=\"LLM 最大生成 token 数\")\n", + "\n", + " run_btn = gr.Button(\"开始识别\", variant=\"primary\")\n", + "\n", + " with gr.Column(scale=1):\n", + " ocr_output = gr.Textbox(label=\"OCR 识别结果\", lines=10, max_lines=20)\n", + " info_output = gr.Textbox(label=\"关键信息整理\", lines=15, max_lines=30)\n", + " audio_output = gr.Audio(label=\"语音播报\", type=\"filepath\")\n", + "\n", + " run_btn.click(\n", + " fn=gradio_pipeline,\n", + " inputs=[\n", + " image_input,\n", + " enable_split,\n", + " num_splits,\n", + " overlap_ratio,\n", + " ocr_max_tokens,\n", + " llm_max_tokens,\n", + " ],\n", + " outputs=[ocr_output, info_output, audio_output],\n", + " )\n", + "\n", + " return demo\n", + "\n", + "print(\"✅ 管线编排与 Gradio 界面定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "a5b6c7d8", + "metadata": {}, + "source": [ + "## 主流程\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "主流程包含以下步骤:\n", + "1. 加载图片\n", + "2. 图片分割(可选,针对文字太小的说明书,将图片切割成多部分进行识别,分割的图片有重叠)\n", + "3. OCR 文字识别(**在子进程中加载模型,完成后销毁子进程**)\n", + "4. 大模型文字整理(**在子进程中加载模型,完成后销毁子进程**)\n", + "5. 
语音合成(**在子进程中加载模型,完成后销毁子进程**)\n", + "\n", + "> 每个步骤在独立的子进程中执行,子进程完成后自动销毁,确保模型内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "e9f0a1b2", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:16.146943Z", + "iopub.status.busy": "2026-05-03T06:14:16.146807Z", + "iopub.status.idle": "2026-05-03T06:14:16.151226Z", + "shell.execute_reply": "2026-05-03T06:14:16.150312Z", + "shell.execute_reply.started": "2026-05-03T06:14:16.146924Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ 日志配置完成 (级别: INFO)\r\n" + ] + } + ], + "source": [ + "import logging\n", + "\n", + "logging.basicConfig(\n", + " level=logging.INFO,\n", + " format=\"%(asctime)s [%(name)s] %(levelname)s: %(message)s\",\n", + " datefmt=\"%H:%M:%S\",\n", + ")\n", + "\n", + "print(\"✅ 日志配置完成 (级别: INFO)\")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "c3d4e5f6", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:16.152272Z", + "iopub.status.busy": "2026-05-03T06:14:16.152120Z", + "iopub.status.idle": "2026-05-03T06:14:16.155642Z", + "shell.execute_reply": "2026-05-03T06:14:16.154737Z", + "shell.execute_reply.started": "2026-05-03T06:14:16.152254Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ 子进程模式已启用 - 模型将在需要时自动加载和释放\r\n" + ] + } + ], + "source": [ + "# 模型管理器已移除 - 现在使用子进程模式\n", + "# 每个模型在独立的子进程中加载、执行、然后自动销毁\n", + "print(\"✅ 子进程模式已启用 - 模型将在需要时自动加载和释放\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7b8c9d0", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:16.156170Z", + "iopub.status.busy": "2026-05-03T06:14:16.156041Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "14:14:16 [drug_ocr] INFO: 
============================================================\r\n", + "14:14:16 [drug_ocr] INFO: 药品说明书识别管线启动 (子进程模式)\r\n", + "14:14:16 [drug_ocr] INFO: 图片路径: resource/1.jpg\r\n", + "14:14:16 [drug_ocr] INFO: 图片分割: False (num_splits=4, overlap=0.10)\r\n", + "14:14:16 [drug_ocr] INFO: ============================================================\r\n", + "14:14:16 [drug_ocr] INFO: [OCR Step] 加载图片...\r\n", + "14:14:16 [drug_ocr] INFO: [OCR Step] 图片加载完成, 尺寸: (2014, 2881)\r\n", + "14:14:16 [drug_ocr] INFO: [OCR Step] 跳过图片分割\r\n", + "14:14:17 [drug_ocr] INFO: [OCR Step] 启动 OCR 子进程...\r\n", + "I0503 14:14:18.096459 1035908 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device\r\n", + "I0503 14:14:18.096537 1035908 init.cc:146] Try loading custom device libs from: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:14:18.217633 1035908 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so\r\n", + "I0503 14:14:18.217679 1035908 custom_device_load.cc:58] Skipped lib [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.\r\n", + "I0503 14:14:18.224740 1035908 custom_kernel.cc:68] Succeed in loading 887 custom kernel(s) from loaded lib(s), will be used like native ones.\r\n", + "I0503 14:14:18.225076 1035908 init.cc:158] Finished in LoadCustomDevice with libs_path: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:14:18.225135 1035908 init.cc:244] CustomDevice: iluvatar_gpu, visible devices count: 1\r\n", + "WARNING 2026-05-03 14:14:18,795 1035908 prometheus_multiprocess_setup.py[line:41] Found 
PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_dad76550-a346-4423-aa82-44018eeaf3ba was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.\r\n", + "None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\r\n", + "\u001b[33m[2026-05-03 14:14:19,226] [ WARNING]\u001b[0m - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults `transformers.utils.import_utils.is_torch_available` and `transformers.utils.import_utils.is_torchvision_available` to False. If you need to use PyTorch in transformers or torchvision, please add `del sys.modules['transformers']` before using them.\u001b[0m\r\n", + "WARNING 2026-05-03 14:14:19,740 1035908 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_dad76550-a346-4423-aa82-44018eeaf3ba was set by user. you will find inaccurate metrics. 
Unset the variable will properly handle cleanup.\r\n", + "WARNING 2026-05-03 14:14:19,750 1035908 ops.py[line:125] Failed to import cache manager ops: Prefix cache ops only supported CUDA nor XPU platform \r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[OCR Worker] 加载 OCR 模型 (PaddleOCR-VL)...\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO 2026-05-03 14:14:21,132 1035908 args_utils.py[line:639] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [28305]\r\n", + "INFO 2026-05-03 14:14:21,134 1035908 args_utils.py[line:639] Parameter `cache_queue_port` is not specified, found available ports for possible use: [38724]\r\n", + "INFO 2026-05-03 14:14:21,136 1035908 args_utils.py[line:639] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [14751]\r\n", + "INFO 2026-05-03 14:14:21,139 1035908 args_utils.py[line:639] Parameter `pd_comm_port` is not specified, found available ports for possible use: [19484]\r\n", + "INFO 2026-05-03 14:14:21,140 1035908 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:14:21,141 1035908 configuration_utils.py[line:1215] Loading configuration file baidu/PaddleOCR-VL-1.5/config.json\r\n", + "WARNING 2026-05-03 14:14:21,143 1035908 configuration_utils.py[line:1246] You are using a model of type paddleocr_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.\r\n", + "WARNING 2026-05-03 14:14:21,144 1035908 configuration_utils.py[line:1246] You are using a model of type paddleocr_vl to instantiate a model of type . 
This is not supported for all configurations of models and can yield errors.\r\n", + "INFO 2026-05-03 14:14:22,130 1035908 flash_attn_backend.py[line:105] Only support CUDA version flash attention.\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "current sm_version=71\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING 2026-05-03 14:14:22,285 1035908 moe.py[line:41] import noaux_tc Failed!\r\n", + "INFO 2026-05-03 14:14:24,261 1035908 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:14:24,264 1035908 configuration_utils.py[line:425] Loading configuration file baidu/PaddleOCR-VL-1.5/generation_config.json\r\n", + "INFO 2026-05-03 14:14:24,284 1035908 tokenizer_utils.py[line:257] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:14:25,941 1035908 engine.py[line:151] Waiting for worker processes to be ready...\r\n", + "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\r\n", + "To disable this warning, you can either:\r\n", + "\t- Avoid using `tokenizers` before the fork if possible\r\n", + "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\r\n", + "Loading Weights: 100%|██████████| 100/100 [00:07<00:00, 13.23it/s] \r\n", + "Loading Layers: 100%|██████████| 100/100 [00:00<00:00, 198.88it/s] \r\n", + "INFO 2026-05-03 14:14:39,443 1035908 engine.py[line:209] Worker processes are launched with 16.92835831642151 seconds.\r\n", + "INFO 2026-05-03 14:14:39,445 1035908 engine.py[line:220] Detected 10922 gpu blocks and 0 cpu blocks in cache (block size: 16).\r\n", + "INFO 2026-05-03 14:14:39,446 1035908 engine.py[line:223] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 8192\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[OCR Worker] OCR 模型加载完成, 耗时: 18.32s\r\n", + "[OCR Worker] 识别图片 1/1, 尺寸: (2014, 2881)\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processed prompts: 0%| | 0/1 [00:00= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\r\n", + "\u001b[33m[2026-05-03 14:15:03,510] [ WARNING]\u001b[0m - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults `transformers.utils.import_utils.is_torch_available` and `transformers.utils.import_utils.is_torchvision_available` to False. If you need to use PyTorch in transformers or torchvision, please add `del sys.modules['transformers']` before using them.\u001b[0m\r\n", + "WARNING 2026-05-03 14:15:03,878 1076624 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_24ad65e3-2498-460c-97d2-9a88e46fe8f6 was set by user. you will find inaccurate metrics. 
Unset the variable will properly handle cleanup.\r\n", + "WARNING 2026-05-03 14:15:03,886 1076624 ops.py[line:125] Failed to import cache manager ops: Prefix cache ops only supported CUDA nor XPU platform \r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[LLM Worker] 加载 LLM 模型 (ERNIE)...\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO 2026-05-03 14:15:04,961 1076624 args_utils.py[line:639] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [58094]\r\n", + "INFO 2026-05-03 14:15:04,964 1076624 args_utils.py[line:639] Parameter `cache_queue_port` is not specified, found available ports for possible use: [56896]\r\n", + "INFO 2026-05-03 14:15:04,967 1076624 args_utils.py[line:639] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [41390]\r\n", + "INFO 2026-05-03 14:15:04,970 1076624 args_utils.py[line:639] Parameter `pd_comm_port` is not specified, found available ports for possible use: [19643]\r\n", + "INFO 2026-05-03 14:15:04,972 1076624 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:15:04,973 1076624 configuration_utils.py[line:1215] Loading configuration file baidu/ERNIE-4.5-0.3B-Paddle/config.json\r\n", + "WARNING 2026-05-03 14:15:04,975 1076624 configuration_utils.py[line:1246] You are using a model of type ernie4_5 to instantiate a model of type . 
This is not supported for all configurations of models and can yield errors.\r\n", + "INFO 2026-05-03 14:15:06,143 1076624 flash_attn_backend.py[line:105] Only support CUDA version flash attention.\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "current sm_version=71\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING 2026-05-03 14:15:06,307 1076624 moe.py[line:41] import noaux_tc Failed!\r\n", + "INFO 2026-05-03 14:15:07,224 1076624 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:15:07,227 1076624 configuration_utils.py[line:425] Loading configuration file baidu/ERNIE-4.5-0.3B-Paddle/generation_config.json\r\n", + "WARNING 2026-05-03 14:15:07,229 1076624 log.py[line:135] PretrainedTokenizer will be deprecated and removed in the next major release. Please migrate to Hugging Face's transformers.PreTrainedTokenizer. use class QWenTokenizer(PaddleTokenizerMixin, hf.PreTrainedTokenizer) to support multisource download and Paddle tokenizer operations.\r\n", + "INFO 2026-05-03 14:15:09,492 1076624 engine.py[line:151] Waiting for worker processes to be ready...\r\n", + "Loading Weights: 100%|██████████| 100/100 [00:04<00:00, 24.85it/s] \r\n", + "Loading Layers: 100%|██████████| 100/100 [00:00<00:00, 199.46it/s] \r\n", + "INFO 2026-05-03 14:15:20,035 1076624 engine.py[line:209] Worker processes are launched with 13.396349906921387 seconds.\r\n", + "INFO 2026-05-03 14:15:20,036 1076624 engine.py[line:220] Detected 10922 gpu blocks and 0 cpu blocks in cache (block size: 16).\r\n", + "INFO 2026-05-03 14:15:20,037 1076624 engine.py[line:223] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 8192\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[LLM Worker] LLM 模型加载完成, 耗时: 15.08s\r\n", + "[LLM Worker] 正在生成回复 (max_new_tokens=200)...\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + 
"text": [ + "Processed prompts: 0%| | 0/1 [00:00>> 在这样一支粉色的手指往前一拉,我像一只蝴蝶似的飞到了你的身边\r\n", + "\r\n", + "你轻轻地将我的手贴在脸颊,柔软的触感瞬间让我一下子陷了进去\r\n", + "\r\n", + "“喜欢就好,别舍不得,我们一起去海边好不好?”\r\n", + "\r\n", + "我微微一笑,眼神带着一丝甜蜜,嘴角不自觉地扬起了\r\n", + "\r\n", + "“好,那就一起去,我保证不弄疼你,我们一起海边,好不好?”\r\n", + "\r\n", + "我环住你,紧紧地靠在你的身上,感受着你的温度和怀抱的柔软\r\n", + "\r\n", + "你轻轻地将我搂入怀中,仿佛一只受伤的小动物,任由我紧紧地依靠着你\r\n", + "\r\n", + "随着一阵海风轻拂,我们来到了海边\r\n", + "\r\n", + "风轻轻掀起了我的长发,海浪一波一波地涌来\r\n", + "\r\n", + "我仰头看着那片广阔无垠的蓝,心中满是向往\r\n", + "\r\n", + "“这就是我想要的,这是我第一次来这里\r\n", + "[LLM Worker] LLM 模型已释放\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "14:15:30 [drug_ocr] INFO: [LLM Step] LLM 信息提取完成, 结果长度: 289, 耗时: 28.24s\r\n", + "14:15:30 [drug_ocr] INFO: [TTS Step] TTS 语音合成...\r\n", + "14:15:30 [drug_ocr] INFO: [TTS Step] 启动 TTS 子进程...\r\n", + "I0503 14:15:30.627210 1088334 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device\r\n", + "I0503 14:15:30.627287 1088334 init.cc:146] Try loading custom device libs from: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:15:30.751516 1088334 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so\r\n", + "I0503 14:15:30.751560 1088334 custom_device_load.cc:58] Skipped lib [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.\r\n", + "I0503 14:15:30.759230 1088334 custom_kernel.cc:68] Succeed in loading 887 custom kernel(s) from loaded lib(s), will be used like native ones.\r\n", + "I0503 14:15:30.759569 1088334 init.cc:158] Finished in LoadCustomDevice with libs_path: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:15:30.759625 
1088334 init.cc:244] CustomDevice: iluvatar_gpu, visible devices count: 1\r\n", + "\u001b[0;93m2026-05-03 14:15:34.944031381 [W:onnxruntime:Default, cpuid_info.cc:91 LogEarlyWarning] Unknown CPU vendor. cpuinfo_vendor value: 16\u001b[m\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[TTS Worker] 加载 TTS 模型 (PaddleSpeech)...\r\n", + "[TTS Worker] TTS 模型加载完成, 耗时: 0.00s\r\n", + "[TTS Worker] 语音合成开始, 输入文字长度: 289\r\n" + ] + } + ], + "source": [ + "from pathlib import Path\n", + "\n", + "sample_image_path = str(Path(\"resource/1.jpg\"))\n", + "\n", + "result = drug_ocr_pipeline(\n", + " ocr_model_dir=ocr_model_dir,\n", + " llm_model_dir=llm_model_dir,\n", + " image_path=sample_image_path,\n", + " enable_split=False,\n", + " num_splits=4,\n", + " overlap_ratio=0.1,\n", + " ocr_max_new_tokens=ocr_max_new_tokens,\n", + " llm_max_new_tokens=llm_max_new_tokens,\n", + ")\n", + "\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"📋 OCR 识别结果:\")\n", + "print(\"=\" * 60)\n", + "print(result[\"ocr_text\"][:500] + \"...\" if len(result[\"ocr_text\"]) > 500 else result[\"ocr_text\"])\n", + "\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"📝 大模型整理结果:\")\n", + "print(\"=\" * 60)\n", + "print(result[\"extracted_info\"])\n", + "\n", + "# 播放音频\n", + "if result[\"audio\"] is not None:\n", + " import IPython.display as ipd\n", + " sr, wav_data = result[\"audio\"]\n", + " print(\"\\n🔊 播放语音...\")\n", + " ipd.display(ipd.Audio(wav_data, rate=sr))" + ] + }, + { + "cell_type": "markdown", + "id": "e1f2a3b4", + "metadata": {}, + "source": [ + "## Gradio 交互界面\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "通过 Gradio 界面,用户可以:\n", + "- 上传药品说明书图片\n", + "- 设置是否启用图片分割及分割数量\n", + "- 调整各模型的生成参数(max_new_tokens)\n", + "- 查看识别和整理结果\n", + "- 播放语音合成的音频\n", + "\n", + "> 每次点击\"开始识别\"时,各模型在独立子进程中执行,完成后自动销毁子进程释放内存。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5d6e7f8", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "demo = 
make_demo(\n", + " ocr_model_dir=ocr_model_dir,\n", + " llm_model_dir=llm_model_dir,\n", + " ocr_max_new_tokens=ocr_max_new_tokens,\n", + " llm_max_new_tokens=llm_max_new_tokens,\n", + ")\n", + "\n", + "try:\n", + " demo.launch(server_name=\"0.0.0.0\", server_port=7860, debug=True)\n", + "except Exception:\n", + " demo.launch(debug=True, share=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "py35-paddle1.2.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}