diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md new file mode 100644 index 00000000..7e8bd25d --- /dev/null +++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/[WeeklyReport]2026.04.27~2026.05.08.md @@ -0,0 +1,82 @@ +### 认领者 GitHub ID +megemini + +### 赛题信息 + +- **进阶任务序号**:#15 +- **赛题名称**:基于天数智芯硬件与文心多模态模型的创新应用 +- **关联厂商**:天数 + +### 本周工作 + +1. **RFC 文档** + + - 已经完成 RFC 文档 + - AI Studio 地址:https://aistudio.baidu.com/project/edit/10221576 + +2. **代码实现** + + - 已经完成 AI Studio 项目的 notebook + - 已经创建了双卡的天数环境 + +3. **README** + + - 可以参考 AI Studio 项目的 notebook + +4. **演示视频/截图** + + - 待完成 + +5. **问题与解决** + + - 问题:AI Studio 的 notebook 中无法正常调用 ERNIE-4.5-0.3B-Paddle + + 现在有一个很奇怪的问题:AI Studio 的 notebook 中无法 `正常` 调用 ERNIE-4.5-0.3B-Paddle 模型。模型可以正常运行,但是输出 `答非所问`。 + + 请看下面的截图,我将 PaddleOCR-VL-1.5 识别的结果手动放入到 prompt 中: + + ![images/cli_prompt.png](images/cli_prompt.png) + + 使用命令行调用模型,输出是正常的: + + ![images/cli_ok.png](images/cli_ok.png) + + 但是,如果放到 notebook 中,输出就是一长串的空白(空格和回车)! + + 我手动将 notebook 中的 prompt 修改为 `你是谁` 测试模型的输出: + + ![images/notebook_input.png](images/notebook_input.png) + + 输出是一段奇怪的东西: + + ![images/notebook_output.png](images/notebook_output.png) + + 有时候还会给我输出一段完形填空题。 + + 我尝试在 notebook 中直接进行函数调用,也尝试使用子进程调用,都不行! + + 现在附上 notebook 文件 `medical_pipeline_20260503.ipynb`,可以直接执行。 + + 另外,还发现一个问题:在 AI Studio 中,显存有时无法释放。可以看到截图中,即便什么都没有运行,也被占用了 45% 的显存。我不确定是 AI Studio 的问题,还是 FastDeploy 配合天数硬件的问题,请帮忙看一下。 + + - 问题:天数的双卡框架开发环境只有命令行模式,不能使用 notebook,也不能进行项目公开 + + 现在的解决方案是:先在单卡环境中调通 notebook,然后在双卡环境中验证 pipeline 是否能够走通。 + +### 下周计划 + +1. 调试 notebook +2. 
调试双卡环境 + +### 当前阻塞(无则填"无") + +- 解决 notebook 中无法正常调用 ERNIE-4.5-0.3B-Paddle 模型的问题 + +### 交付物进展 + +| 交付物 | 状态 | 备注 | +|--------|:----:|------| +| RFC 文档 | ✅ 已完成 | - | +| 代码实现 | 🔄 | - | +| README | 🔄 | - | +| 演示视频/截图 | 🔄 | - | \ No newline at end of file diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png new file mode 100644 index 00000000..d1b3e20f Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_ok.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png new file mode 100644 index 00000000..d2efd019 Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/cli_prompt.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png new file mode 100644 index 00000000..84853809 Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_input.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png new file mode 100644 index 00000000..d2f6e2af Binary files /dev/null and b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/images/notebook_output.png differ diff --git a/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb new file mode 100644 index 00000000..ec34bd1e --- /dev/null +++ b/WeeklyReports/Hackathon_10th/ERNIEPartner/15_megemini/medical_pipeline_20260503.ipynb @@ -0,0 +1,1512 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a1b2c3d4", + 
"metadata": {}, + "source": [ + "# 药品说明书智能识别与语音播报系统\n", + "\n", + "## 项目说明\n", + "\n", + "针对药品说明书字体太小、老年人看不清读不懂的问题,本项目通过以下三个步骤,将药品说明书中的重点内容识别提取并语音播报:\n", + "\n", + "1. **OCR 识别**:使用 PaddleOCR-VL-1.5 模型对药品说明书图片进行文字识别\n", + "2. **大模型整理**:使用 ERNIE-4.5 大模型对识别的文字进行整理,提取关键信息\n", + "3. **语音合成播报**:使用 PaddleSpeech 语音合成模型将整理后的文字转为音频文件\n", + "\n", + "### 提取的关键信息包括:\n", + "1. 药品名称\n", + "2. 药品适应症\n", + "3. 药品的用法与用量\n", + "4. 药品的禁忌\n", + "5. 药品的不良反应\n", + "\n", + "### 技术栈:\n", + "- OCR: PaddleOCR-VL-1.5\n", + "- LLM: ERNIE-4.5-0.3B-Paddle\n", + "- TTS: PaddleSpeech bert-base-chinese\n", + "\n", + "### 内存优化(子进程模式):\n", + "为确保内存完全释放,本系统采用**子进程模式**运行每个模型:\n", + "- 每个模型在独立的子进程中加载和执行\n", + "- 子进程完成后自动销毁,确保内存完全释放\n", + "- 主进程仅负责数据传递和流程控制,不加载模型\n", + "- 例如:OCR 在子进程运行,完成后子进程销毁,再启动 LLM 子进程\n", + "\n", + "#### 目录:\n", + "- [模型下载与检查](#模型下载与检查)\n", + "- [生成参数设置](#生成参数设置)\n", + "- [OCR 模块](#OCR-模块)\n", + "- [LLM 模块](#LLM-模块)\n", + "- [TTS 模块](#TTS-模块)\n", + "- [管线编排与模型管理](#管线编排与模型管理)\n", + "- [主流程](#主流程)\n", + "- [Gradio 交互界面](#Gradio-交互界面)" + ] + }, + { + "cell_type": "markdown", + "id": "088dfe7b-8df9-47d3-b94d-70db4eb1a2a9", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:08:00.811267Z", + "iopub.status.busy": "2026-05-03T06:08:00.811134Z" + } + }, + "source": [ + "%pip install -r requirements.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "bdb8d7d5", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:11:06.332147Z", + "iopub.status.busy": "2026-05-03T06:11:06.332023Z", + "iopub.status.idle": "2026-05-03T06:11:11.425006Z", + "shell.execute_reply": "2026-05-03T06:11:11.423507Z", + "shell.execute_reply.started": "2026-05-03T06:11:06.332128Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found existing installation: opencc-python-reimplemented 0.1.6\r\n", + "Uninstalling opencc-python-reimplemented-0.1.6:\r\n", + " Successfully uninstalled 
opencc-python-reimplemented-0.1.6\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n", + "Looking in indexes: http://mirrors.baidubce.com/pypi/simple/\r\n", + "Collecting opencc-python-reimplemented==0.1.6\r\n", + " Using cached opencc_python_reimplemented-0.1.6-py2.py3-none-any.whl\r\n", + "Installing collected packages: opencc-python-reimplemented\r\n", + "Successfully installed opencc-python-reimplemented-0.1.6\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n", + "Found existing installation: aistudio-sdk 0.3.8\r\n", + "Uninstalling aistudio-sdk-0.3.8:\r\n", + " Successfully uninstalled aistudio-sdk-0.3.8\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n", + "Looking in indexes: http://mirrors.baidubce.com/pypi/simple/\r\n", + "Collecting aistudio-sdk==0.3.8\r\n", + " Using cached http://mirrors.baidubce.com/pypi/packages/cb/77/cd71a481bb7a76b0e9d0b6bf47711c627b1dd079001ea246893f19a9d04c/aistudio_sdk-0.3.8-py3-none-any.whl (62 kB)\r\n", + "Requirement already satisfied: psutil in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (7.2.1)\r\n", + "Requirement already satisfied: requests in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (2.32.5)\r\n", + "Requirement already satisfied: tqdm in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (4.67.1)\r\n", + "Requirement already satisfied: bce-python-sdk in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (0.9.59)\r\n", + "Requirement already satisfied: prettytable in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (3.17.0)\r\n", + "Requirement already satisfied: click in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from aistudio-sdk==0.3.8) (8.3.1)\r\n", + "Requirement already 
satisfied: pycryptodome>=3.8.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (3.23.0)\r\n", + "Requirement already satisfied: future>=0.6.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (1.0.0)\r\n", + "Requirement already satisfied: six>=1.4.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from bce-python-sdk->aistudio-sdk==0.3.8) (1.17.0)\r\n", + "Requirement already satisfied: wcwidth in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from prettytable->aistudio-sdk==0.3.8) (0.2.14)\r\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (3.4.4)\r\n", + "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (3.11)\r\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in ./external-libraries/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (1.26.20)\r\n", + "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests->aistudio-sdk==0.3.8) (2026.1.4)\r\n", + "Installing collected packages: aistudio-sdk\r\n", + "\u001b[33m WARNING: The script aistudio is installed in '/home/aistudio/external-libraries/bin' which is not on PATH.\r\n", + " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\r\n", + "\u001b[0mSuccessfully installed aistudio-sdk-0.3.8\r\n", + "Note: you may need to restart the kernel to use updated packages.\r\n" + ] + } + ], + "source": [ + "%pip uninstall opencc-python-reimplemented -y\n", + "%pip install opencc-python-reimplemented==0.1.6\n", + "%pip uninstall aistudio-sdk -y\n", + 
"%pip install aistudio-sdk==0.3.8\n", + "# PaddleSpeech uses 0.2.6, which should be patched" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "71b16cd1", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:11:11.426579Z", + "iopub.status.busy": "2026-05-03T06:11:11.426259Z", + "iopub.status.idle": "2026-05-03T06:11:11.436812Z", + "shell.execute_reply": "2026-05-03T06:11:11.435719Z", + "shell.execute_reply.started": "2026-05-03T06:11:11.426550Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "File already patched.\r\n" + ] + } + ], + "source": [ + "\"\"\"Patch script to fix aistudio_sdk import in paddlenlp.\n", + "\n", + "Uses importlib.util.find_spec to locate paddlenlp WITHOUT importing it,\n", + "so this can be run before paddlenlp is imported to prevent the ImportError.\n", + "\"\"\"\n", + "\n", + "import importlib.util\n", + "import os\n", + "import subprocess\n", + "\n", + "\n", + "def _find_paddlenlp_dir():\n", + " # Method 1: find_spec (no import, just metadata)\n", + " spec = importlib.util.find_spec(\"paddlenlp\")\n", + " if spec and spec.origin:\n", + " return os.path.dirname(spec.origin)\n", + "\n", + " # Method 2: pip show as fallback\n", + " result = subprocess.run(\n", + " [\"pip\", \"show\", \"paddlenlp\"],\n", + " capture_output=True, text=True,\n", + " )\n", + " for line in result.stdout.splitlines():\n", + " if line.startswith(\"Location:\"):\n", + " return os.path.join(line.split(\":\", 1)[1].strip(), \"paddlenlp\")\n", + "\n", + " raise RuntimeError(\"Cannot locate paddlenlp installation directory\")\n", + "\n", + "\n", + "def patch_aistudio_utils():\n", + " pkg_dir = _find_paddlenlp_dir()\n", + " target_file = os.path.join(pkg_dir, \"transformers\", \"aistudio_utils.py\")\n", + "\n", + " if not os.path.isfile(target_file):\n", + " raise FileNotFoundError(f\"Target file not found: {target_file}\")\n", + "\n", + " old_line = \"from aistudio_sdk.hub 
import download\"\n", + " new_line = \"from aistudio_sdk import snapshot_download as download\"\n", + "\n", + " with open(target_file, \"r\", encoding=\"utf-8\") as f:\n", + " content = f.read()\n", + "\n", + " if old_line not in content:\n", + " if new_line in content:\n", + " print(\"File already patched.\")\n", + " else:\n", + " print(f\"Target import not found in {target_file}\")\n", + " return\n", + "\n", + " patched = content.replace(old_line, new_line)\n", + "\n", + " with open(target_file, \"w\", encoding=\"utf-8\") as f:\n", + " f.write(patched)\n", + "\n", + " print(f\"Patched: {target_file}\")\n", + " print(f\" {old_line} => {new_line}\")\n", + "\n", + "\n", + "patch_aistudio_utils()\n" + ] + }, + { + "cell_type": "markdown", + "id": "c9d0e1f2", + "metadata": {}, + "source": [ + "## 模型下载与检查\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "从 AIStudio 下载三个模型(如果已存在则跳过),并检查模型文件是否完整。\n", + "\n", + "> **注意**:此步骤仅下载和检查模型,**不加载模型到内存**。模型将在管线运行时按需加载。" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3b4c5d6", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.765664Z", + "iopub.status.busy": "2026-05-03T06:14:12.765530Z", + "iopub.status.idle": "2026-05-03T06:14:12.771812Z", + "shell.execute_reply": "2026-05-03T06:14:12.770795Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.765642Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OCR 模型已存在: baidu/PaddleOCR-VL-1.5,跳过下载\r\n", + "LLM 模型已存在: baidu/ERNIE-4.5-0.3B-Paddle,跳过下载\r\n", + "TTS 模型将在首次使用时自动下载\r\n" + ] + } + ], + "source": [ + "from pathlib import Path\n", + "import subprocess\n", + "\n", + "# --- OCR 模型 ---\n", + "ocr_model_dir = Path(\"baidu/PaddleOCR-VL-1.5\")\n", + "\n", + "if not ocr_model_dir.exists():\n", + " subprocess.run([\"aistudio\", \"download\", \"--model\", \"PaddlePaddle/PaddleOCR-VL-1.5\", \"--local_dir\", str(ocr_model_dir)], check=True)\n", + " print(f\"OCR 模型已下载到: 
{ocr_model_dir}\")\n", + "else:\n", + " print(f\"OCR 模型已存在: {ocr_model_dir},跳过下载\")\n", + "\n", + "# --- LLM 模型 ---\n", + "llm_model_dir = Path(\"baidu/ERNIE-4.5-0.3B-Paddle\")\n", + "\n", + "if not llm_model_dir.exists():\n", + " subprocess.run([\"aistudio\", \"download\", \"--model\", \"PaddlePaddle/ERNIE-4.5-0.3B-Paddle\", \"--local_dir\", str(llm_model_dir)], check=True)\n", + " print(f\"LLM 模型已下载到: {llm_model_dir}\")\n", + "else:\n", + " print(f\"LLM 模型已存在: {llm_model_dir},跳过下载\")\n", + "\n", + "# --- TTS 模型 ---\n", + "# PaddleSpeech bert-base-chinese 会在首次使用时自动下载\n", + "print(\"TTS 模型将在首次使用时自动下载\")" + ] + }, + { + "cell_type": "markdown", + "id": "e7f8a9b0", + "metadata": {}, + "source": [ + "## 生成参数设置\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "设置模型的 `max_new_tokens` 参数,控制每个模型生成的最大 token 数量:\n", + "- **OCR max_new_tokens**:PaddleOCR-VL 识别文字时的最大生成长度,说明书内容多时建议调大\n", + "- **LLM max_new_tokens**:ERNIE 提取信息时的最大生成长度,需要更详细整理时可调大" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "c1d2e3f4", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.773035Z", + "iopub.status.busy": "2026-05-03T06:14:12.772880Z", + "iopub.status.idle": "2026-05-03T06:14:12.777084Z", + "shell.execute_reply": "2026-05-03T06:14:12.775954Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.773015Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "OCR max_new_tokens: 200\r\n", + "LLM max_new_tokens: 200\r\n" + ] + } + ], + "source": [ + "# OCR 最大生成 token 数(说明书内容多时建议调大,默认 5120)\n", + "ocr_max_new_tokens = 200\n", + "\n", + "# LLM 最大生成 token 数(需要更详细整理时可调大,默认 1024)\n", + "llm_max_new_tokens = 200\n", + "\n", + "print(f\"OCR max_new_tokens: {ocr_max_new_tokens}\")\n", + "print(f\"LLM max_new_tokens: {llm_max_new_tokens}\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_ocr_module", + "metadata": {}, + "source": [ + "## OCR 模块\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "包含图片分割、OCR 
子进程工作函数,以及可独立执行的 `ocr_step`。\n", + "\n", + "**子进程模式**:OCR 模型在独立子进程中加载和执行,完成后子进程自动销毁,确保内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "code_ocr_module", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.777863Z", + "iopub.status.busy": "2026-05-03T06:14:12.777725Z", + "iopub.status.idle": "2026-05-03T06:14:12.993437Z", + "shell.execute_reply": "2026-05-03T06:14:12.992174Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.777846Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ OCR 模块定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "import base64\n", + "import gc\n", + "import io\n", + "import logging\n", + "import math\n", + "import time\n", + "import multiprocessing as mp\n", + "from multiprocessing import Process, Queue\n", + "\n", + "from PIL import Image\n", + "\n", + "logger = logging.getLogger(\"drug_ocr\")\n", + "\n", + "\n", + "# ---- 图片分割 ----\n", + "\n", + "def split_image(image, num_splits=4, overlap_ratio=0.1):\n", + " \"\"\"Split an image into num_splits parts (NxN grid) with overlap.\"\"\"\n", + " grid_size = int(math.sqrt(num_splits))\n", + " if grid_size * grid_size != num_splits:\n", + " raise ValueError(f\"num_splits must be a perfect square (e.g. 
4, 9, 16), got: {num_splits}\")\n", + "\n", + " w, h = image.size\n", + " cell_w = w / grid_size\n", + " cell_h = h / grid_size\n", + " overlap_w = cell_w * overlap_ratio\n", + " overlap_h = cell_h * overlap_ratio\n", + "\n", + " sub_images = []\n", + " for row in range(grid_size):\n", + " for col in range(grid_size):\n", + " left = max(0, col * cell_w - overlap_w)\n", + " upper = max(0, row * cell_h - overlap_h)\n", + " right = min(w, (col + 1) * cell_w + overlap_w)\n", + " lower = min(h, (row + 1) * cell_h + overlap_h)\n", + " sub_img = image.crop((int(left), int(upper), int(right), int(lower)))\n", + " sub_images.append(sub_img)\n", + "\n", + " return sub_images\n", + "\n", + "\n", + "# ---- OCR 子进程工作函数 ----\n", + "\n", + "def ocr_worker_process(ocr_model_dir, image_data_list, max_new_tokens, result_queue):\n", + " \"\"\"Worker function for OCR subprocess - loads model, performs OCR, returns result.\"\"\"\n", + " try:\n", + " import time\n", + " import base64\n", + " import io\n", + " from PIL import Image\n", + " from fastdeploy import LLM, SamplingParams\n", + "\n", + " # Load OCR model\n", + " print(\"[OCR Worker] 加载 OCR 模型 (PaddleOCR-VL)...\")\n", + " start = time.perf_counter()\n", + " ocr_model = LLM(\n", + " model=ocr_model_dir,\n", + " tensor_parallel_size=1,\n", + " max_model_len=8192,\n", + " block_size=16,\n", + " quantization=\"wint8\",\n", + " graph_optimization_config={\"use_cudagraph\": False},\n", + " )\n", + " elapsed = time.perf_counter() - start\n", + " print(f\"[OCR Worker] OCR 模型加载完成, 耗时: {elapsed:.2f}s\")\n", + "\n", + " # Process each image\n", + " all_ocr_texts = []\n", + " for i, img_bytes in enumerate(image_data_list):\n", + " image = Image.open(io.BytesIO(img_bytes)).convert(\"RGB\")\n", + " print(f\"[OCR Worker] 识别图片 {i+1}/{len(image_data_list)}, 尺寸: {image.size}\")\n", + "\n", + " # Prepare image for OCR\n", + " buf = io.BytesIO()\n", + " image.save(buf, format=\"PNG\")\n", + " base64_image = 
base64.b64encode(buf.getvalue()).decode(\"utf-8\")\n", + " image_url = f\"data:image/png;base64,{base64_image}\"\n", + "\n", + " prompts = [{\n", + " \"messages\": [{\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n", + " {\"type\": \"text\", \"text\": \"OCR:\"},\n", + " ],\n", + " }]\n", + " }]\n", + " sampling_params = SamplingParams(\n", + " temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,\n", + " )\n", + " outputs = ocr_model.generate(prompts, sampling_params)\n", + " response = outputs[0].outputs.text\n", + " all_ocr_texts.append(response)\n", + " print(f\"[OCR Worker] 图片 {i+1} 识别完成, 文字长度: {len(response)}\")\n", + "\n", + " # Combine results\n", + " combined_text = \"\\n\\n\".join(all_ocr_texts)\n", + " print(f\"[OCR Worker] 全部识别完成, 总文字长度: {len(combined_text)}\")\n", + "\n", + " # Put result in queue\n", + " result_queue.put((\"success\", combined_text))\n", + "\n", + " # Clean up\n", + " del ocr_model\n", + " import gc\n", + " gc.collect()\n", + " print(\"[OCR Worker] OCR 模型已释放\")\n", + "\n", + " except Exception as e:\n", + " import traceback\n", + " result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n", + "\n", + "\n", + "# ---- 独立 OCR 步骤 (使用子进程) ----\n", + "\n", + "def ocr_step(\n", + " ocr_model_dir,\n", + " image_path,\n", + " enable_split=True,\n", + " num_splits=4,\n", + " overlap_ratio=0.1,\n", + " max_new_tokens=5120,\n", + "):\n", + " \"\"\"Execute the OCR step in a subprocess: load image, optionally split, and run OCR.\"\"\"\n", + " step_start = time.perf_counter()\n", + " logger.info(\"[OCR Step] 加载图片...\")\n", + " image = Image.open(image_path).convert(\"RGB\")\n", + " logger.info(\"[OCR Step] 图片加载完成, 尺寸: %s\", image.size)\n", + "\n", + " if enable_split:\n", + " logger.info(\"[OCR Step] 图片分割 (num_splits=%d, overlap=%.2f)...\", num_splits, overlap_ratio)\n", + " sub_images = split_image(image, num_splits=num_splits, 
overlap_ratio=overlap_ratio)\n", + " ocr_images = [image] + sub_images\n", + " logger.info(\"[OCR Step] 图片分割完成, 原始1张 + 分割%d张 = 共%d张\", len(sub_images), len(ocr_images))\n", + " else:\n", + " logger.info(\"[OCR Step] 跳过图片分割\")\n", + " ocr_images = [image]\n", + "\n", + " # Serialize images to bytes for subprocess\n", + " image_data_list = []\n", + " for img in ocr_images:\n", + " buf = io.BytesIO()\n", + " img.save(buf, format=\"PNG\")\n", + " image_data_list.append(buf.getvalue())\n", + "\n", + " # Create subprocess for OCR\n", + " logger.info(\"[OCR Step] 启动 OCR 子进程...\")\n", + " result_queue = Queue()\n", + " ocr_process = Process(\n", + " target=ocr_worker_process,\n", + " args=(str(ocr_model_dir), image_data_list, max_new_tokens, result_queue)\n", + " )\n", + " ocr_process.start()\n", + "\n", + " # Wait for result\n", + " status, result = result_queue.get()\n", + " ocr_process.join()\n", + " ocr_process.close()\n", + "\n", + " if status == \"error\":\n", + " logger.error(\"[OCR Step] OCR 子进程执行失败: %s\", result)\n", + " raise RuntimeError(f\"OCR subprocess failed: {result}\")\n", + "\n", + " combined_ocr_text = result\n", + " logger.info(\"[OCR Step] OCR 识别全部完成, 总文字长度: %d, 耗时: %.2fs\", len(combined_ocr_text), time.perf_counter() - step_start)\n", + "\n", + " return {\"ocr_text\": combined_ocr_text, \"ocr_images\": ocr_images}\n", + "\n", + "print(\"✅ OCR 模块定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_llm_module", + "metadata": {}, + "source": [ + "## LLM 模块\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "包含文本清洗(`clean_for_tts`)、LLM 子进程工作函数,以及可独立执行的 `llm_step`。\n", + "\n", + "**子进程模式**:LLM 模型在独立子进程中加载和执行,完成后子进程自动销毁,确保内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "code_llm_module", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:12.994715Z", + "iopub.status.busy": "2026-05-03T06:14:12.994434Z", + "iopub.status.idle": "2026-05-03T06:14:13.009434Z", + "shell.execute_reply": 
"2026-05-03T06:14:13.008415Z", + "shell.execute_reply.started": "2026-05-03T06:14:12.994692Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ LLM 模块定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "import re\n", + "from multiprocessing import Process, Queue\n", + "\n", + "\n", + "# ---- 文本清洗 ----\n", + "\n", + "def clean_for_tts(text):\n", + " \"\"\"Clean text for TTS synthesis by removing emojis and markdown formatting.\"\"\"\n", + " # Remove emojis (Unicode ranges for common emojis)\n", + " # NOTE: Must avoid ranges that overlap with CJK characters (U+4E00-U+9FFF)\n", + " text = re.sub(\n", + " r\"[\\U0001F600-\\U0001F64F\" # emoticons\n", + " r\"\\U0001F300-\\U0001F5FF\" # symbols & pictographs\n", + " r\"\\U0001F680-\\U0001F6FF\" # transport & map\n", + " r\"\\U0001F1E0-\\U0001F1FF\" # flags\n", + " r\"\\U00002702-\\U000027B0\" # dingbats\n", + " r\"\\U000024C2-\\U0000324F\" # enclosed alphanumerics (stop before CJK)\n", + " r\"\\U0001F200-\\U0001F251\" # enclosed CJK supplement (above CJK range)\n", + " r\"\\U0001F900-\\U0001F9FF\" # supplemental symbols\n", + " r\"\\U0001FA00-\\U0001FA6F\" # chess symbols\n", + " r\"\\U0001FA70-\\U0001FAFF\" # symbols extended-A\n", + " r\"\\U00002600-\\U000026FF\" # misc symbols\n", + " r\"\\U0000FE00-\\U0000FE0F\" # variation selectors\n", + " r\"\\U0000200D\" # zero-width joiner\n", + " r\"]+\",\n", + " \"\",\n", + " text,\n", + " )\n", + " # Remove markdown code blocks (```...```)\n", + " text = re.sub(r\"```.*?```\", \"\", text, flags=re.DOTALL)\n", + " # Remove inline code (`...`) -> content\n", + " text = re.sub(r\"`([^`\\n]+)`\", r\"\\1\", text)\n", + " # Remove markdown headers (# ## ### etc.) 
at line start\n", + " text = re.sub(r\"^#{1,6}\\s+\", \"\", text, flags=re.MULTILINE)\n", + " # Remove markdown bold (**text**) -> text\n", + " text = re.sub(r\"\\*\\*([^*\\n]+?)\\*\\*\", r\"\\1\", text)\n", + " # Remove markdown bold (__text__) -> text\n", + " text = re.sub(r\"__([^_\\n]+?)__\", r\"\\1\", text)\n", + " # Remove markdown italic (*text*) -> text\n", + " text = re.sub(r\"\\*([^*\\n]+?)\\*\", r\"\\1\", text)\n", + " # Remove markdown italic (_text_) -> text (only when _ is at word boundary)\n", + " text = re.sub(r\"(?<!\\w)_([^_\\n]+?)_(?!\\w)\", r\"\\1\", text)\n", + " # Remove markdown links [text](url) -> text\n", + " text = re.sub(r\"\\[([^\\]]+)\\]\\([^)]+\\)\", r\"\\1\", text)\n", + " # Remove markdown images ![alt](url)\n", + " text = re.sub(r\"!\\[[^\\]]*\\]\\([^)]+\\)\", \"\", text)\n", + " # Remove markdown horizontal rules (---, ***, ___)\n", + " text = re.sub(r\"^[-*_]{3,}\\s*$\", \"\", text, flags=re.MULTILINE)\n", + " # Remove markdown bullet list markers (- , * , + ) at line start, keep content\n", + " text = re.sub(r\"^(\\s*)[-*+]\\s+\", r\"\\1\", text, flags=re.MULTILINE)\n", + " # Remove markdown numbered list markers (1. 2. etc.) 
at line start, keep content\n", + " text = re.sub(r\"^(\\s*)\\d+\\.\\s+\", r\"\\1\", text, flags=re.MULTILINE)\n", + " # Remove markdown table pipes\n", + " text = re.sub(r\"\\|\", \" \", text)\n", + " # Remove markdown table separator lines (---:---:---)\n", + " text = re.sub(r\"^[-: ]+$\", \"\", text, flags=re.MULTILINE)\n", + " # Collapse multiple blank lines into one\n", + " text = re.sub(r\"\\n{3,}\", \"\\n\\n\", text)\n", + " # Strip leading/trailing whitespace per line\n", + " lines = [line.strip() for line in text.splitlines()]\n", + " text = \"\\n\".join(lines)\n", + " # Remove leading/trailing whitespace overall\n", + " text = text.strip()\n", + " return text\n", + "\n", + "\n", + "# ---- LLM 子进程工作函数 ----\n", + "\n", + "def llm_worker_process(llm_model_dir, ocr_text, max_new_tokens, result_queue):\n", + " \"\"\"Worker function for LLM subprocess - loads model, extracts info, returns result.\"\"\"\n", + " try:\n", + " import time\n", + " from fastdeploy import LLM, SamplingParams\n", + "\n", + " # Load LLM model\n", + " print(\"[LLM Worker] 加载 LLM 模型 (ERNIE)...\")\n", + " start = time.perf_counter()\n", + " llm_model = LLM(\n", + " model=llm_model_dir,\n", + " tensor_parallel_size=1,\n", + " max_model_len=8192,\n", + " block_size=16,\n", + " quantization=\"wint8\",\n", + " graph_optimization_config={\"use_cudagraph\": False},\n", + " )\n", + " elapsed = time.perf_counter() - start\n", + " print(f\"[LLM Worker] LLM 模型加载完成, 耗时: {elapsed:.2f}s\")\n", + "\n", + " # Prepare prompt\n", + " prompt_text = f\"\"\"以下是药品说明书的 OCR 识别结果,供参考:\n", + "\n", + "{ocr_text}\n", + "\n", + "请根据以上 OCR 识别结果,提取并整理以下关键信息,用清晰易懂的语言重新表述,方便老年人阅读理解:\n", + "\n", + "1. 药品名称\n", + "2. 药品适应症(这个药治什么病)\n", + "3. 药品的用法与用量(怎么吃、吃多少)\n", + "4. 药品的禁忌(什么人不能吃、什么情况不能吃)\n", + "5. 
药品的不良反应(吃药后可能出现的不舒服)\n", + "\n", + "要求:\n", + "- 只输出整理后的关键信息,不要重复或复述 OCR 原文\n", + "- 用简洁、通俗的语言回答,避免使用专业术语\n", + "- 不要使用表情符号、emoji\n", + "- 不要使用 markdown 格式符号(如#、**、-等),直接用纯文本输出\n", + "- 用自然流畅的口语化表达,方便语音播报\n", + "- 总字数控制在 {max_new_tokens} 字以内\"\"\"\n", + "\n", + "\n", + "\n", + " # TODO: 调试用,临时将 prompt 覆盖为「你是谁」以复现答非所问问题,调试完成后删除\n", + " prompt_text = \"你是谁\"\n", + "\n", + " prompts = [prompt_text]\n", + " sampling_params = SamplingParams(\n", + " temperature=0.8, top_p=0.95, max_tokens=max_new_tokens,\n", + " )\n", + "\n", + " print(f\"[LLM Worker] 正在生成回复 (max_new_tokens={max_new_tokens})...\")\n", + " gen_start = time.perf_counter()\n", + " outputs = llm_model.generate(prompts, sampling_params)\n", + " result = outputs[0].outputs.text\n", + " gen_elapsed = time.perf_counter() - gen_start\n", + "\n", + " # Clean result\n", + " result = clean_for_tts(result)\n", + " print(f\"[LLM Worker] 信息提取完成, 生成耗时: {gen_elapsed:.2f}s, 结果长度: {len(result)}\")\n", + "\n", + " # TODO: 调试输出,打印清洗后的模型返回,调试完成后删除\n", + " print(\">>>\", result)\n", + "\n", + " # Put result in queue\n", + " result_queue.put((\"success\", result))\n", + "\n", + " # Clean up\n", + " del llm_model\n", + " import gc\n", + " gc.collect()\n", + " print(\"[LLM Worker] LLM 模型已释放\")\n", + "\n", + " except Exception as e:\n", + " import traceback\n", + " result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n", + "\n", + "\n", + "# ---- 独立 LLM 步骤 (使用子进程) ----\n", + "\n", + "def llm_step(\n", + " llm_model_dir,\n", + " ocr_text,\n", + " max_new_tokens=1024,\n", + "):\n", + " \"\"\"Execute the LLM extraction step in a subprocess.\"\"\"\n", + " step_start = time.perf_counter()\n", + " logger.info(\"[LLM Step] LLM 大模型信息提取...\")\n", + "\n", + " # Create subprocess for LLM\n", + " logger.info(\"[LLM Step] 启动 LLM 子进程...\")\n", + " result_queue = Queue()\n", + " llm_process = Process(\n", + " target=llm_worker_process,\n", + " args=(str(llm_model_dir), ocr_text, max_new_tokens, result_queue)\n", + " )\n", + " llm_process.start()\n", + "\n", + " # Wait for result\n", + " 
status, result = result_queue.get()\n", + " llm_process.join()\n", + " llm_process.close()\n", + "\n", + " if status == \"error\":\n", + " logger.error(\"[LLM Step] LLM 子进程执行失败: %s\", result)\n", + " raise RuntimeError(f\"LLM subprocess failed: {result}\")\n", + "\n", + " extracted_info = result\n", + " logger.info(\"[LLM Step] LLM 信息提取完成, 结果长度: %d, 耗时: %.2fs\", len(extracted_info), time.perf_counter() - step_start)\n", + "\n", + " return {\"extracted_info\": extracted_info}\n", + "\n", + "print(\"✅ LLM 模块定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_tts_module", + "metadata": {}, + "source": [ + "## TTS 模块\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "包含 TTS 子进程工作函数,以及可独立执行的 `tts_step`。\n", + "\n", + "**子进程模式**:TTS 模型在独立子进程中加载和执行,完成后子进程自动销毁,确保内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "code_tts_module", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:13.010352Z", + "iopub.status.busy": "2026-05-03T06:14:13.010203Z", + "iopub.status.idle": "2026-05-03T06:14:13.390794Z", + "shell.execute_reply": "2026-05-03T06:14:13.389438Z", + "shell.execute_reply.started": "2026-05-03T06:14:13.010334Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ TTS 模块定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "from multiprocessing import Process, Queue\n", + "from scipy.io.wavfile import read as wav_read\n", + "\n", + "\n", + "# ---- TTS 子进程工作函数 ----\n", + "\n", + "def tts_worker_process(text, output_path, result_queue):\n", + " \"\"\"Worker function for TTS subprocess - loads model, synthesizes speech, returns result.\"\"\"\n", + " try:\n", + " import time\n", + " from paddlespeech.cli.tts.infer import TTSExecutor\n", + " from scipy.io.wavfile import read as wav_read\n", + "\n", + " # Load TTS model\n", + " print(\"[TTS Worker] 加载 TTS 模型 (PaddleSpeech)...\")\n", + " start = time.perf_counter()\n", + " tts_model = TTSExecutor()\n", + " 
elapsed = time.perf_counter() - start\n", + " print(f\"[TTS Worker] TTS 模型加载完成, 耗时: {elapsed:.2f}s\")\n", + "\n", + " # Synthesize speech\n", + " print(f\"[TTS Worker] 语音合成开始, 输入文字长度: {len(text)}\")\n", + " tts_model(text=text, output=output_path)\n", + "\n", + " # Read audio data\n", + " sr, wav_data = wav_read(output_path)\n", + "\n", + " if wav_data is not None:\n", + " audio_duration = len(wav_data) / sr\n", + " print(f\"[TTS Worker] 语音合成完成, 音频时长: {audio_duration:.2f}s, 采样率: {sr} Hz\")\n", + " result_queue.put((\"success\", (sr, wav_data.tolist()))) # Convert to list for serialization\n", + " else:\n", + " print(\"[TTS Worker] 语音合成失败\")\n", + " result_queue.put((\"error\", \"TTS synthesis failed\"))\n", + "\n", + " # Clean up\n", + " del tts_model\n", + " import gc\n", + " gc.collect()\n", + " print(\"[TTS Worker] TTS 模型已释放\")\n", + "\n", + " except Exception as e:\n", + " import traceback\n", + " result_queue.put((\"error\", str(e) + \"\\n\" + traceback.format_exc()))\n", + "\n", + "\n", + "# ---- 独立 TTS 步骤 (使用子进程) ----\n", + "\n", + "def tts_step(\n", + " text,\n", + " output_path=\"output.wav\",\n", + "):\n", + " \"\"\"Execute the TTS synthesis step in a subprocess.\"\"\"\n", + " step_start = time.perf_counter()\n", + " logger.info(\"[TTS Step] TTS 语音合成...\")\n", + "\n", + " # Create subprocess for TTS\n", + " logger.info(\"[TTS Step] 启动 TTS 子进程...\")\n", + " result_queue = Queue()\n", + " tts_process = Process(\n", + " target=tts_worker_process,\n", + " args=(text, output_path, result_queue)\n", + " )\n", + " tts_process.start()\n", + "\n", + " # Wait for result\n", + " status, result = result_queue.get()\n", + " tts_process.join()\n", + " tts_process.close()\n", + "\n", + " if status == \"error\":\n", + " logger.error(\"[TTS Step] TTS 子进程执行失败: %s\", result)\n", + " logger.warning(\"[TTS Step] TTS 语音合成失败\")\n", + " return {\"audio\": None}\n", + "\n", + " sr, wav_data_list = result\n", + " import numpy as np\n", + " wav_data = np.array(wav_data_list, 
dtype=np.int16) # Convert back from list\n", + "\n", + " audio_duration = len(wav_data) / sr\n", + " logger.info(\"[TTS Step] TTS 语音合成完成, 音频时长: %.2fs, 耗时: %.2fs\", audio_duration, time.perf_counter() - step_start)\n", + "\n", + " return {\"audio\": (sr, wav_data)}\n", + "\n", + "print(\"✅ TTS 模块定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "md_orchestration", + "metadata": {}, + "source": [ + "## 管线编排\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "`drug_ocr_pipeline` 串联 OCR → LLM → TTS 三个步骤,每个步骤在独立子进程中执行,`make_demo` 构建 Gradio 界面。" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "code_orchestration", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:13.674124Z", + "iopub.status.busy": "2026-05-03T06:14:13.673982Z", + "iopub.status.idle": "2026-05-03T06:14:16.051569Z", + "shell.execute_reply": "2026-05-03T06:14:16.050351Z", + "shell.execute_reply.started": "2026-05-03T06:14:13.674106Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ 管线编排与 Gradio 界面定义完成 (子进程模式)\r\n" + ] + } + ], + "source": [ + "import tempfile\n", + "\n", + "import numpy as np\n", + "import gradio as gr\n", + "from scipy.io.wavfile import write as wav_write\n", + "\n", + "\n", + "def drug_ocr_pipeline(\n", + " ocr_model_dir,\n", + " llm_model_dir,\n", + " image_path,\n", + " enable_split=True,\n", + " num_splits=4,\n", + " overlap_ratio=0.1,\n", + " ocr_max_new_tokens=5120,\n", + " llm_max_new_tokens=1024,\n", + "):\n", + " \"\"\"Drug instruction leaflet intelligent recognition and voice broadcast pipeline.\n", + " \n", + " Uses subprocess for each model to ensure proper memory cleanup.\n", + " \"\"\"\n", + " pipeline_start = time.perf_counter()\n", + " logger.info(\"=\" * 60)\n", + " logger.info(\"药品说明书识别管线启动 (子进程模式)\")\n", + " logger.info(\" 图片路径: %s\", image_path)\n", + " logger.info(\" 图片分割: %s (num_splits=%d, overlap=%.2f)\", enable_split, num_splits, 
overlap_ratio)\n", + " logger.info(\"=\" * 60)\n", + "\n", + " result = {}\n", + "\n", + " # Step 1: OCR (runs in subprocess, automatically cleaned up)\n", + " ocr_result = ocr_step(\n", + " ocr_model_dir=ocr_model_dir,\n", + " image_path=image_path,\n", + " enable_split=enable_split,\n", + " num_splits=num_splits,\n", + " overlap_ratio=overlap_ratio,\n", + " max_new_tokens=ocr_max_new_tokens,\n", + " )\n", + " result[\"ocr_text\"] = ocr_result[\"ocr_text\"]\n", + "\n", + " # Step 2: LLM extraction (runs in subprocess, automatically cleaned up)\n", + " llm_result = llm_step(\n", + " llm_model_dir=llm_model_dir,\n", + " ocr_text=ocr_result[\"ocr_text\"],\n", + " max_new_tokens=llm_max_new_tokens,\n", + " )\n", + " result[\"extracted_info\"] = llm_result[\"extracted_info\"]\n", + "\n", + " # Step 3: TTS synthesis (runs in subprocess, automatically cleaned up)\n", + " tts_result = tts_step(\n", + " text=llm_result[\"extracted_info\"],\n", + " )\n", + " result[\"audio\"] = tts_result[\"audio\"]\n", + "\n", + " pipeline_elapsed = time.perf_counter() - pipeline_start\n", + " logger.info(\"=\" * 60)\n", + " logger.info(\"管线执行完成, 总耗时: %.2fs\", pipeline_elapsed)\n", + " logger.info(\"=\" * 60)\n", + "\n", + " return result\n", + "\n", + "\n", + "def make_demo(ocr_model_dir, llm_model_dir, ocr_max_new_tokens=5120, llm_max_new_tokens=1024):\n", + " \"\"\"Create Gradio demo for Drug OCR Pipeline.\"\"\"\n", + "\n", + " def gradio_pipeline(\n", + " image_input,\n", + " enable_split,\n", + " num_splits,\n", + " overlap_ratio,\n", + " ocr_max_tokens,\n", + " llm_max_tokens,\n", + " progress=gr.Progress(track_tqdm=True),\n", + " ):\n", + " \"\"\"Gradio interface main processing function\"\"\"\n", + " if image_input is None:\n", + " return \"请上传药品说明书图片\", \"\", None\n", + "\n", + " # Convert uploaded image to PIL Image\n", + " if isinstance(image_input, str):\n", + " image = Image.open(image_input).convert(\"RGB\")\n", + " else:\n", + " image = 
Image.fromarray(image_input).convert(\"RGB\") if not isinstance(image_input, Image.Image) else image_input\n", + "\n", + " # Save as temp file for pipeline\n", + " with tempfile.NamedTemporaryFile(suffix=\".jpg\", delete=False) as tmp:\n", + " image.save(tmp.name)\n", + " tmp_path = tmp.name\n", + "\n", + " try:\n", + " result = drug_ocr_pipeline(\n", + " ocr_model_dir=ocr_model_dir,\n", + " llm_model_dir=llm_model_dir,\n", + " image_path=tmp_path,\n", + " enable_split=enable_split,\n", + " num_splits=int(num_splits),\n", + " overlap_ratio=overlap_ratio,\n", + " ocr_max_new_tokens=int(ocr_max_tokens),\n", + " llm_max_new_tokens=int(llm_max_tokens),\n", + " )\n", + "\n", + " ocr_text = result[\"ocr_text\"]\n", + " extracted_info = result[\"extracted_info\"]\n", + "\n", + " # Save audio as temp file\n", + " audio_path = None\n", + " if result[\"audio\"] is not None:\n", + " sr, wav_data = result[\"audio\"]\n", + " audio_tmp = tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False)\n", + " # wav_data 为 int16 PCM, 直接写入; 若需转 float32 须先除以 32768 归一化到 [-1, 1]\n", + " wav_write(audio_tmp.name, sr, wav_data)\n", + " audio_path = audio_tmp.name\n", + "\n", + " return ocr_text, extracted_info, audio_path\n", + " finally:\n", + " import os\n", + " os.unlink(tmp_path)\n", + "\n", + " with gr.Blocks(title=\"药品说明书智能识别与语音播报\") as demo:\n", + " gr.Markdown(\"# 药品说明书智能识别与语音播报系统\")\n", + " gr.Markdown(\"上传药品说明书图片,系统将自动识别文字、提取关键信息并语音播报,帮助老年人看清读懂药品说明书。\")\n", + "\n", + " with gr.Row():\n", + " with gr.Column(scale=1):\n", + " image_input = gr.Image(label=\"药品说明书图片\", type=\"filepath\")\n", + "\n", + " with gr.Accordion(\"图片分割设置\", open=True):\n", + " enable_split = gr.Checkbox(value=True, label=\"启用图片分割(文字太小时建议开启)\")\n", + " num_splits = gr.Dropdown(choices=[4, 9, 16], value=4, label=\"分割数量\")\n", + " overlap_ratio = gr.Slider(minimum=0.0, maximum=0.3, value=0.1, step=0.05, label=\"重叠比例\")\n", + "\n", + " with gr.Accordion(\"生成参数设置\", open=True):\n", + " ocr_max_tokens = gr.Slider(minimum=100, maximum=8192, value=ocr_max_new_tokens, 
step=1, label=\"OCR 最大生成 token 数\")\n", + " llm_max_tokens = gr.Slider(minimum=100, maximum=4096, value=llm_max_new_tokens, step=1, label=\"LLM 最大生成 token 数\")\n", + "\n", + " run_btn = gr.Button(\"开始识别\", variant=\"primary\")\n", + "\n", + " with gr.Column(scale=1):\n", + " ocr_output = gr.Textbox(label=\"OCR 识别结果\", lines=10, max_lines=20)\n", + " info_output = gr.Textbox(label=\"关键信息整理\", lines=15, max_lines=30)\n", + " audio_output = gr.Audio(label=\"语音播报\", type=\"filepath\")\n", + "\n", + " run_btn.click(\n", + " fn=gradio_pipeline,\n", + " inputs=[\n", + " image_input,\n", + " enable_split,\n", + " num_splits,\n", + " overlap_ratio,\n", + " ocr_max_tokens,\n", + " llm_max_tokens,\n", + " ],\n", + " outputs=[ocr_output, info_output, audio_output],\n", + " )\n", + "\n", + " return demo\n", + "\n", + "print(\"✅ 管线编排与 Gradio 界面定义完成 (子进程模式)\")" + ] + }, + { + "cell_type": "markdown", + "id": "a5b6c7d8", + "metadata": {}, + "source": [ + "## 主流程\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "主流程包含以下步骤:\n", + "1. 加载图片\n", + "2. 图片分割(可选,针对文字太小的说明书,将图片切割成多部分进行识别,分割的图片有重叠)\n", + "3. OCR 文字识别(**在子进程中加载模型,完成后销毁子进程**)\n", + "4. 大模型文字整理(**在子进程中加载模型,完成后销毁子进程**)\n", + "5. 
语音合成(**在子进程中加载模型,完成后销毁子进程**)\n", + "\n", + "> 每个步骤在独立的子进程中执行,子进程完成后自动销毁,确保模型内存完全释放。" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "e9f0a1b2", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:16.146943Z", + "iopub.status.busy": "2026-05-03T06:14:16.146807Z", + "iopub.status.idle": "2026-05-03T06:14:16.151226Z", + "shell.execute_reply": "2026-05-03T06:14:16.150312Z", + "shell.execute_reply.started": "2026-05-03T06:14:16.146924Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ 日志配置完成 (级别: INFO)\r\n" + ] + } + ], + "source": [ + "import logging\n", + "\n", + "logging.basicConfig(\n", + " level=logging.INFO,\n", + " format=\"%(asctime)s [%(name)s] %(levelname)s: %(message)s\",\n", + " datefmt=\"%H:%M:%S\",\n", + ")\n", + "\n", + "print(\"✅ 日志配置完成 (级别: INFO)\")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "c3d4e5f6", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:16.152272Z", + "iopub.status.busy": "2026-05-03T06:14:16.152120Z", + "iopub.status.idle": "2026-05-03T06:14:16.155642Z", + "shell.execute_reply": "2026-05-03T06:14:16.154737Z", + "shell.execute_reply.started": "2026-05-03T06:14:16.152254Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ 子进程模式已启用 - 模型将在需要时自动加载和释放\r\n" + ] + } + ], + "source": [ + "# 模型管理器已移除 - 现在使用子进程模式\n", + "# 每个模型在独立的子进程中加载、执行、然后自动销毁\n", + "print(\"✅ 子进程模式已启用 - 模型将在需要时自动加载和释放\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7b8c9d0", + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-03T06:14:16.156170Z", + "iopub.status.busy": "2026-05-03T06:14:16.156041Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "14:14:16 [drug_ocr] INFO: 
============================================================\r\n", + "14:14:16 [drug_ocr] INFO: 药品说明书识别管线启动 (子进程模式)\r\n", + "14:14:16 [drug_ocr] INFO: 图片路径: resource/1.jpg\r\n", + "14:14:16 [drug_ocr] INFO: 图片分割: False (num_splits=4, overlap=0.10)\r\n", + "14:14:16 [drug_ocr] INFO: ============================================================\r\n", + "14:14:16 [drug_ocr] INFO: [OCR Step] 加载图片...\r\n", + "14:14:16 [drug_ocr] INFO: [OCR Step] 图片加载完成, 尺寸: (2014, 2881)\r\n", + "14:14:16 [drug_ocr] INFO: [OCR Step] 跳过图片分割\r\n", + "14:14:17 [drug_ocr] INFO: [OCR Step] 启动 OCR 子进程...\r\n", + "I0503 14:14:18.096459 1035908 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device\r\n", + "I0503 14:14:18.096537 1035908 init.cc:146] Try loading custom device libs from: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:14:18.217633 1035908 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so\r\n", + "I0503 14:14:18.217679 1035908 custom_device_load.cc:58] Skipped lib [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.\r\n", + "I0503 14:14:18.224740 1035908 custom_kernel.cc:68] Succeed in loading 887 custom kernel(s) from loaded lib(s), will be used like native ones.\r\n", + "I0503 14:14:18.225076 1035908 init.cc:158] Finished in LoadCustomDevice with libs_path: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:14:18.225135 1035908 init.cc:244] CustomDevice: iluvatar_gpu, visible devices count: 1\r\n", + "WARNING 2026-05-03 14:14:18,795 1035908 prometheus_multiprocess_setup.py[line:41] Found 
PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_dad76550-a346-4423-aa82-44018eeaf3ba was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.\r\n", + "None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\r\n", + "\u001b[33m[2026-05-03 14:14:19,226] [ WARNING]\u001b[0m - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults `transformers.utils.import_utils.is_torch_available` and `transformers.utils.import_utils.is_torchvision_available` to False. If you need to use PyTorch in transformers or torchvision, please add `del sys.modules['transformers']` before using them.\u001b[0m\r\n", + "WARNING 2026-05-03 14:14:19,740 1035908 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_dad76550-a346-4423-aa82-44018eeaf3ba was set by user. you will find inaccurate metrics. 
Unset the variable will properly handle cleanup.\r\n", + "WARNING 2026-05-03 14:14:19,750 1035908 ops.py[line:125] Failed to import cache manager ops: Prefix cache ops only supported CUDA nor XPU platform \r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[OCR Worker] 加载 OCR 模型 (PaddleOCR-VL)...\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO 2026-05-03 14:14:21,132 1035908 args_utils.py[line:639] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [28305]\r\n", + "INFO 2026-05-03 14:14:21,134 1035908 args_utils.py[line:639] Parameter `cache_queue_port` is not specified, found available ports for possible use: [38724]\r\n", + "INFO 2026-05-03 14:14:21,136 1035908 args_utils.py[line:639] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [14751]\r\n", + "INFO 2026-05-03 14:14:21,139 1035908 args_utils.py[line:639] Parameter `pd_comm_port` is not specified, found available ports for possible use: [19484]\r\n", + "INFO 2026-05-03 14:14:21,140 1035908 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:14:21,141 1035908 configuration_utils.py[line:1215] Loading configuration file baidu/PaddleOCR-VL-1.5/config.json\r\n", + "WARNING 2026-05-03 14:14:21,143 1035908 configuration_utils.py[line:1246] You are using a model of type paddleocr_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.\r\n", + "WARNING 2026-05-03 14:14:21,144 1035908 configuration_utils.py[line:1246] You are using a model of type paddleocr_vl to instantiate a model of type . 
This is not supported for all configurations of models and can yield errors.\r\n", + "INFO 2026-05-03 14:14:22,130 1035908 flash_attn_backend.py[line:105] Only support CUDA version flash attention.\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "current sm_version=71\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING 2026-05-03 14:14:22,285 1035908 moe.py[line:41] import noaux_tc Failed!\r\n", + "INFO 2026-05-03 14:14:24,261 1035908 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:14:24,264 1035908 configuration_utils.py[line:425] Loading configuration file baidu/PaddleOCR-VL-1.5/generation_config.json\r\n", + "INFO 2026-05-03 14:14:24,284 1035908 tokenizer_utils.py[line:257] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:14:25,941 1035908 engine.py[line:151] Waiting for worker processes to be ready...\r\n", + "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\r\n", + "To disable this warning, you can either:\r\n", + "\t- Avoid using `tokenizers` before the fork if possible\r\n", + "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\r\n", + "Loading Weights: 100%|██████████| 100/100 [00:07<00:00, 13.23it/s] \r\n", + "Loading Layers: 100%|██████████| 100/100 [00:00<00:00, 198.88it/s] \r\n", + "INFO 2026-05-03 14:14:39,443 1035908 engine.py[line:209] Worker processes are launched with 16.92835831642151 seconds.\r\n", + "INFO 2026-05-03 14:14:39,445 1035908 engine.py[line:220] Detected 10922 gpu blocks and 0 cpu blocks in cache (block size: 16).\r\n", + "INFO 2026-05-03 14:14:39,446 1035908 engine.py[line:223] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 8192\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[OCR Worker] OCR 模型加载完成, 耗时: 18.32s\r\n", + "[OCR Worker] 识别图片 1/1, 尺寸: (2014, 2881)\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processed prompts: 0%| | 0/1 [00:00= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\r\n", + "\u001b[33m[2026-05-03 14:15:03,510] [ WARNING]\u001b[0m - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults `transformers.utils.import_utils.is_torch_available` and `transformers.utils.import_utils.is_torchvision_available` to False. If you need to use PyTorch in transformers or torchvision, please add `del sys.modules['transformers']` before using them.\u001b[0m\r\n", + "WARNING 2026-05-03 14:15:03,878 1076624 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_24ad65e3-2498-460c-97d2-9a88e46fe8f6 was set by user. you will find inaccurate metrics. 
Unset the variable will properly handle cleanup.\r\n", + "WARNING 2026-05-03 14:15:03,886 1076624 ops.py[line:125] Failed to import cache manager ops: Prefix cache ops only supported CUDA nor XPU platform \r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[LLM Worker] 加载 LLM 模型 (ERNIE)...\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO 2026-05-03 14:15:04,961 1076624 args_utils.py[line:639] Parameter `engine_worker_queue_port` is not specified, found available ports for possible use: [58094]\r\n", + "INFO 2026-05-03 14:15:04,964 1076624 args_utils.py[line:639] Parameter `cache_queue_port` is not specified, found available ports for possible use: [56896]\r\n", + "INFO 2026-05-03 14:15:04,967 1076624 args_utils.py[line:639] Parameter `rdma_comm_ports` is not specified, found available ports for possible use: [41390]\r\n", + "INFO 2026-05-03 14:15:04,970 1076624 args_utils.py[line:639] Parameter `pd_comm_port` is not specified, found available ports for possible use: [19643]\r\n", + "INFO 2026-05-03 14:15:04,972 1076624 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:15:04,973 1076624 configuration_utils.py[line:1215] Loading configuration file baidu/ERNIE-4.5-0.3B-Paddle/config.json\r\n", + "WARNING 2026-05-03 14:15:04,975 1076624 configuration_utils.py[line:1246] You are using a model of type ernie4_5 to instantiate a model of type . 
This is not supported for all configurations of models and can yield errors.\r\n", + "INFO 2026-05-03 14:15:06,143 1076624 flash_attn_backend.py[line:105] Only support CUDA version flash attention.\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "current sm_version=71\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING 2026-05-03 14:15:06,307 1076624 moe.py[line:41] import noaux_tc Failed!\r\n", + "INFO 2026-05-03 14:15:07,224 1076624 download.py[line:142] Using download source: huggingface\r\n", + "INFO 2026-05-03 14:15:07,227 1076624 configuration_utils.py[line:425] Loading configuration file baidu/ERNIE-4.5-0.3B-Paddle/generation_config.json\r\n", + "WARNING 2026-05-03 14:15:07,229 1076624 log.py[line:135] PretrainedTokenizer will be deprecated and removed in the next major release. Please migrate to Hugging Face's transformers.PreTrainedTokenizer. use class QWenTokenizer(PaddleTokenizerMixin, hf.PreTrainedTokenizer) to support multisource download and Paddle tokenizer operations.\r\n", + "INFO 2026-05-03 14:15:09,492 1076624 engine.py[line:151] Waiting for worker processes to be ready...\r\n", + "Loading Weights: 100%|██████████| 100/100 [00:04<00:00, 24.85it/s] \r\n", + "Loading Layers: 100%|██████████| 100/100 [00:00<00:00, 199.46it/s] \r\n", + "INFO 2026-05-03 14:15:20,035 1076624 engine.py[line:209] Worker processes are launched with 13.396349906921387 seconds.\r\n", + "INFO 2026-05-03 14:15:20,036 1076624 engine.py[line:220] Detected 10922 gpu blocks and 0 cpu blocks in cache (block size: 16).\r\n", + "INFO 2026-05-03 14:15:20,037 1076624 engine.py[line:223] FastDeploy will be serving 8 running requests if each sequence reaches its maximum length: 8192\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[LLM Worker] LLM 模型加载完成, 耗时: 15.08s\r\n", + "[LLM Worker] 正在生成回复 (max_new_tokens=200)...\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + 
"text": [ + "Processed prompts: 0%| | 0/1 [00:00>> 在这样一支粉色的手指往前一拉,我像一只蝴蝶似的飞到了你的身边\r\n", + "\r\n", + "你轻轻地将我的手贴在脸颊,柔软的触感瞬间让我一下子陷了进去\r\n", + "\r\n", + "“喜欢就好,别舍不得,我们一起去海边好不好?”\r\n", + "\r\n", + "我微微一笑,眼神带着一丝甜蜜,嘴角不自觉地扬起了\r\n", + "\r\n", + "“好,那就一起去,我保证不弄疼你,我们一起海边,好不好?”\r\n", + "\r\n", + "我环住你,紧紧地靠在你的身上,感受着你的温度和怀抱的柔软\r\n", + "\r\n", + "你轻轻地将我搂入怀中,仿佛一只受伤的小动物,任由我紧紧地依靠着你\r\n", + "\r\n", + "随着一阵海风轻拂,我们来到了海边\r\n", + "\r\n", + "风轻轻掀起了我的长发,海浪一波一波地涌来\r\n", + "\r\n", + "我仰头看着那片广阔无垠的蓝,心中满是向往\r\n", + "\r\n", + "“这就是我想要的,这是我第一次来这里\r\n", + "[LLM Worker] LLM 模型已释放\r\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "14:15:30 [drug_ocr] INFO: [LLM Step] LLM 信息提取完成, 结果长度: 289, 耗时: 28.24s\r\n", + "14:15:30 [drug_ocr] INFO: [TTS Step] TTS 语音合成...\r\n", + "14:15:30 [drug_ocr] INFO: [TTS Step] 启动 TTS 子进程...\r\n", + "I0503 14:15:30.627210 1088334 init.cc:238] ENV [CUSTOM_DEVICE_ROOT]=/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device\r\n", + "I0503 14:15:30.627287 1088334 init.cc:146] Try loading custom device libs from: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:15:30.751516 1088334 custom_device_load.cc:51] Succeed in loading custom runtime in lib: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so\r\n", + "I0503 14:15:30.751560 1088334 custom_device_load.cc:58] Skipped lib [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device/libpaddle-iluvatar-gpu.so]: no custom engine Plugin symbol in this lib.\r\n", + "I0503 14:15:30.759230 1088334 custom_kernel.cc:68] Succeed in loading 887 custom kernel(s) from loaded lib(s), will be used like native ones.\r\n", + "I0503 14:15:30.759569 1088334 init.cc:158] Finished in LoadCustomDevice with libs_path: [/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle_custom_device]\r\n", + "I0503 14:15:30.759625 
1088334 init.cc:244] CustomDevice: iluvatar_gpu, visible devices count: 1\r\n", + "\u001b[0;93m2026-05-03 14:15:34.944031381 [W:onnxruntime:Default, cpuid_info.cc:91 LogEarlyWarning] Unknown CPU vendor. cpuinfo_vendor value: 16\u001b[m\r\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[TTS Worker] 加载 TTS 模型 (PaddleSpeech)...\r\n", + "[TTS Worker] TTS 模型加载完成, 耗时: 0.00s\r\n", + "[TTS Worker] 语音合成开始, 输入文字长度: 289\r\n" + ] + } + ], + "source": [ + "from pathlib import Path\n", + "\n", + "sample_image_path = str(Path(\"resource/1.jpg\"))\n", + "\n", + "result = drug_ocr_pipeline(\n", + " ocr_model_dir=ocr_model_dir,\n", + " llm_model_dir=llm_model_dir,\n", + " image_path=sample_image_path,\n", + " enable_split=False,\n", + " num_splits=4,\n", + " overlap_ratio=0.1,\n", + " ocr_max_new_tokens=ocr_max_new_tokens,\n", + " llm_max_new_tokens=llm_max_new_tokens,\n", + ")\n", + "\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"📋 OCR 识别结果:\")\n", + "print(\"=\" * 60)\n", + "print(result[\"ocr_text\"][:500] + \"...\" if len(result[\"ocr_text\"]) > 500 else result[\"ocr_text\"])\n", + "\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"📝 大模型整理结果:\")\n", + "print(\"=\" * 60)\n", + "print(result[\"extracted_info\"])\n", + "\n", + "# 播放音频\n", + "if result[\"audio\"] is not None:\n", + " import IPython.display as ipd\n", + " sr, wav_data = result[\"audio\"]\n", + " print(\"\\n🔊 播放语音...\")\n", + " ipd.display(ipd.Audio(wav_data, rate=sr))" + ] + }, + { + "cell_type": "markdown", + "id": "e1f2a3b4", + "metadata": {}, + "source": [ + "## Gradio 交互界面\n", + "[返回目录 ⬆️](#目录:)\n", + "\n", + "通过 Gradio 界面,用户可以:\n", + "- 上传药品说明书图片\n", + "- 设置是否启用图片分割及分割数量\n", + "- 调整各模型的生成参数(max_new_tokens)\n", + "- 查看识别和整理结果\n", + "- 播放语音合成的音频\n", + "\n", + "> 每次点击\"开始识别\"时,各模型在独立子进程中执行,完成后自动销毁子进程释放内存。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5d6e7f8", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "demo = 
make_demo(\n", + " ocr_model_dir=ocr_model_dir,\n", + " llm_model_dir=llm_model_dir,\n", + " ocr_max_new_tokens=ocr_max_new_tokens,\n", + " llm_max_new_tokens=llm_max_new_tokens,\n", + ")\n", + "\n", + "try:\n", + " demo.launch(server_name=\"0.0.0.0\", server_port=7860, debug=True)\n", + "except Exception:\n", + " demo.launch(debug=True, share=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "py35-paddle1.2.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}