
[Open-Source Internship] ERNIE 4.5 Application Case Development Based on MindSpore NLP #44

Open
4everWZ wants to merge 3 commits into mindspore-lab:dev from 4everWZ:ERNIE4.5

Conversation


@4everWZ 4everWZ commented Jan 23, 2026

ERNIE 4.5 Model Inference and Application Based on MindSpore NLP

This notebook demonstrates inference and application practice with ERNIE 4.5 (baidu/ERNIE-4.5-0.3B-Base-PT) on MindSpore 2.7.0 and MindSpore NLP (MindNLP) 0.5.1, including runtime configuration and inference verification in an Ascend environment.

The experiment covers the complete workflow from environment configuration and model loading to text-generation (generate) inference, and can serve as a quick start for LLM inference development with MindNLP Transformers.

Main contents

  1. Environment and dependency installation
    • Install mindnlp==0.5.1 and related dependencies (including the common text-processing libraries jieba and sentencepiece).
    • List the recommended environment versions (Python / MindSpore / MindNLP).
  2. Runtime configuration
    • Import the required libraries and set the MindSpore execution mode.
    • Configure context.set_context(device_target="Ascend") as the inference backend for large-model inference.
  3. Model and tokenizer loading
    • Load the tokenizer with AutoTokenizer.from_pretrained.
    • Load the pretrained ERNIE 4.5 weights with AutoModelForCausalLM.from_pretrained to prepare for inference.
  4. Inference and dialogue-generation wrapper
    • Implement a chat_with_ernie inference function supporting dialogue-style generation from a query and a history.
    • Expose generation parameters such as max_length, temperature, and top_p to tune output quality and diversity.
  5. Example verification
    • Provide example inference calls that show the model's output and verify the whole pipeline runs and is reproducible.
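As an aside, the temperature and top_p controls listed above are standard sampling parameters. A plain-Python sketch (independent of MindNLP; the logits below are invented purely for illustration) shows what each one does:

```python
import math

def apply_temperature(logits, temperature=0.7):
    """Softmax with temperature: lower values sharpen the
    distribution, higher values flatten it."""
    scaled = {tok: l / temperature for tok, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    return {tok: math.exp(v) / z for tok, v in scaled.items()}

def top_p_filter(probs, top_p=0.9):
    """Nucleus sampling: keep the smallest set of tokens whose
    cumulative probability reaches top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = {}, 0.0
    for tok, p in ranked:
        kept[tok] = p
        cum += p
        if cum >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

logits = {"the": 2.0, "a": 1.0, "banana": -1.0, "qux": -3.0}
probs = apply_temperature(logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.9)
# Low-probability tail tokens ("banana", "qux") are dropped before sampling.
print(sorted(nucleus))
```

The model's generate call applies the same two transforms to its real logits at every decoding step.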

Changed files

  • llm/ernie4_5/inference_ernie4_5.ipynb
  • llm/README.md

Copilot AI review requested due to automatic review settings January 23, 2026 08:25

Copilot AI left a comment


Pull request overview

This PR adds a Jupyter notebook demonstrating ERNIE 4.5 model inference using MindSpore NLP 0.5.1 on Ascend hardware. The notebook provides a complete walkthrough from environment setup to model loading and text generation.

Changes:

  • Added inference notebook for ERNIE 4.5 (baidu/ERNIE-4.5-0.3B-Base-PT) model with dependency installation, model loading, and dialogue generation examples
  • Updated LLM applications README to include ERNIE 4.5 in the actively maintained applications list

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.

  • llm/ernie4_5/inference_ernie4_5.ipynb — New notebook implementing ERNIE 4.5 model inference with environment setup, model loading using AutoModelForCausalLM, and text generation examples across multiple test cases
  • llm/README.md — Added ERNIE 4.5 entry to the actively maintained applications table


Comment on lines +85 to +89
"## 数据准备\n",
"\n",
"对于大模型推理任务,我们通常不需要像 CV NLP 等任务中那样下载大规模训练数据集。但在实际应用开发中,我们可能需要准备一些特定的 Prompt(提示词)或测试用例。\n",
"\n",
"此处我们创建一个简单的测试数据集,模拟应用场景中的输入。"

Copilot AI Jan 23, 2026


The section titled "数据准备" (Data Preparation) at lines 85-89 is somewhat misleading. Unlike typical ML tasks that require dataset preparation, this section simply creates a list of test prompts. The narrative acknowledges this ("对于大模型推理任务,我们通常不需要像 CV NLP 等任务中那样下载大规模训练数据集"), but the section title "数据准备" might cause confusion. Consider renaming to "测试用例准备" (Test Case Preparation) or "推理样本准备" (Inference Sample Preparation) to more accurately reflect the content.

Comment on lines +178 to +194
"def chat_with_ernie(query, history=[], max_length=2048, temperature=0.7, top_p=0.9):\n",
" \"\"\"\n",
" 基于 ERNIE 4.5 的对话生成函数\n",
" \n",
" Args:\n",
" query (str): 用户输入的问题\n",
" history (list): 对话历史\n",
" max_length (int): 生成的最大长度\n",
" temperature (float): 采样温度,控制生成的多样性\n",
" top_p (float): 核采样阈值\n",
" \n",
" Returns:\n",
" str: 模型生成的回答\n",
" \"\"\"\n",
" # 1. 构建 Prompt (根据模型具体的 Chat Template 进行调整)\n",
" # 这里使用一个通用的对话格式示例,具体需参考 ERNIE 4.5 的官方 Prompt 模板\n",
" inputs = tokenizer(query, return_tensors=\"ms\")\n",

Copilot AI Jan 23, 2026


The history parameter is documented in the docstring and accepted as a function argument, but it is never actually used in the function implementation. The function only processes the current query without incorporating any conversation history. Either the history parameter should be implemented to support multi-turn conversations, or it should be removed from both the function signature and the docstring.
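One way to implement the missing history handling (a sketch only; the "User:/Assistant:" markers below are placeholders, not ERNIE 4.5's official chat template, which should be checked against the model card) is to fold prior turns into the prompt before tokenization:

```python
def build_prompt(query, history=None):
    """Concatenate prior (user, assistant) turns into one prompt string.
    Avoids a mutable default by using None as the sentinel."""
    history = history or []
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"User: {user_turn}")
        parts.append(f"Assistant: {assistant_turn}")
    parts.append(f"User: {query}")
    parts.append("Assistant:")  # leave the final turn open for the model
    return "\n".join(parts)

prompt = build_prompt(
    "And in winter?",
    history=[("What blooms in spring?", "Many flowers do.")],
)
print(prompt)
```

The resulting string would then be passed to the tokenizer in place of the bare query.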

Comment on lines +22 to +316
"| 3.10 | 2.7.0 | 0.5.1 |"
]
},
{
"cell_type": "markdown",
"id": "20bb5f2e",
"metadata": {},
"source": [
"### 安装依赖\n",
"\n",
"首先,我们需要安装 MindNLP 及相关依赖库。如果环境中未安装,请执行以下命令:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3041a225",
"metadata": {},
"outputs": [],
"source": [
"# 安装 MindSpore NLP\n",
"# !pip install mindnlp==0.5.1 -i https://pypi.tuna.tsinghua.edu.cn/simple\n",
"# 安装常用的文本处理库\n",
"# !pip install jieba\n",
"# !pip install sentencepiece"
]
},
{
"cell_type": "markdown",
"id": "eca64203",
"metadata": {},
"source": [
"### 配置运行环境\n",
"\n",
"引入必要的库,并设置 MindSpore 的运行模式。针对大模型推理,我们使用 Ascend 作为计算后端。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d59a07c",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import time\n",
"import mindspore\n",
"from mindspore import context\n",
"import mindnlp\n",
"\n",
"# 设置使用 Ascend 设备\n",
"# 默认使用 PYNATIVE_MODE \n",
"context.set_context(device_target=\"Ascend\")\n",
"\n",
"print(f\"MindSpore version: {mindspore.__version__}\")\n",
"print(\"MindNLP version:\", mindnlp.__version__)"
]
},
{
"cell_type": "markdown",
"id": "188668d7",
"metadata": {},
"source": [
"## 数据准备\n",
"\n",
"对于大模型推理任务,我们通常不需要像 CV NLP 等任务中那样下载大规模训练数据集。但在实际应用开发中,我们可能需要准备一些特定的 Prompt(提示词)或测试用例。\n",
"\n",
"此处我们创建一个简单的测试数据集,模拟应用场景中的输入。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1471afc3",
"metadata": {},
"outputs": [],
"source": [
"# 模拟应用场景数据\n",
"test_cases = [\n",
" \"请简要介绍一下什么是混合专家模型(MoE)?\",\n",
" \"写一首关于秋天丰收的七言绝句。\",\n",
" \"请分析以下句子的情感倾向:'这家餐厅的服务真是太糟糕了,我再也不会来了。'\",\n",
" \"使用Python写一个冒泡排序算法。\"\n",
"]\n",
"\n",
"print(\"测试用例准备完成。\")"
]
},
{
"cell_type": "markdown",
"id": "71617376",
"metadata": {},
"source": [
"## 模型构建与加载\n",
"\n",
"本章节将演示如何使用 MindSpore NLP 的 `Transformers` 接口加载 ERNIE 4.5 模型。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8f53408",
"metadata": {},
"outputs": [],
"source": [
"# 加载分词器 (Tokenizer)\n",
"# 分词器负责将自然语言文本转换为模型可理解的 Token ID。\n",
"\n",
"from mindnlp.transformers import AutoTokenizer\n",
"from mindnlp.transformers import AutoModelForCausalLM\n",
"\n",
"MODEL_NAME = \"baidu/ERNIE-4.5-0.3B-Base-PT\"\n",
"\n",
"print(f\"正在加载分词器: {MODEL_NAME} ...\")\n",
"try:\n",
" tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n",
" print(\"分词器加载成功。\")\n",
"except Exception as e:\n",
" print(f\"分词器加载失败,请检查网络或模型名称。错误信息: {e}\")\n",
" \n",
"# 加载模型 (Model)\n",
"# 在 Ascend 800I/T A2 上,为了节省显存并加速推理,我们推荐使用 float16 精度加载模型。\n",
"\n",
"print(f\"正在加载模型: {MODEL_NAME} ...\")\n",
"\n",
"# 加载模型权重\n",
"# mindspore_dtype=mindspore.float16 可以显著降低显存占用\n",
"try:\n",
" model = AutoModelForCausalLM.from_pretrained(\n",
" MODEL_NAME,\n",
" mindspore_dtype=mindspore.float16\n",
" )\n",
" # 将模型设置为评估模式\n",
" model.set_train(False)\n",
" print(\"模型加载成功。\")\n",
"except Exception as e:\n",
" print(f\"模型加载失败。错误信息: {e}\")"
]
},
{
"cell_type": "markdown",
"id": "b4775da6",
"metadata": {},
"source": [
"## 应用开发:构建对话生成函数\n",
"\n",
"为了方便进行多轮对话或特定任务推理,我们将模型的生成过程封装为一个函数。这类似于 ResNet 案例中的“验证”或“推理”步骤。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd42b1ca",
"metadata": {},
"outputs": [],
"source": [
"def chat_with_ernie(query, history=[], max_length=2048, temperature=0.7, top_p=0.9):\n",
" \"\"\"\n",
" 基于 ERNIE 4.5 的对话生成函数\n",
" \n",
" Args:\n",
" query (str): 用户输入的问题\n",
" history (list): 对话历史\n",
" max_length (int): 生成的最大长度\n",
" temperature (float): 采样温度,控制生成的多样性\n",
" top_p (float): 核采样阈值\n",
" \n",
" Returns:\n",
" str: 模型生成的回答\n",
" \"\"\"\n",
" # 1. 构建 Prompt (根据模型具体的 Chat Template 进行调整)\n",
" # 这里使用一个通用的对话格式示例,具体需参考 ERNIE 4.5 的官方 Prompt 模板\n",
" inputs = tokenizer(query, return_tensors=\"ms\")\n",
" \n",
" # 2. 生成配置\n",
" # 注意:在 MindSpore 2.7 + MindSpore NLP 0.5.1 中,generate 接口用法与 Huggingface 类似\n",
" outputs = model.generate(\n",
" inputs[\"input_ids\"],\n",
" max_length=max_length,\n",
" do_sample=True,\n",
" temperature=temperature,\n",
" top_p=top_p,\n",
" pad_token_id=tokenizer.pad_token_id,\n",
" eos_token_id=tokenizer.eos_token_id\n",
" )\n",
" \n",
" # 3. 解码输出\n",
" response = tokenizer.decode(outputs[0], skip_special_tokens=True)\n",
" \n",
" # 简单的后处理,去除输入部分的重复(视具体 tokenizer 行为而定)\n",
" if response.startswith(query):\n",
" response = response[len(query):]\n",
" \n",
" return response.strip()\n",
"\n",
"print(\"推理函数封装完成。\")"
]
},
{
"cell_type": "markdown",
"id": "87b53bb4",
"metadata": {},
"source": [
"## 实验结果展示\n",
"\n",
"在本节中,我们将使用第3节准备的测试用例,对 ERNIE 4.5 模型进行实际的推理测试,展示其在不同领域的应用能力。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b65e8acc",
"metadata": {},
"outputs": [],
"source": [
"# 知识问答任务\n",
"# 测试模型对专业知识的理解能力。\n",
"\n",
"query_1 = test_cases[0] # 关于 MoE 的问题\n",
"print(f\"Q: {query_1}\")\n",
"\n",
"start_time = time.time()\n",
"response_1 = chat_with_ernie(query_1)\n",
"end_time = time.time()\n",
"\n",
"print(f\"A: {response_1}\")\n",
"print(f\"推理耗时: {end_time - start_time:.2f} s\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f36aedc7",
"metadata": {},
"outputs": [],
"source": [
"# 文学创作任务\n",
"# 测试模型的创意写作能力。\n",
"\n",
"query_2 = test_cases[1] # 写诗\n",
"print(f\"Q: {query_2}\")\n",
"response_2 = chat_with_ernie(query_2)\n",
"print(f\"A: \\n{response_2}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9972bcce",
"metadata": {},
"outputs": [],
"source": [
"# 情感分析任务\n",
"# 测试模型对自然语言的情绪理解能力。\n",
"\n",
"query_3 = test_cases[2] # 写冒泡排序\n",
"print(f\"Q: {query_3}\")\n",
"response_3 = chat_with_ernie(query_3)\n",
"print(f\"A: \\n{response_3}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33c0675a",
"metadata": {},
"outputs": [],
"source": [
"# 逻辑与代码生成任务\n",
"# 测试模型的逻辑推理与代码能力。\n",
"\n",
"query_4 = test_cases[3] # 写冒泡排序\n",
"print(f\"Q: {query_4}\")\n",
"response_4 = chat_with_ernie(query_4)\n",
"print(f\"A: \\n{response_4}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "mind",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.14"

Copilot AI Jan 23, 2026


There is a discrepancy between the documented Python version and the kernel metadata. Line 22 states the environment uses "Python 3.10", but the kernel metadata at line 316 shows "version": "3.11.14". This inconsistency could confuse users about the actual requirements. Please ensure the documented version matches the tested environment, or clarify that multiple Python versions are supported.

Comment on lines +198 to +200
" outputs = model.generate(\n",
" inputs[\"input_ids\"],\n",
" max_length=max_length,\n",

Copilot AI Jan 23, 2026


The max_length parameter is used in the generate() call, but according to modern transformer APIs (including HuggingFace transformers which MindNLP is designed to be compatible with), max_length represents the total length including input tokens. For generation tasks, it's more common and clearer to use max_new_tokens to specify only the number of tokens to generate, excluding the input. This prevents confusion and ensures consistent behavior regardless of input length. Consider using max_new_tokens instead of max_length if supported by MindNLP 0.5.1.
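The difference matters because under max_length the generation budget shrinks as the prompt grows. The arithmetic can be sketched in plain Python (function name and behavior are illustrative, modeled on the HuggingFace convention, not a MindNLP API):

```python
def generation_budget(input_len, max_length=None, max_new_tokens=None):
    """Return how many new tokens may be generated under each setting.
    max_length counts the prompt tokens; max_new_tokens does not."""
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(0, max_length - input_len)
    raise ValueError("one of max_length or max_new_tokens is required")

# A 2000-token prompt under max_length=2048 leaves only 48 new tokens...
print(generation_budget(2000, max_length=2048))    # 48
# ...while max_new_tokens=256 always allows 256, regardless of prompt size.
print(generation_budget(2000, max_new_tokens=256))  # 256
```

A prompt of 2048 or more tokens would leave a budget of zero under max_length=2048, which is why max_new_tokens is usually the safer default.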

"metadata": {},
"outputs": [],
"source": [
"def chat_with_ernie(query, history=[], max_length=2048, temperature=0.7, top_p=0.9):\n",

Copilot AI Jan 23, 2026


The history parameter is defined with a default mutable argument (empty list). This is a common Python pitfall that can lead to unexpected behavior across multiple function calls. The default list will be shared across all calls to the function when no history is provided. Although the function doesn't currently modify the history parameter, it's better to use None as the default and initialize inside the function: history=None and then if history is None: history = [] in the function body.
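The pitfall is easy to demonstrate in isolation (toy functions, not the notebook's actual code):

```python
def append_bad(item, acc=[]):
    # The default list is created once at definition time and
    # shared by every call that omits acc.
    acc.append(item)
    return acc

def append_good(item, acc=None):
    # None as sentinel: a fresh list is created on each call.
    if acc is None:
        acc = []
    acc.append(item)
    return acc

print(append_bad("a"), append_bad("b"))    # ['a', 'b'] ['a', 'b'] -- state leaks
print(append_good("a"), append_good("b"))  # ['a'] ['b']
```

Both calls to append_bad return the very same list object, so earlier items leak into later calls; append_good behaves as a reader would expect.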

Comment on lines +143 to +152
"# 在 Ascend 800I/T A2 上,为了节省显存并加速推理,我们推荐使用 float16 精度加载模型。\n",
"\n",
"print(f\"正在加载模型: {MODEL_NAME} ...\")\n",
"\n",
"# 加载模型权重\n",
"# mindspore_dtype=mindspore.float16 可以显著降低显存占用\n",
"try:\n",
" model = AutoModelForCausalLM.from_pretrained(\n",
" MODEL_NAME,\n",
" mindspore_dtype=mindspore.float16\n",

Copilot AI Jan 23, 2026


The comment mentions using float16 to save memory and speed up inference (lines 143, 148), but according to line 12 in the introduction, the hardware is "Ascend 800I/T A2". Ascend NPUs typically support bfloat16 which provides better numerical stability than float16 for large language models. Consider recommending mindspore.bfloat16 if supported on this hardware, or at least mentioning it as an alternative option in the comments.
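The numerical-stability point can be checked without any NPU. Python's struct module packs IEEE binary16 with the 'e' format code, whose largest finite value is 65504, so activations beyond that overflow in float16; bfloat16 shares float32's exponent range (up to roughly 3.4e38) and would represent them, at the cost of mantissa precision:

```python
import struct

def fits_in_float16(x):
    """True if x survives a round-trip through IEEE float16 (binary16)."""
    try:
        packed = struct.pack("<e", x)  # 'e' = half precision
    except OverflowError:
        # Magnitude exceeds float16's maximum finite value (65504).
        return False
    return struct.unpack("<e", packed)[0] not in (float("inf"), float("-inf"))

print(fits_in_float16(60000.0))  # inside float16's range
print(fits_in_float16(70000.0))  # overflows float16; bfloat16 would hold it
```

Whether mindspore.bfloat16 is actually accepted by from_pretrained in MindNLP 0.5.1 would need to be verified against its documentation.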

"metadata": {},
"outputs": [],
"source": [
"def chat_with_ernie(query, history=[], max_length=2048, temperature=0.7, top_p=0.9):\n",

Copilot AI Jan 23, 2026


The function documentation states this is a "对话生成函数" (dialogue generation function) that accepts a history parameter for multi-turn conversations, but the implementation doesn't support this functionality. The function name chat_with_ernie also implies chat/dialogue capability. If multi-turn conversation support is intended for future enhancement, consider renaming the function to generate_with_ernie to better reflect its current single-turn generation capability, or implement the history handling as the name and documentation suggest.


moyu026 commented Mar 17, 2026

The model is not loaded onto the NPU.

Author

4everWZ commented Mar 17, 2026

The model is not loaded onto the NPU.

Thanks a lot for the review. I have added the statement that sets the context explicitly; please take another look.
Since quite some time has passed, applying for a new NPU would take a while.
Thank you very much.


moyu026 commented Mar 17, 2026

…, adjust model loading precision, and enhance comments for clarity

moyu026 commented Mar 17, 2026

It runs with MindSpore 2.7.0 and MindSpore NLP 0.5.1.
